This guide provides an overview of text mining and different text mining methods. It also shares several U of I-supported tools and software for researchers at all levels of experience.
AntWord Profiler is a freeware tool for profiling the vocabulary level and complexity of texts. It is available as a free download for Windows, Mac OS X, and Linux.
Developed here at Illinois, ConText is a free, open-source application for performing a variety of text analysis techniques, including network graphs and topic models, based on textual data.
Gephi is an open graph visualization platform that supports exploration of all kinds of networks and complex systems. It is available as a free download for Linux, Windows, and Mac OS X.
Mallet is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
PhiloLogic is a full-text search, retrieval, and analysis tool developed by the ARTFL Project and the Digital Library Development Center (DLDC) at the University of Chicago. It is free software that can be downloaded for a wide range of systems.
Scrapy is an open source and collaborative framework for extracting the data you need from websites. It is available as a free download for Linux, Windows, and Mac OS X.
Textal is a free smartphone app that allows you to analyze websites, tweet streams, and documents to explore the relationship between words in the text via an intuitive word cloud interface. The app allows you to generate graphs and statistics, as well as share the data and visualizations in any way you like. Textal is available as a free download from the App Store on your Apple iOS device.
TXM is a free, open source, cross-platform, Unicode- and XML-based text/corpus analysis environment and graphical client. It is available as a free download for Windows, Linux, and Mac OS X. It offers a comprehensive range of analysis tools, including concordances, collocate search, and frequency lists.
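For readers new to these terms, the short Python sketch below illustrates what a frequency list and a concordance (keyword in context) are. It uses only the Python standard library and is not how TXM or any other tool in this guide works internally. The function names (`tokenize`, `frequency_list`, `concordance`), the sample sentence, and the simple lowercase tokenizer are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (illustrative tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens, top_n=5):
    """Return the top_n most frequent tokens as (word, count) pairs."""
    return Counter(tokens).most_common(top_n)


def concordance(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, used only to demonstrate the two functions.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

print(frequency_list(tokens, top_n=3))
for left, kw, right in concordance(tokens, "sat", width=2):
    print(f"{left:>12} | {kw} | {right}")
```

Dedicated tools add much more on top of this idea, such as corpus indexing, annotation, and statistical measures for collocates, so a sketch like this is best suited to small experiments.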