University Library, University of Illinois at Urbana-Champaign

Text Mining Tools and Methods

This guide contains resources for researching with text mining

About Web-Based Tools

Web-based tools offer easy-to-use visualization and analysis features that run in your browser, with nothing to install. Available options include word clouds, charts, and other graphics that visualize your text, as well as tools that analyze it statistically.

Web-based Tools Options

Analysis Tools

Lexos
Lexos is a web-based platform for visualizing and analyzing large text sets. You can upload multiple files, then prepare, visualize, and analyze your data. Its visualization tools include word clouds, multicloud, bubbleviz, and rollingwindow graph. Its analysis tools include statistical analysis, clustering, similarity query, and topword.

OverviewDocs
Overview is an open-source web tool designed for analyzing large sets of documents. Its features include built-in OCR (Optical Character Recognition), clustering, tagging, and metadata services. Overview requires an account to use. Only you can view the documents you upload, but you should still be careful with any private data. Overview also has an API you can use to customize the tool.

Social Media Macroscope
Developed at Illinois, the Social Media Macroscope has the goal of providing social media analytics tools and data to students and researchers. Check out tools like the Brand Analytics Environment to see how the public interacts with brands, or download a dataset.

Textalyser
Textalyser is an online text analysis tool that generates statistics about your text. It provides instant results for word groups, keyword density, the prominence of a word or expression, and word count.
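To make statistics such as word count and keyword density concrete, here is a minimal Python sketch. It is not Textalyser's own code, and it assumes a simple definition of keyword density (a word's occurrences divided by the total number of words, as a percentage). The function name `text_stats`, the tokenizing pattern, and the sample sentence are illustrative assumptions.

```python
import re
from collections import Counter


def text_stats(text, top_n=3):
    """Return the total word count and the top_n words as (word, count, density %)."""
    # Assumed tokenization: lowercase, then keep runs of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return 0, []
    counts = Counter(words)
    # Assumed definition: density = occurrences of the word / total words * 100.
    density = [
        (word, count, round(100 * count / total, 1))
        for word, count in counts.most_common(top_n)
    ]
    return total, density


# Hypothetical sample text, for illustration only.
sample = "Text mining turns text into data. Mining text is fun."
total, density = text_stats(sample, top_n=2)
print(total)    # total number of words
print(density)  # most frequent words with their counts and densities
```

A web-based tool may tokenize text differently or filter out common words, so its numbers can differ from this sketch.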

Voyant
Voyant is a web-based reading and analysis tool for digital texts. You can type in multiple URLs, paste in full text, or upload your own files for analysis. The site is a collaborative project by Stéfan Sinclair and Geoffrey Rockwell, built specifically for digital humanities projects. It also provides instruction guides for getting started and additional information about other Voyant tools.

Word and Phrase
Word and Phrase is an online text analysis tool with a variety of capabilities. You can copy and paste text into a text box or work with data from the Corpus of Contemporary American English (COCA). First, the tool highlights all the medium- and lower-frequency words in the text and creates lists of those words. Second, you can click on any word to create a "word sketch," which shows its definition and detailed information from the COCA. Finally, the tool can run powerful searches on selected phrases and show related phrases in the COCA.

Visualization Tools

Infogr.am
Infogr.am is a web-based tool for graphically visualizing your data. It can create more than 30 types of charts and graphs, including maps, to visualize data for research. Once you have created a project on the site, you can publish your visualization or generate an embed code to post it on a blog or website.

Tagxedo
Tagxedo is a web-based word cloud generator that creates word clouds from famous speeches, news articles, slogans, tweets, and similar sources. It is particularly valuable for creating word clouds from websites, news, web searches, RSS feeds, and Twitter IDs. Although it cannot create word clouds from copied and pasted text, it works well with these other data sources.

Wordle
Wordle is a web-based tool that creates word clouds from text you provide. You can adjust images, fonts, layouts, and color schemes to customize your Wordle for your particular project, and you can print it or create a PDF to share your work. Wordle is very accessible and easy to use. It's a great tool for getting started with word clouds and for small visualization projects.