This guide was created by Erica Parker, Literatures and Languages Library graduate assistant.
This guide was updated Fall 2015 by Victoria Henry, Scholarly Commons graduate assistant.
This guide was updated Spring 2019 by Kayla Abner, Digital Humanities graduate assistant.
Text mining centers on identifying patterns and trends in unstructured texts. This often involves using software to "read" text files and report data about them, such as word frequencies, common word patterns, and tone indicators. It is sometimes referred to as a "distant reading" method, in which you take a step back to identify patterns in language across a large group of texts.
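As a rough illustration of what "reading" text files for word frequencies and common word patterns can look like, here is a minimal Python sketch that uses only the standard library. The sample sentences, the function names (tokenize, word_frequencies, bigram_frequencies), and the simple tokenizing rule are assumptions made for this example. They are not part of any particular text mining tool, and real projects usually need more careful text cleaning.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    Simplifying assumption: a "word" is a run of the letters a-z and
    apostrophes, so accented letters and digits are dropped.
    """
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word occurs across all of the texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def bigram_frequencies(texts):
    """Count adjacent word pairs, one simple kind of word pattern."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Pair each token with the token that follows it.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical stand-in for a much larger collection of text files.
sample_texts = [
    "The whale surfaced, and the crew watched the whale.",
    "The crew feared the white whale.",
]

print(word_frequencies(sample_texts).most_common(3))
# prints [('the', 5), ('whale', 3), ('crew', 2)]
print(bigram_frequencies(sample_texts).most_common(2))
```

With a real corpus, you would typically read each file into a string first and pass the resulting list of strings to these functions. Very common words such as "the" tend to dominate raw counts, which is why many workflows filter them out before looking for patterns.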
Many research questions and methods fall within the scope of text and data mining.
Why do text mining?
Text mining helps researchers detect patterns and connections in large volumes of textual material.
According to researcher Marti Hearst, "In text mining, the goal is to discover heretofore unknown information, something that no one yet knows and so could not have yet written down." Text mining enables researchers to draw conclusions from volumes of material too large for them to otherwise read, synthesize, and incorporate into their scholarship.
Researchers in fields ranging from biological sciences to the humanities have begun using text mining to detect patterns and discover unknown information.
If you have questions about text mining, reach out to Scholarly Communications and Publishing (scp@library.illinois.edu).
Except where otherwise indicated, original content in this guide is licensed under a Creative Commons Attribution (CC BY) 4.0 license. You are free to share, adopt, or adapt the materials. We encourage broad adoption of these materials for teaching and other professional development purposes, and invite you to customize them for your own needs.