LibGuides: Text Mining Tools and Methods: Text Analysis Methods

Choosing a method

The text analysis method you choose will depend on your research question. When choosing a method to use, first consider what you expect to learn from your research and what form you would like your results to take. The methods described below can be combined in different ways during the course of a research project. For example, natural language processing algorithms might reveal the names of people in your text, to which you could apply network analysis to study how the actors are connected.

Word Frequencies

Word frequencies are a basic building block of higher-level text analysis algorithms, and they can sometimes be revealing in themselves. They can be reported as raw word counts or as the percentage of a text (or set of texts) that each word makes up, and these figures can then be compared across texts or over time. Frequencies can also be counted for "n-grams," or sequences of a certain number (n) of consecutive words; for example, "text mining" is a 2-gram, or bigram.
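The sketch below illustrates these three kinds of counts in Python, using only the standard library. It assumes a deliberately simple tokenizer that lowercases the text and keeps only letters and apostrophes. The function names and the sample sentence are illustrative and are not part of any tool listed in this guide.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(tokens):
    """Raw count of each word."""
    return Counter(tokens)


def relative_frequencies(tokens):
    """Percentage of all tokens taken up by each word."""
    total = len(tokens)
    return {word: 100 * count / total for word, count in Counter(tokens).items()}


def ngram_counts(tokens, n):
    """Counts of n-grams: runs of n consecutive tokens, joined by spaces."""
    grams = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(gram) for gram in grams)


text = "The cat sat on the mat. The cat slept."
tokens = tokenize(text)
print(word_counts(tokens).most_common(2))        # [('the', 3), ('cat', 2)]
print(round(relative_frequencies(tokens)["the"], 1))  # 33.3
print(ngram_counts(tokens, 2).most_common(1))    # [('the cat', 2)]
```

This simple tokenizer ignores sentence boundaries, so "mat the" is counted as a bigram. Projects often split texts into sentences first to avoid n-grams like this.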

Related Tools:

Word frequencies generated using HathiTrust Bookworm


Related Library Guides:

ATLAS.ti

Example Project Using Word Frequencies

Clement, T.E. (2008). ‘A Thing Not Beginning and Not Ending’: Using Digital Tools to Distant-Read Gertrude Stein’s The Making of Americans. Literary and Linguistic Computing, vol. 23(3), 361-81. http://doi.org/10.1093/llc/fqn020.

Machine Learning

Text analysis often relies on machine learning, a branch of computer science that trains computers to recognize patterns. Two kinds of machine learning are common in text analysis. In supervised learning, a model learns from examples that humans have labeled, such as documents already sorted into categories. In unsupervised learning, the computer finds patterns in text with little human intervention. Naive Bayes classification is an example of supervised learning: from labeled documents, it estimates how likely each word is to appear in each category, and then it assigns a new document to its most probable category. See Topic Modeling for an example of unsupervised machine learning. The tools described under Natural Language Processing draw on both kinds.
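As a rough illustration of supervised learning, here is a minimal multinomial Naive Bayes classifier in plain Python, with add-one (Laplace) smoothing. The tiny labeled training set and its category names are hypothetical. In practice, a project would use far more training data and a library implementation such as scikit-learn's MultinomialNB.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes text classifier with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        # Count how many training documents carry each label.
        self.class_counts = Counter(labels)
        # Count how often each word occurs in the documents of each label.
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, document):
        total_docs = sum(self.class_counts.values())
        tokens = tokenize(document)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior: how common this label is among the training documents.
            score = math.log(n_docs / total_docs)
            for token in tokens:
                # Log likelihood of the word given the label; the +1 keeps unseen words from zeroing the score.
                score += math.log((counts[token] + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical labeled training data.
train_docs = [
    "the flower has petals and a green stem",
    "roots and leaves feed the plant",
    "the melody moves through a minor key",
    "the choir sang in harmony",
]
train_labels = ["botany", "botany", "music", "music"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("a plant with a tall stem"))  # botany
print(model.predict("they sang a melody"))        # music
```

Working in log space avoids the numerical underflow that comes from multiplying many small probabilities together.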

Example Project Using Classification (Supervised Machine Learning):

Horton, R., Morrissey, R., Olsen, M., Roe, G., & Voyer, R. (2009). Mining Eighteenth Century Ontologies: Machine Learning and Knowledge Classification in the Encyclopédie. Digital Humanities Quarterly, vol. 3(2). Retrieved from http://www.digitalhumanities.org/dhq/vol/3/2/000044/000044.html.

Topic Modeling

Topic modeling, a form of unsupervised machine learning, is a way of identifying patterns and themes in a body of text. It is done with statistical algorithms such as Latent Dirichlet Allocation (LDA), which groups words into "topics" based on which words frequently co-occur across the documents in a collection. Each document is then described as a mixture of these topics.
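The following is a minimal sketch of Latent Dirichlet Allocation (LDA) in plain Python. It fits the model by collapsed Gibbs sampling, which is one common way of estimating it. The toy corpus, the number of topics, and the hyperparameter values (alpha, beta, iterations) are illustrative assumptions. Real projects typically use established implementations such as MALLET or gensim, applied to much larger collections.

```python
import random
from collections import Counter


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling. docs is a list of token lists."""
    rng = random.Random(seed)
    V = len({w for doc in docs for w in doc})  # vocabulary size
    # Count tables: topics per document, words per topic, total tokens per topic.
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [Counter() for _ in range(num_topics)]
    topic_total = [0] * num_topics
    # Start by assigning every token to a random topic.
    assignments = []
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's current topic from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Redraw its topic in proportion to (topic's share of this document) x (word's share of the topic).
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][w] + beta) / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """The n most frequent words currently assigned to each topic."""
    return [[w for w, c in counts.most_common() if c > 0][:n] for counts in topic_word]


corpus = [
    "river water fish boat river",
    "fish water river boat",
    "vote election party senate",
    "senate party vote law",
]
docs = [text.split() for text in corpus]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
for k, words in enumerate(top_words(topic_word)):
    print(f"topic {k}: {words}")
```

On a corpus this small the topics are unstable and can change with the random seed. The point of the sketch is the sampling loop: each word's topic is repeatedly redrawn based on how common that topic is in the word's document and how common the word is in that topic.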

Related Tools:

Credit: Visualization by Digital Environmental Humanities, available under CC BY-NC-SA 3.0

R (also available for free online)
Python (also available for free online)


Related Library Guides:

"Text Mine HathiTrust" on HathiTrust

Example Project using Topic Modeling:

Mendenhall, R., Brown, N., Black, M., Van Moer, M., Lourentzou, I., Flynn, K., McKee, M., Zerai, A. (2016). Rescuing lost history: Using big data to recover black women's lived experiences. In Proceedings of XSEDE 2016: Diversity, Big Data, and Science at Scale (Vol. 17-21-July-2016). https://doi.org/10.1145/2949550.2949642. - Illinois Authors

Natural Language Processing

Natural language processing (NLP), which often relies on machine learning, is the use of computational methods to extract meaning from free text. Among other things, NLP algorithms can identify the names of people and places, dates, sentiment, and parts of speech.
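Tasks such as finding names, places, and parts of speech are usually handled by trained models in libraries such as spaCy or NLTK. The sketch below takes a much simpler approach to sentiment: it scores a text against a word list. This lexicon-based, rule-based method is not machine learning. The six-word lexicon and the one-word negation rule are hypothetical simplifications.

```python
import re

# A tiny, hypothetical sentiment lexicon; real lexicons score thousands of words.
LEXICON = {"good": 1, "happy": 1, "love": 2, "bad": -1, "sad": -1, "hate": -2}
NEGATIONS = {"not", "never", "no"}


def sentiment(text):
    """Sum lexicon scores, flipping the sign of a word that directly follows a negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        value = LEXICON.get(token, 0)
        if i > 0 and tokens[i - 1] in NEGATIONS:
            value = -value
        score += value
    return score


def label(text):
    """Map the numeric score to a coarse sentiment label."""
    score = sentiment(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("I love this good book"))   # positive
print(label("I am not happy"))          # negative
print(label("The train left at noon"))  # neutral
```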

Related Tools:

Python (also available for free online)


Related Library Guides:

"Text Mine HathiTrust" on HathiTrust

Example Project using Natural Language Processing:

Underwood, T., Bamman, D., & Lee, S. (2018). The Transformation of Gender in English-Language Fiction. Journal of Cultural Analytics. http://doi.org/10.22148/16.019. - Illinois Authors

Network and Citation Analysis

Network analysis is a method for finding connections between nodes representing people, concepts, sources, and more. These networks are usually visualized as graphs that show how the nodes are interconnected.
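To illustrate, the sketch below builds a co-occurrence network. Two people are connected each time they are named in the same chapter, and each connection carries a weight equal to the number of shared chapters. The sketch also computes each node's degree, which is its number of distinct connections. The chapter lists are hypothetical; as in the example under Choosing a method, they might come from an NLP tool that extracts names. The hypothetical to_edge_csv helper writes a Source/Target/Weight edge list, a layout that Gephi's spreadsheet import generally accepts.

```python
import csv
import io
from collections import defaultdict
from itertools import combinations


def cooccurrence_network(groups):
    """Weighted undirected edges: each pair of names in the same group adds 1 to that edge's weight."""
    edges = defaultdict(int)
    for group in groups:
        # sorted() gives each pair a fixed order, so (A, B) and (B, A) count as the same edge.
        for a, b in combinations(sorted(set(group)), 2):
            edges[(a, b)] += 1
    return dict(edges)


def degree(edges):
    """Number of distinct neighbours of each node."""
    deg = defaultdict(int)
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def to_edge_csv(edges):
    """Serialize edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(edges.items()):
        writer.writerow([a, b, weight])
    return buffer.getvalue()


# Hypothetical input: the people named in each chapter.
chapters = [
    ["Ada", "Ben", "Cleo"],
    ["Ada", "Cleo", "Dev"],
    ["Ada", "Ben"],
]
edges = cooccurrence_network(chapters)
print(degree(edges))  # {'Ada': 3, 'Ben': 2, 'Cleo': 3, 'Dev': 2}
print(to_edge_csv(edges))
```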

Citation analysis uses citations to discover connections and relationships between documents, such as which works cite one another or are cited together. These relationships can then be visualized as a network.
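Two common citation measures can be sketched in plain Python. In co-citation, two works are linked when a later paper cites both of them. In bibliographic coupling, two papers are linked when they share references. The paper and reference identifiers below are hypothetical placeholders.

```python
from collections import Counter
from itertools import combinations


def cocitation(references):
    """Count how often each pair of cited works appears together in one paper's reference list."""
    counts = Counter()
    for cited in references.values():
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts


def bibliographic_coupling(references):
    """For each pair of citing papers, count the references they share (pairs sharing none are omitted)."""
    counts = {}
    for a, b in combinations(sorted(references), 2):
        shared = len(set(references[a]) & set(references[b]))
        if shared:
            counts[(a, b)] = shared
    return counts


# Hypothetical citation data: each paper mapped to the works it cites.
references = {
    "P1": ["R1", "R2", "R3"],
    "P2": ["R1", "R2"],
    "P3": ["R2", "R3"],
}
print(cocitation(references))
print(bibliographic_coupling(references))
```

Either set of weighted pairs can serve as the edges of a citation network and be loaded into a tool such as VOSviewer or Gephi.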

Related Tools Available Online:

Gephi (network analysis)
VOSviewer (citation and network analysis)

Example Project:

Kaufman, M. (2014-2015). Quantifying Kissinger. Retrieved from http://blog.quantifyingkissinger.com/.

Visualizations

Generating visualizations is a way to "see" your data. Text mining visualizations can help researchers see relationships between concepts. Examples include word clouds, graphs, maps, and other graphics that give a visual depiction of the data.
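Tools such as those listed below draw word clouds and charts graphically. As a dependency-free illustration of the same idea, the sketch below renders word frequencies as a plain-text bar chart. The function name, bar width, and sample text are illustrative assumptions.

```python
from collections import Counter


def text_bar_chart(counts, top=5, width=20):
    """Render the most frequent items as rows of '#' characters scaled to the largest count."""
    items = counts.most_common(top)
    if not items:
        return ""
    largest = items[0][1]
    label_width = max(len(word) for word, _ in items)
    rows = []
    for word, count in items:
        # Give every listed item at least one '#' so small counts stay visible.
        bar = "#" * max(1, round(width * count / largest))
        rows.append(f"{word.ljust(label_width)} | {bar} {count}")
    return "\n".join(rows)


counts = Counter("to be or not to be that is the question".split())
print(text_bar_chart(counts, top=3, width=10))
# to | ########## 2
# be | ########## 2
# or | ##### 1
```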

Related Tools:

Word cloud of Jane Austen's *Pride and Prejudice* created in Wordle

ATLAS.ti
NVivo
R (also available for free online)
