Introduction to Digital Humanities

A guide to help newcomers to digital humanities learn what the field is, where to get started, and which tools and methods are commonly used.

What is Text Mining?

Text mining centers on identifying patterns and trends in unstructured texts. This often involves using a program or software to “read” text files and provide data about them, including data on word frequencies, common word patterns, tone indicators, and more. It is sometimes referred to as a "distant reading" method, in which you take a step back to identify patterns in language across a large group of texts. 
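
To make the idea of a program "reading" texts more concrete, here is a minimal Python sketch that counts word frequencies across a small collection of texts. The two-sentence corpus and the function names (`tokenize`, `word_frequencies`) are invented for illustration; a real project would load text files and would likely use a more careful tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word appears across a collection of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


# Hypothetical two-document corpus, invented for illustration.
corpus = [
    "The whale surfaced. The crew watched the whale.",
    "A whale is a mammal, not a fish.",
]

frequencies = word_frequencies(corpus)

# Print the three most frequent words with their counts.
for word, count in frequencies.most_common(3):
    print(word, count)
```

Very common words such as "the" and "a" tend to dominate raw counts, which is why many text mining workflows filter them out before looking for patterns.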

Many research questions and methods fall within the scope of text and data mining, including: 

  • Identifying word frequencies
  • Concordance (what passages mention specific key terms)
  • Keyness (how often key terms appear in certain texts when compared to others) 
  • Topic modelling (grouping key terms together to identify common themes and topics) 
  • Named entity recognition (identifying the names of people, places, and things across texts)
  • Sentiment analysis (identifying positive or negative tone)
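
As one illustration of the methods listed above, a concordance can be sketched in a few lines of Python. This is a simplified keyword-in-context search, not the approach of any particular tool; the sample sentence, the function name `concordance`, and the `width` parameter are assumptions made for the example.

```python
import re


def concordance(text, keyword, width=3):
    """Return each occurrence of keyword with up to `width` words of context on each side."""
    tokens = re.findall(r"[A-Za-z']+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


# Hypothetical sample text, invented for illustration.
sample = "The whale surfaced near the ship. Nobody on the ship had seen a whale before."

# Print each hit with its left context right-aligned so the keywords line up.
for left, hit, right in concordance(sample, "whale"):
    print(f"{left:>20} [{hit}] {right}")
```

Lining up the keyword in a column makes it easier to scan how a term is used across many passages at once.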

Tools and Software

For more advanced text mining techniques, such as sentiment analysis (identifying the tone of a text or texts) or named entity recognition (identifying the names of people, places, and things in a text or texts), researchers often have to write their own code. R and Python are two programming languages commonly used for text mining. Further resources on using them for text mining are linked below.
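
To give a sense of what coding such an analysis can look like, here is a deliberately simple Python sketch of lexicon-based sentiment scoring. The word lists are a tiny hypothetical lexicon invented for this example, and the names `sentiment_score` and `label` are assumptions. Research projects generally rely on established sentiment lexicons or trained models rather than a hand-made list like this one.

```python
import re

# Hypothetical mini-lexicon, invented for illustration only.
POSITIVE = {"good", "great", "happy", "love", "wonderful"}
NEGATIVE = {"bad", "sad", "terrible", "hate", "awful"}


def sentiment_score(text):
    """Return (positive hits - negative hits) / total words, or 0.0 for empty text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    positive = sum(token in POSITIVE for token in tokens)
    negative = sum(token in NEGATIVE for token in tokens)
    return (positive - negative) / len(tokens)


def label(score):
    """Turn a numeric score into a coarse tone label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Hypothetical sentences, invented for illustration.
for sentence in ["I love this wonderful book", "What a terrible, awful ending"]:
    print(sentence, "->", label(sentiment_score(sentence)))
```

A word-counting approach like this misses negation ("not good"), sarcasm, and context, which is one reason sentiment analysis is treated as a more advanced technique.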

Resources

Example Text Mining Projects