What kinds of texts do I want to use? And where are they?
What kinds of word patterns am I interested in seeing?
How much time do I have to devote to this project?
Text mining centers on identifying patterns and trends in unstructured text. This often involves using software to “read” text files and report data about them, such as word frequencies, common word patterns, and tone indicators. It is sometimes described as a “distant reading” method, in which you step back to identify patterns in language across a large group of texts.
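As a rough illustration of what it means for software to “read” texts and report word frequencies and word patterns, here is a minimal Python sketch using only the standard library. The sample sentences and the function names (tokenize, word_frequencies, bigram_frequencies) are assumptions made up for this example, and the simple tokenizer stands in for the more careful text cleaning a real project would need.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(texts):
    """Count how often each word appears across a group of texts."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def bigram_frequencies(texts):
    """Count adjacent word pairs (bigrams), a simple kind of word pattern."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # Pair each token with the one that follows it in the same text.
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical sample corpus; in practice these would be read from text files.
sample_texts = [
    "The whale surfaced near the ship.",
    "The ship followed the whale at dawn.",
]

print(word_frequencies(sample_texts).most_common(3))
print(bigram_frequencies(sample_texts).most_common(2))
```

Counting across the whole list of texts, rather than one text at a time, is what makes this a "distant" view: the output describes the collection, not any single document.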
Many research questions and methods fall within the scope of text and data mining, including:
For more advanced text mining techniques, such as sentiment analysis (identifying the tone of a text or texts) or named entity recognition (identifying people, places, organizations, and other named entities in a text or texts), researchers often have to write their own code. R and Python are two commonly used programming languages for text mining. Further resources on using these languages for text mining are linked below.
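To give a sense of what writing your own code for sentiment analysis can look like, the following Python sketch scores a text by counting words from two tiny word lists. The word lists and the function names (sentiment_score, sentiment_label) are invented for illustration only. This is a toy version of a lexicon-based approach, not how established libraries work; research projects typically rely on tested tools and much larger lexicons or trained models.

```python
import re

# Hypothetical, deliberately tiny word lists for illustration only.
POSITIVE_WORDS = {"good", "great", "happy", "love", "wonderful"}
NEGATIVE_WORDS = {"bad", "sad", "terrible", "hate", "awful"}


def sentiment_score(text):
    """Return (positive word count) minus (negative word count) for a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    positive = sum(1 for token in tokens if token in POSITIVE_WORDS)
    negative = sum(1 for token in tokens if token in NEGATIVE_WORDS)
    return positive - negative


def sentiment_label(text):
    """Map the numeric score to a coarse tone label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


for sentence in [
    "I love this wonderful book",
    "What a terrible, awful day",
    "The book is on the table",
]:
    print(sentence, "->", sentiment_label(sentence))
```

A word-counting approach like this ignores negation ("not good") and context, which is one reason researchers turn to more sophisticated methods for real analysis.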