What is text mining?
Text mining is a research practice that involves using computers to discover information in large amounts of unstructured text.
Unstructured text is text that is not formatted according to an encoding structure such as HTML or XML.
Examples of unstructured data used for text mining include journal and news articles, blog posts, and email.
Researchers use text mining to carry out tasks such as:
- sentiment analysis
- entity extraction
- document summarization
By using these methods, researchers can make connections and draw conclusions about the content of large text corpora.
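To make one of these tasks concrete, here is a minimal sketch of lexicon-based sentiment analysis in Python. The word lists and the `sentiment_score` function are hypothetical and invented for illustration; real projects typically rely on much larger curated lexicons or trained models.

```python
import re

# Hypothetical toy lexicon, invented for this example.
POSITIVE = {"good", "great", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "terrible", "sad"}

def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    # Lowercase the text and treat runs of letters and apostrophes as words.
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)

# Invented sample sentences, not drawn from any real corpus.
documents = [
    "The service was great and the staff were happy",
    "A terrible, sad ending",
    "The report arrived on Tuesday",
]

for doc in documents:
    print(sentiment_score(doc), doc)
```

A score above zero suggests positive wording, below zero negative wording, and zero either neutral wording or no lexicon matches. This simple count ignores negation and context, which is one reason larger studies use more sophisticated methods.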
The image on the right is one example of what you can do with text mining. This pie chart shows each character's share of the total words spoken in the Jacobean play The Revenger's Tragedy.
Credit: Chart by Pgogy, available via Creative Commons license.
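A chart like this starts from a count of words per speaker. The Python sketch below shows one way such a count might be produced, assuming the script is plain text with one speech per line in the form `NAME: speech`. The `words_per_speaker` function, the input format, and the sample dialogue are all assumptions made for illustration; this is not the method used to create the chart.

```python
import re
from collections import Counter

def words_per_speaker(script: str) -> Counter:
    """Count words spoken by each speaker in 'NAME: speech' formatted text."""
    counts = Counter()
    for line in script.splitlines():
        # Split at the first colon; skip lines that do not look like "NAME: speech".
        name, sep, speech = line.partition(":")
        if not sep or not name.strip():
            continue
        # Treat runs of letters and apostrophes as words.
        counts[name.strip()] += len(re.findall(r"[A-Za-z']+", speech))
    return counts

# Invented placeholder dialogue, not text from any real play.
sample = """\
FIRST SPEAKER: I have a plan and it is long
SECOND SPEAKER: Tell me
FIRST SPEAKER: Not yet
"""

counts = words_per_speaker(sample)
total = sum(counts.values())
for name, n in counts.most_common():
    # Each percentage corresponds to one slice of a pie chart.
    print(f"{name}: {n} words ({n / total:.0%})")
```

Real play texts usually need more preparation first, such as removing stage directions and normalizing variant spellings of character names.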