The text analysis method you choose will depend on your research question. When choosing a method to use, first consider what you expect to learn from your research and what form you would like your results to take. The methods described below can be combined in different ways during the course of a research project. For example, natural language processing algorithms might reveal the names of people in your text, to which you could apply network analysis to study how the actors are connected.
Computing word frequencies is a basic building block of higher-level text analysis algorithms, although the frequencies can sometimes be revealing in themselves. This can mean raw word counts, or the percentage of a text or set of texts that a word accounts for, compared across texts or over time. Frequencies can also be counted for "n-grams," or phrases with a certain number (n) of words.
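As a minimal sketch of these ideas, the Python below counts words, relative frequencies, and two-word n-grams (bigrams) using only the standard library. The sample sentence and the helper names `tokenize`, `ngrams`, and `relative_frequencies` are made up for illustration, and the tokenizer is deliberately crude; a real project would likely use a more careful one.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters (and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


def relative_frequencies(tokens):
    """Map each word to its share of all tokens in the text."""
    counts = Counter(tokens)
    total = len(tokens)
    return {word: count / total for word, count in counts.items()}


# Hypothetical sample text, not drawn from any real corpus.
sample = "The whale surfaced. The crew watched the whale dive."

tokens = tokenize(sample)
word_counts = Counter(tokens)               # raw counts per word
bigram_counts = Counter(ngrams(tokens, 2))  # counts per two-word phrase
shares = relative_frequencies(tokens)       # proportion of the text per word

print(word_counts.most_common(2))
print(bigram_counts.most_common(1))
```

Relative frequencies are usually the better choice when comparing texts of different lengths, since raw counts grow with the size of the text.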
Text analysis often relies on machine learning, a branch of computer science that trains computers to recognize patterns. There are two kinds of machine learning used in text analysis: supervised learning, where a human helps to train the pattern-detecting model, and unsupervised learning, where the computer finds patterns in text with little human intervention. An example of supervised learning is Naive Bayes Classification. See Natural Language Processing and Topic Modeling for examples of unsupervised machine learning.
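To make the supervised case more concrete, here is a small, simplified Naive Bayes classifier written from scratch in Python. It is a sketch under stated assumptions rather than a production implementation: the four labeled training sentences, the labels "voyage" and "law", and the function names `train` and `classify` are all invented for illustration, and words never seen in training are simply ignored.

```python
import math
from collections import Counter, defaultdict


def train(labeled_docs):
    """Count words per label from (text, label) pairs labeled by a human."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in labeled_docs:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        # Start from the prior: how common this label is in the training set.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical hand-labeled training examples.
training_data = [
    ("the ship sailed across the sea", "voyage"),
    ("the crew raised the sail", "voyage"),
    ("the court heard the case", "law"),
    ("the judge read the verdict", "law"),
]

model = train(training_data)
print(classify("the judge heard the case", model))
print(classify("the crew sailed the sea", model))
```

The human contribution is the labeled training data: the model only learns which words tend to appear under each label that a person has assigned.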
Topic modeling, a form of machine learning, is a way of identifying patterns and themes in a body of text. Topic modeling is done by statistical algorithms, such as Latent Dirichlet Allocation, which groups words into "topics" based on which words frequently co-occur in a text.
Natural language processing, which often relies on machine learning, is the attempt to use computational methods to extract meaning from free text. Among other things, natural language processing algorithms can identify names of people and places, dates, sentiment, and parts of speech.
Network analysis is a method for finding connections between nodes, which can represent people, concepts, sources, and more. These networks are usually visualized as graphs that show how the nodes are interconnected.
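One simple way to build such a network from text is to link two people whenever their names appear in the same document. The Python sketch below assumes the names have already been extracted; the three documents and the names in them are hypothetical sample data, and counting shared documents is only one of several reasonable ways to define a connection.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical input: the names found in each of three documents.
documents = [
    ["Ada", "Charles"],
    ["Ada", "Mary", "Charles"],
    ["Mary", "Percy"],
]

# An edge links two names that appear in the same document;
# its weight is the number of documents they share.
edge_weights = Counter()
for names in documents:
    for a, b in combinations(sorted(set(names)), 2):
        edge_weights[(a, b)] += 1

# Build an adjacency list so each node knows its neighbors.
neighbors = defaultdict(set)
for a, b in edge_weights:
    neighbors[a].add(b)
    neighbors[b].add(a)

# Degree: the number of distinct nodes each node is connected to.
degree = {name: len(linked) for name, linked in neighbors.items()}

for name, count in sorted(degree.items(), key=lambda item: -item[1]):
    print(name, count)
```

The resulting edge list can be passed to a graph visualization tool to draw the network, with edge weights shown as line thickness.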
Citation analysis can be used to discover connections and relationships between documents based on how they cite one another. These relationships can then be visualized.
Generating visualizations is a way to "see" your data. Text mining visualizations can help researchers see relationships between certain concepts. Examples include word clouds, graphs, maps, and other graphics that produce a visual depiction of the data.