Text Mining Tools and Methods

This guide contains resources for conducting research with text mining.

What is ATLAS.ti?

While ATLAS.ti is a tool primarily used for performing qualitative data analysis, where researchers apply codes to collections of unstructured text, it provides functionality for identifying and visualizing content that can be used for basic text analysis.

This tutorial covers only the text analysis capabilities of ATLAS.ti. For a more comprehensive introduction to the software, please refer to the ATLAS.ti guide.

ATLAS.ti Frequently Asked Questions

What are the advantages of using ATLAS.ti?

ATLAS.ti allows researchers to collect and consolidate primary data and evaluate their significance using a variety of tools. Since ATLAS.ti accepts a wide variety of data formats, it encourages drawing qualitative analytical connections between many different materials, from video and images to survey data to case study transcripts.

It maintains all related coding, memos, and annotations under the same project bundle. In addition, you can export reports, data visualizations and other analyses, and the full project file in multiple formats.

What are the disadvantages of ATLAS.ti?

ATLAS.ti offers many options for data analysis but is not intuitive for a first-time user, although the latest version of ATLAS.ti (version 8) is much more user-friendly than previous iterations. The program is also not designed specifically for text mining, but the available tools are versatile. Once you’ve imported your data and experimented with the tools, it becomes easy to build connections for your research.

What data formats can it ingest?

ATLAS.ti ingests many data types, including plain text, PDF, images, audio/video, geodata (including full Google Earth integration), spreadsheet data, and transcription files from software like F4/F5 and Transana.

How does it work?

ATLAS.ti allows you to essentially highlight quotations that correspond to certain categories; in qualitative data analysis, this is called coding. With your codes all set, you can create visualizations of codes and words, find patterns in your text, and more.

Get Data into ATLAS.ti

When you open ATLAS.ti for the first time, a splash screen will appear asking whether you want to create a new project or import an existing one. To start from scratch with your own documents, choose "Create New Project." If you already have an ATLAS.ti project bundle (.atlproj), choose "Import Project Bundle." Choose "Import Mobile Project" for projects created on an iPad or Android device, and choose "Import Legacy Project" if you have a project from an older ATLAS.ti version (5, 6, or 7). A "Hermeneutic Unit" (HU) project also requires the legacy import.

Image: the ATLAS.ti splash screen, with the options Create New Project, Import Project Bundle, Import Mobile Project, and Import Legacy Project.

Important Note About Saving Your Project!

If you want to move your project file to a different computer, or otherwise create a copy to work from, you must export it. Saving the project only saves your working copy, which is stored in your AppData folder and is inaccessible to the user. Exporting creates a single .atlproj project bundle, which you can then import into ATLAS.ti on another machine. However, if you linked documents (usually audio/video files too large to import), you need to bring those to the new machine as well. See information about linked documents below.

Word Frequencies

ATLAS.ti has a built-in function to generate a word cloud or a detailed list of each word with its number of occurrences. To access this feature, right-click a document in the left-hand panel and select Word Cloud or Word List. You can also create a Word Cloud or Word List from multiple documents using the Document Manager: open the Document Manager from the Home tab, select the desired documents, then right-click the selection and choose Word Cloud or Word List. You can export these results to an Excel spreadsheet.
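For readers who want to see what a word list amounts to, here is a minimal Python sketch of the same idea: split text into words and count them. It is not how ATLAS.ti is implemented. The function name, the tokenizing rule, and the sample documents are all assumptions made for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count case-insensitive word occurrences in a string."""
    # Assumption: a "word" is a run of letters, optionally with internal apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical documents standing in for the ones in a project.
documents = {
    "doc1": "The library lends books. The librarian reads the books.",
    "doc2": "Books are kept in the library.",
}

# Word list for a single document.
counts = word_frequencies(documents["doc1"])

# Word list across several documents: add the per-document counts together.
combined = sum((word_frequencies(t) for t in documents.values()), Counter())

# Print the combined word list, most frequent first.
for word, n in combined.most_common():
    print(f"{word}\t{n}")
```

A word cloud is a visual rendering of the same counts, with more frequent words drawn larger.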

Stop Words

Stop words are common words that are removed from the count (words like "a," "am," etc.). You can choose whether to omit stop words or not when you select Word List. If there is an additional word you want omitted, select the "Remove from text before counting" checkbox and type the word into the box below. To add a stop word to a list from a Word Cloud, right click on the word and select "Add to stop word list."

Note: If you created a new project instead of importing a project bundle, you will need to create your own stop word list and point the program towards it. Try this website for stop words in many languages.
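The effect of a stop word list on a word count can be sketched in a few lines of Python. This is only an illustration of the concept, not ATLAS.ti's own behavior: the stop word list below is a tiny hypothetical one, and the `extra` parameter is an assumed stand-in for the "Remove from text before counting" box.

```python
import re
from collections import Counter

# A tiny, hypothetical stop word list; real lists are much longer.
STOP_WORDS = {"a", "am", "an", "and", "the", "of", "to", "is"}


def count_without_stop_words(text, stop_words=STOP_WORDS, extra=()):
    """Count words, skipping stop words and any extra words supplied by the caller."""
    excluded = set(stop_words) | {w.lower() for w in extra}
    # Assumption: a "word" is simply a run of letters.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in excluded)


text = "I am a reader and the library is a home to the reader"

# "home" is omitted in addition to the words on the stop word list.
counts = count_without_stop_words(text, extra=["home"])
print(counts.most_common())
```

Because the stop words are filtered out before counting, they never appear in the resulting word list or word cloud.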

Standard Text Search

As with webpages, PDFs, and just about any other computer-based text, ATLAS.ti offers exact-match text search. This tool is invaluable when it comes to finding certain words or phrases quickly. This is another tool with multiple access points.

Click the "Search Project" tab at the top of the screen (CTRL/CMD+F will not work).
Once you invoke the search, a box will appear prompting you for a Search Expression.

Image: the search box.

Notice you can select Case Sensitive and GREP options. Case Sensitive will return only exact matches with the same case. See more about GREP below. It is also worth noting that searching for a word root can be an effective way of finding all forms of a word. Verbs, for example, often end in "ing" or "ed," so searching for just the root of those words will return all of their occurrences.

The asterisk (*) is a wildcard operator that stands in for any sequence of characters. Use it to find related words that are spelled differently, such as different forms of the same root. For example, to find both library and librarian, you could search librar*, which will return all occurrences of both words. Similarly, searching wom*n will return "woman" and "women."
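One way to picture the wildcard and the Case Sensitive option is to translate a search expression into a regular expression. The Python sketch below does that under stated assumptions: the function names are hypothetical, the asterisk is assumed to match any run of letters or digits inside a single word, and matches are reported as whole words. ATLAS.ti's actual matching rules may differ.

```python
import re


def wildcard_to_regex(expression, case_sensitive=False):
    """Turn a search expression containing * into a compiled regular expression."""
    # Escape each literal piece, then join the pieces with "any word characters".
    parts = [re.escape(piece) for piece in expression.split("*")]
    pattern = r"\w*".join(parts)
    flags = 0 if case_sensitive else re.IGNORECASE
    # Assumption: a hit is reported as a whole word.
    return re.compile(rf"\b{pattern}\b", flags)


def search(text, expression, case_sensitive=False):
    """Return every word in the text that matches the expression."""
    return wildcard_to_regex(expression, case_sensitive).findall(text)


text = "The librarian told the women that a woman founded the Library."

print(search(text, "librar*"))   # both forms of the root, regardless of case
print(search(text, "wom*n"))     # singular and plural
print(search(text, "library", case_sensitive=True))  # capitalized "Library" is skipped
```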

Read more about text searching on the ATLAS.ti guide.

Query Tool

The Query Tool retrieves quotations based on combinations of codes and/or code groups. It relies on search expressions that use three types of operators to build queries incrementally: Boolean (set) operators, semantic operators, and proximity operators.

To access the Query Tool, select Analysis > Query Tool from the main toolbar.

Image: the Query Tool button, an orange magnifying glass.

The following operators are available in the Query Tool: OR, AND, ONE OF, and NOT (Boolean); UP, DOWN, and SIBLINGS (semantic); and CO-OCCURS, WITHIN, ENCLOSES, OVERLAPS, OVERLAPPED BY, FOLLOWS, and PRECEDES (proximity). Operators are visible in the upper toolbar.
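The Boolean operators in this list behave like ordinary set operations on the quotations attached to each code. The Python sketch below illustrates that idea with made-up data: the code names and quotation IDs are hypothetical, and reading ONE OF as "exactly one of the two" is an assumption about its meaning.

```python
# Hypothetical data: each code maps to the IDs of the quotations it is applied to.
coded = {
    "funding": {1, 2, 3, 5},
    "staffing": {3, 4, 5},
}
# Every quotation in the project, coded or not.
all_quotations = {1, 2, 3, 4, 5, 6}

a, b = coded["funding"], coded["staffing"]

either = a | b               # OR: quotations coded with at least one of the codes
both = a & b                 # AND: quotations coded with both codes
exactly_one = a ^ b          # ONE OF: quotations coded with one code but not the other
not_a = all_quotations - a   # NOT: quotations that do not carry the first code

for name, result in [("OR", either), ("AND", both),
                     ("ONE OF", exactly_one), ("NOT", not_a)]:
    print(name, sorted(result))
```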

Double-click the first code or code group you want to analyze, then select an operator from the top toolbar, then double-click the other code or code group for comparison. Quotations that match your query will show up in the lower pane.
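The proximity operators compare where quotations sit in a document. As a rough sketch, each quotation can be modeled as a start and end position, and each operator as a test on two such spans. Everything below is an assumption made for illustration: the `Quotation` class, the character offsets, and the exact definition given to each operator. This guide does not state ATLAS.ti's precise rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quotation:
    doc: str    # document the quotation belongs to
    start: int  # first character position
    end: int    # position just past the last character


def within(a, b):
    """a lies entirely inside b."""
    return a.doc == b.doc and b.start <= a.start and a.end <= b.end


def encloses(a, b):
    """a entirely contains b."""
    return within(b, a)


def overlaps(a, b):
    """a starts before b and ends inside b."""
    return a.doc == b.doc and a.start < b.start < a.end < b.end


def overlapped_by(a, b):
    return overlaps(b, a)


def follows(a, b):
    """a begins after b has ended."""
    return a.doc == b.doc and a.start >= b.end


def precedes(a, b):
    return follows(b, a)


def co_occurs(a, b):
    """a and b share at least some text."""
    return a.doc == b.doc and a.start < b.end and b.start < a.end


def query(left, operator, right):
    """Quotations of the first code that relate to some quotation of the second."""
    return [a for a in left if any(operator(a, b) for b in right)]


# Hypothetical quotations for two codes in one interview transcript.
funding = [Quotation("interview1", 0, 50), Quotation("interview1", 120, 160)]
staffing = [Quotation("interview1", 10, 30), Quotation("interview1", 150, 200)]

print(query(funding, encloses, staffing))
print(query(funding, overlaps, staffing))
print(query(funding, co_occurs, staffing))
```

The `query` function mirrors the click sequence described above: a first code, an operator, then a second code, with the matching quotations returned as the result.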