University Library, University of Illinois at Urbana-Champaign

Text Mining Tools and Methods

This guide contains resources for conducting research with text mining.

Campus Resources

The following departments and projects on campus use text mining and other forms of data analysis.  They each provide assistance and resources for text mining.  

Center for Informatics Research in Science and Scholarship (CIRSS)

CIRSS is a center within the School of Information Sciences that focuses on information problems in scientific and scholarly research and on how digital information can advance work in these areas. Digital humanities is a major area of interest within the CIRSS concentration in scientific communication.

CITL Data Analytics

CITL is a campus initiative that provides consultations and workshops for popular software used by researchers and students. It offers specialized help with ATLAS.ti. To set up a consultation or request assistance, email citl-data@illinois.edu.

Cline Center for Advanced Social Research

The Cline Center's mission involves "advancing human flourishing around the world by using extreme-scale analysis of global news coverage to extract structured insight out of unstructured information."  The Center investigates political and social issues around the world, including civil unrest, through development and use of text analysis tools and by supporting fellowship programs and other forms of public engagement on campus.

Illinois Informatics Institute

The Illinois Informatics Institute is an on-campus organization devoted to supporting "research projects that involve applications of computing and information in areas from the natural sciences, engineering and business to the humanities and social sciences." The Institute offers several educational programs, including an undergraduate minor in informatics, an MS in bioinformatics, and a PhD in informatics.

Illinois Program for Research in the Humanities (IPRH)

The Illinois Program for Research in the Humanities promotes interdisciplinary study in the humanities, arts, and social sciences at the University of Illinois. IPRH offers graduate and faculty fellowships for humanities studies and hosts lectures, symposia, and discussions on humanities issues.

Institute for Computing in Humanities, Arts, and Social Sciences (I-CHASS)

The Institute for Computing in Humanities, Arts, and Social Sciences is a cross-disciplinary digital research initiative on campus. According to its website, I-CHASS "offer[s] humanities, arts, and social sciences scholars access to hardware, computer applications, graphical user interfaces and portals, and educational opportunities to train them to best use these resources."

HathiTrust Research Center (HTRC)

The HTRC is the research arm of the HathiTrust Digital Library (HTDL). The HathiTrust Consortium is a multi-institutional partnership that seeks to preserve the cultural record by digitizing books, serials, and other forms of information and making them available through the HTDL. The HTRC allows researchers to create and use data (worksets) from public domain materials for text mining and analysis. To get started, see the Introduction to Hathi Trust Research Center guide.

Research Data Service (RDS)

The RDS is a campus-wide resource to help researchers access best practices for data management and comply with funding-related data policies and standards.

Scholarly Commons

The Scholarly Commons Library is located in room 306 on the 3rd floor of the Main Library. It provides research support for digital projects, and the computers in its space offer a wide range of software to university affiliates.

Data Services at the Scholarly Commons

The Scholarly Commons provides many numeric and spatial data resources for University of Illinois researchers, including assistance in procuring, processing, and analyzing data.

Additional Resources

Digital Humanities @ Illinois

The emerging field of Digital Humanities (DH) connects humanities research with digital methods, and text mining is a chief example of this connection. Check out the Digital Humanities webpage produced by the Scholarly Commons for example projects, additional tools and resources, and more information on DH methods.

TAPoR

TAPoR is a free directory of text mining, analysis, and visualization tools. You can search and browse for tools, read reviews, find resources related to specific tools, and tag and comment on tools you have used.

Text Mining Online Class

The "Text Mining and Analytics" class is offered by the University of Illinois as part of Coursera's Data Mining Specialization. It is available for free or as a paid certification program.

HathiTrust Research Center (HTRC)

The HTRC is the research arm of the HathiTrust Digital Library (HTDL). The HathiTrust Consortium is a multi-institutional partnership that seeks to preserve the cultural record by digitizing books, serials, and other forms of information and making them available through the HTDL. The HTRC allows researchers to create and use data (worksets) from public domain materials for text mining and analysis. See the library guide on HathiTrust for more on what it offers.

The Stone and the Shell

UIUC professor Ted Underwood uses text mining and analytics to study 18th- and 19th-century literature. He posts about his research, including methods and data, on his blog.

Digital Research Tools (DiRT)

Digital Research Tools (formerly DiRT and now Bamboo DiRT) is an index that collects information about tools and resources for all types of digital projects. It is a good place to start looking for tools for text mining projects.