The HathiTrust Research Center (HTRC) provides tools and services for text analysis of the HathiTrust collection.
Algorithms
The HTRC has built-in, off-the-shelf algorithms that you can use for basic text analysis tasks, such as topic modeling or making a word cloud. Learn more on the HTRC documentation wiki: https://wiki.htrc.illinois.edu/x/HoJnAQ
HathiTrust+Bookworm
This visualization tool lets you explore word frequency over time. Read more in this guide: http://guides.library.illinois.edu/htbookworm or on the HTRC documentation wiki: https://wiki.htrc.illinois.edu/x/AoCXAQ
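To make "word frequency over time" concrete, here is a minimal Python sketch of the general idea: for each year, divide a word's count by the total number of words from that year. It is not how HathiTrust+Bookworm is implemented; the `records` data and the `relative_frequency_by_year` function are made up for illustration.

```python
from collections import defaultdict

# Hypothetical records: (publication year, word counts for one volume).
# These values are invented and do not come from HathiTrust.
records = [
    (1850, {"railway": 2, "horse": 8}),
    (1850, {"railway": 3, "horse": 7}),
    (1900, {"railway": 6, "horse": 4}),
]

def relative_frequency_by_year(records, word):
    """Return {year: share of all words in that year that are `word`}."""
    hits = defaultdict(int)    # occurrences of `word` per year
    totals = defaultdict(int)  # all word occurrences per year
    for year, token_counts in records:
        hits[year] += token_counts.get(word, 0)
        totals[year] += sum(token_counts.values())
    # Skip years with no words at all to avoid dividing by zero.
    return {y: hits[y] / totals[y] for y in sorted(totals) if totals[y]}

freq = relative_frequency_by_year(records, "railway")
for year, share in freq.items():
    print(year, share)
```

Using a share of all words, rather than a raw count, keeps years with many volumes from dominating the trend.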
HTRC Derived Datasets
The HTRC releases datasets for text analysis, such as the Extracted Features dataset, which includes words, word counts, and page-level metadata for volumes in the HathiTrust. Learn more here: https://wiki.htrc.illinois.edu/x/WQCGAQ
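As a rough illustration of working with page-level word counts, the Python sketch below sums per-page counts into volume-level totals. The `pages` structure and its field names (`seq`, `token_counts`) are simplified assumptions, not the actual Extracted Features schema; consult the HTRC documentation for the real file format.

```python
from collections import Counter

# Hypothetical, simplified page records. The real Extracted Features
# files use a richer schema documented by the HTRC.
pages = [
    {"seq": 1, "token_counts": {"whale": 2, "sea": 1}},
    {"seq": 2, "token_counts": {"whale": 1, "ship": 3}},
]

def volume_word_counts(pages):
    """Sum per-page word counts into a single volume-level Counter."""
    total = Counter()
    for page in pages:
        total.update(page["token_counts"])  # adds counts, does not overwrite
    return total

counts = volume_word_counts(pages)
print(counts["whale"], counts["ship"], counts["sea"])
```

Because the dataset provides counts rather than full text, analyses of this kind rely on word frequencies and do not need the original word order.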
HTRC Data Capsules
Researchers can provision a secure virtual machine, or "capsule," to run their own advanced text analysis workflows. Results are vetted before they are released to the researcher. Documentation is available here: https://wiki.htrc.illinois.edu/x/SAFRAQ