University Library, University of Illinois at Urbana-Champaign

A Guide to the HathiTrust Research Center: Glossary

An introductory guide to the tools and resources of the HathiTrust Research Center.

Glossary of Terms

  • Algorithms are executable programs that you can run on your workset. You can customize each algorithm's parameters.
  • An API, or Application Programming Interface, is a set of procedures that make data available for exchange. Users can retrieve HTRC volumes in bulk using the HTRC Data API within the HTRC Data Capsule environment.
  • A corpus is a collection of texts. For example, HathiTrust has nearly 4 million volumes in its public domain corpus.
  • Jobs are what you submit when you run algorithms in HTRC. You can view the status of the jobs that you have submitted and delete jobs.
  • Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. With this analytical approach, you can detect trends in a corpus (e.g., 19th-century literature) through machine processing instead of reading a book or collection of books. See Franco Moretti's Graphs, Maps, Trees for more information.
  • Results are the output of the job(s) you have run. You can either view the results in HTRC or download them.
  • The Sandbox is a good place to begin working with HathiTrust data and tools. It has hundreds of thousands of public domain volumes available as data.
  • Topic Modeling is a process that involves locating the major themes of a large collection of texts by identifying topics, or groups of words that frequently appear together.
  • Worksets are collections of volumes and other data to be processed.
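
To make the Topic Modeling entry more concrete, here is a minimal Python sketch of the idea of "groups of words that frequently appear together." It is not an HTRC algorithm and it is not a real topic model (such as LDA); it only counts how often word pairs co-occur across texts. The `volumes` list, the `cooccurrence_counts` function, and the threshold of 2 are all hypothetical, chosen for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy "workset": each string stands in for one volume's text.
volumes = [
    "whale ship sea captain",
    "ship sea storm sailor",
    "garden rose flower spring",
    "flower garden rose bee",
]


def cooccurrence_counts(texts):
    """Count, for each unordered word pair, how many texts contain both words."""
    counts = Counter()
    for text in texts:
        # Use each distinct word once per text; sort so pairs are ordered consistently.
        words = sorted(set(text.lower().split()))
        counts.update(combinations(words, 2))
    return counts


pair_counts = cooccurrence_counts(volumes)

# Keep only pairs that appear together in at least 2 texts (assumed threshold).
frequent_pairs = sorted(pair for pair, n in pair_counts.items() if n >= 2)
print(frequent_pairs)
```

In this toy data, the surviving pairs fall into two loose groups, one about the sea and one about gardens. A real topic modeling algorithm does something more sophisticated, but it rests on the same intuition: words that keep showing up together probably belong to the same theme.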