Glossary of Terms
- Algorithms are executable programs that you can run on your workset. You can customize each algorithm's parameters.
- An API, or Application Programming Interface, is a set of procedures that make data available for exchange. Users can retrieve HTRC volumes in bulk using the HTRC Data API within the HTRC Data Capsule environment.
- A corpus is a collection of texts. For example, HathiTrust has nearly 4 million volumes in its public domain corpus.
- Jobs are what you submit when you run algorithms in HTRC. You can view the status of the jobs that you have submitted and delete jobs.
- Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. With this analytical approach, you can detect trends in a corpus (e.g., 19th-century literature) through machine processing instead of reading a book or collection of books. See Franco Moretti's Graphs, Maps, Trees for more information.
- Results are the output of your job(s). You can either view the results in HTRC or download them.
- The Sandbox is a good place to begin working with HathiTrust data and tools. It has hundreds of thousands of public domain volumes available as data.
- Topic Modeling is a process that involves locating the major themes of a large body of texts by identifying topics, or groups of words that frequently appear together.
- Worksets are collections of volumes and other data to be processed.
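
To make the idea behind the Topic Modeling entry more concrete, the sketch below counts how often pairs of words appear together in the same text. This is only a toy illustration of "words that frequently appear together"; it is not a real topic model and not how HTRC's algorithms are implemented. The function name `cooccurrence_counts` and the sample texts are hypothetical and chosen purely for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, stopwords=frozenset()):
    """Count, for each unordered word pair, how many documents contain both words.

    Hypothetical helper for illustration only; real topic models use
    probabilistic methods rather than raw pair counts.
    """
    counts = Counter()
    for text in documents:
        # Use a set so each word is counted at most once per document.
        words = {w for w in text.lower().split() if w not in stopwords}
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(words), 2):
            counts[pair] += 1
    return counts


# Made-up sample texts standing in for volumes in a workset.
docs = [
    "whale ship sea captain",
    "ship sea storm",
    "garden rose spring",
    "rose garden summer",
]

counts = cooccurrence_counts(docs)
top_pairs = counts.most_common(2)
print(top_pairs)
```

In this made-up sample, the pairs that co-occur most often hint at two rough themes (seafaring and gardens), which is the intuition a topic model formalizes.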