Algorithms are executable programs that you can run on your workset. You can customize each algorithm's parameters.
An API, or Application Programming Interface, is a set of procedures that lets software request and exchange data. Users can retrieve HTRC volumes in bulk using the HTRC Data API within the HTRC Data Capsule environment.
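To illustrate what "retrieving volumes in bulk" through an API might look like, here is a minimal, hypothetical sketch. The endpoint, host, parameter names, and ID separator below are illustrative assumptions, not the actual HTRC Data API contract; consult the HTRC documentation for the real interface.

```python
# Hypothetical sketch of a bulk-retrieval request. The host, path,
# parameter names, and "|" separator are assumptions for illustration,
# not the real HTRC Data API.
from urllib.parse import urlencode

def build_bulk_request(base_url, volume_ids, token):
    """Build a GET URL that asks the API for several volumes at once."""
    query = urlencode({
        "volumeIDs": "|".join(volume_ids),  # assumed multi-ID separator
        "token": token,                     # assumed auth mechanism
    })
    return f"{base_url}/volumes?{query}"

url = build_bulk_request(
    "https://dataapi.example.org",           # placeholder host
    ["mdp.39015012345678", "uc1.b1234567"],  # HathiTrust-style volume IDs
    "MY_TOKEN",
)
print(url)
```

Batching many volume IDs into one request is what makes bulk retrieval practical compared with fetching volumes one at a time.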
A corpus is a collection of texts. For example, HathiTrust has nearly 4 million volumes in its public domain corpus.
Jobs are what you submit when you run algorithms in HTRC. You can view the status of the jobs that you have submitted and delete jobs.
Non-consumptive research involves computational analysis of one or more books without the researcher having the ability to reassemble the collection. With this analytical approach, you can detect trends in a corpus (e.g. 19th century literature) through machine processing instead of reading a book or collection of books. See Franco Moretti's Graphs, Maps, Trees for more information.
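A minimal sketch of the non-consumptive idea: the researcher works only with aggregate counts (here, how often a term appears per year), never with the full texts. The tiny corpus below is invented for illustration.

```python
# Toy non-consumptive analysis: detect a term's trend over time from
# aggregate counts. The corpus here is invented for illustration; in
# practice the texts would never be exposed to the researcher at all.
from collections import Counter

corpus = {
    1810: "the whale sailed while the captain watched the whale",
    1840: "the whale surfaced and the whale dived",
    1870: "the railway crossed the plains",
}

def term_trend(corpus, term):
    """Return {year: count of term} -- an aggregate from which the
    original texts cannot be reassembled."""
    return {year: Counter(text.split())[term] for year, text in corpus.items()}

print(term_trend(corpus, "whale"))  # counts per year, not the texts
```

Scaled up to thousands of volumes, a trend like this is how machine processing can surface patterns (say, in 19th century literature) without anyone reading the books.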
Results are the output produced by your job(s). You can either view the results in HTRC or download them.
The Sandbox is a good place to begin working with HathiTrust data and tools. It has hundreds of thousands of public domain volumes available as data.
Topic Modeling is a process that involves locating the major themes of a large body of texts by identifying topics, or groups of words that frequently appear together.
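The core intuition can be sketched with a toy example: words that co-occur in the same documents cluster into candidate topics. Real topic models (e.g., LDA) use probabilistic inference rather than raw counts; this illustration, with an invented four-document "corpus", just tallies document-level co-occurrence.

```python
# Toy illustration of the topic-modeling intuition: count how often
# word pairs appear in the same document. Frequent pairs hint at
# themes. Real topic models (e.g., LDA) are far more sophisticated.
from collections import Counter
from itertools import combinations

docs = [
    "ship whale ocean",
    "whale ocean harpoon",
    "parliament vote law",
    "law vote treaty",
]

pair_counts = Counter()
for doc in docs:
    words = sorted(set(doc.split()))          # unique words, stable order
    pair_counts.update(combinations(words, 2))  # all pairs in this doc

# The most frequent pairs separate into two rough themes:
# seafaring (ocean/whale) and politics (law/vote).
print(pair_counts.most_common(3))
```

In a real workflow, an algorithm such as LDA would assign each word a probability under each topic, but the underlying signal is the same: co-occurrence.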
Worksets are collections of volumes and other data to be processed.