
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs: Downloading Tesseract

Learn OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract with this guide.

Package Managers

You may run into difficulties downloading Tesseract if you do not have a package manager. A package manager (or package management system) is a collection of software tools that automates the installation and removal of programs for your computer's operating system. A package manager that does its job correctly eliminates the need for manual installs and updates, which makes it a useful tool.

There are many package managers to choose from, and many of them are free to download. Below are a few suggested options that are closely integrated with GitHub, but play around and find what works best for you and your system.

Important Links

Downloading Tesseract

Downloading Tesseract can be a little confusing, especially if you're not used to working with a command-line interface (CLI). But don't worry! We'll walk you through the steps for downloading Tesseract on this page.


The Basics

  1. Go to the Tesseract GitHub Wiki

  2. Find the instructions for your operating system


Operating Systems and Package Managers

This is where things can get confusing. It is very important that you pay attention to which operating system you have and what its specific needs are. Some people, namely Mac users, will need a package management system to download Tesseract, and may have to install one first. Information on package managers is located in the Package Managers section of this page.
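If you are not sure what your system is or whether you already have a package manager, a short check in the CLI can tell you. The following is a minimal sketch, not an official Tesseract tool: the function name detect_setup is made up for illustration, and it only looks for the two Mac package managers named on this page (Homebrew's brew and MacPorts's port).

```shell
# Hypothetical helper: report the operating system and any known package manager.
detect_setup() {
  # uname -s prints the kernel name, e.g. Darwin on a Mac
  case "$(uname -s)" in
    Darwin) os="macOS" ;;
    Linux) os="Linux" ;;
    CYGWIN*|MINGW*|MSYS*) os="Windows" ;;
    *) os="unknown" ;;
  esac
  echo "operating system: $os"

  # Look for the package managers mentioned in this guide
  for pm in brew port; do
    if command -v "$pm" >/dev/null 2>&1; then
      echo "package manager found: $pm"
    fi
  done
}

detect_setup
```

If no "package manager found" line appears on a Mac, you will likely need to install one before continuing.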

There is no one way to download Tesseract. You may find that what works for your computer may not work for the person sitting next to you. Don't worry about that. If you're having difficulties downloading Tesseract, email the Scholarly Commons, or come in during our hours and we can help you figure out which way will work for you.


An Important Note

You will need to make sure that you download both parts of Tesseract: the engine and the training data for at least one language. How you do this depends on your operating system and on the package manager you are using. For example, with Homebrew you can install the engine with the command brew install tesseract and then all of the languages Tesseract offers with the command brew install tesseract-lang. (Older versions of Homebrew did both at once with brew install tesseract --all-languages, but current versions of Homebrew no longer accept that option.) If you don't want to take up the space on your computer, you can instead choose individual languages and install them manually. Other package managers and operating systems may have similar options.

To see all of Tesseract's language options, and to download training data for individual languages, go to the tessdata GitHub page.
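Once you have installed Tesseract, you may want to confirm that both parts are in place. The sketch below assumes the tesseract command is on your PATH and uses its --version and --list-langs options. The function name check_tesseract is made up for illustration, and the exact wording Tesseract prints may differ between versions.

```shell
# Hypothetical helper: check for the engine and list installed language data.
check_tesseract() {
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "engine: not found on PATH"
    return 1
  fi
  # First line of the version banner identifies the engine
  echo "engine: $(tesseract --version 2>&1 | head -n 1)"
  echo "languages:"
  # --list-langs prints a header line first, then one language code per line
  tesseract --list-langs 2>&1 | tail -n +2
}

check_tesseract || true
```

If the language list is empty, the engine is installed but no training data was found.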


Installing Tesseract on Windows

The Tesseract project suggests using the Tesseract installer from UB Mannheim (Mannheim University Library). Download the installer from there and follow its directions. You can download older versions of Tesseract from the archive on SourceForge, or by installing Cygwin and downloading Tesseract through that software.


Installing Tesseract on Mac

For Mac, you will definitely need a package manager. The Tesseract GitHub Wiki suggests either MacPorts or Homebrew, though there are other options. Once you have your package manager settled, you just need to run a few commands in the Command Line Interface.

MacPorts

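  • To install Tesseract:
sudo port install tesseract
  • To install language data:
sudo port install tesseract-<langcode>

There is no space between tesseract and the hyphen: the language data for English, for example, is the port tesseract-eng. A list of langcodes is found on the MacPorts Tesseract page.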

Homebrew

  • To install Tesseract:
brew install tesseract
  • To install with all languages (current versions of Homebrew no longer accept the older --all-languages option):
brew install tesseract
brew install tesseract-lang
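  • To install languages individually (wget is not included with macOS; install it with brew install wget, or use curl -L -O in place of wget):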
brew install tesseract
mkdir -p ~/Downloads/tessdata
cd ~/Downloads/tessdata
wget <URL for language data>
  • For more information on installing individual languages manually, head to this link
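The manual steps above can be wrapped in a small function. This is a sketch under stated assumptions, not the method from the Tesseract documentation: the function name fetch_traineddata is made up for illustration, it uses curl (which is included with macOS) in place of wget, and you must supply the URL for the language data yourself.

```shell
# Hypothetical helper: download one language data file into a tessdata folder.
# Usage: fetch_traineddata <URL for language data> [target-folder]
fetch_traineddata() {
  url="$1"
  dir="${2:-$HOME/Downloads/tessdata}"   # same folder as the steps above
  if [ -z "$url" ]; then
    echo "usage: fetch_traineddata <URL for language data> [target-folder]" >&2
    return 2
  fi
  mkdir -p "$dir" || return 1
  # -f fails on HTTP errors, -L follows redirects, -o names the saved file
  curl -fL -o "$dir/$(basename "$url")" "$url"
}
```

Because the file is saved outside Tesseract's default data folder, you will likely need to tell Tesseract where to look when you run it, for example by adding --tessdata-dir ~/Downloads/tessdata and -l <langcode> to the tesseract command.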

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 8:30am-6:00pm
Phone: 217-244-1331