
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs: Reading and Editing Documents

Learn OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract with this guide.

The Top Panel

The top panel offers options for adding new content to your PDF or changing the appearance of the whole document. These include:

  • Add Text
  • Add Image
  • Link options
  • Cropping
  • Header & Footer options
  • Watermark options
  • Background options
  • Numbering

These are useful options for editing your PDF. That being said, some of these options are permanent and can alter your PDF entirely, so think carefully before you use them.

The Format Box

The Format box -- located on the right panel -- allows you to make edits to text within your document. The options will only be applicable after you click inside one of the bounding boxes. You can only edit one bounding box at a time.

The Objects Box

The Objects box -- located on the right panel -- allows you to edit "objects" (mostly images or handwritten text) in your document. Options include flipping, cropping, deleting, and replacing. You can also click the 'Edit Using...' tool, which links up to Adobe Photoshop or Microsoft Paint for increased editing capabilities.

Performing OCR on an Imported Document

Once you have imported or scanned your document and are now on the Document panel, you are ready to perform OCR. Doing so is as simple as clicking the 'Edit PDF' option in the side Tools panel.

Once you press 'Edit PDF,' the program will automatically perform OCR on your document. 

To have OCR performed on all of your pages in one pass, open the 'Scanned Document Editing Settings' option in the side Tools panel (Scanned Documents > Settings) and check the box that reads 'Make all the pages editable.' If page recognition has already been performed, the program will run it again.

Note: This process can take anywhere from a few seconds to a few minutes, depending on the size of your document. After the software is done, the page will reload and show you the editing options available.

From this page, you can edit the document, correcting any mistakes that the software may have made. No OCR software is 100% accurate, so if accuracy is important to you, you will need to read through the entire document to make sure it is correct. To edit, click inside one of the bounding boxes that have appeared on your document. From there, you can change the style, size, and color of the font, as well as correct spelling and grammar errors.
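If you want a rough measure of how much correcting a page needed, you can compare the raw OCR text with your corrected version. The following is a minimal sketch in Python, not a feature of Acrobat: it assumes you have copied both versions out as plain text, and the function names `similarity` and `changed_words` are hypothetical helpers made up for this illustration.

```python
import difflib


def similarity(ocr_text: str, corrected_text: str) -> float:
    """Return a ratio from 0.0 to 1.0 of how closely the two texts match."""
    return difflib.SequenceMatcher(None, ocr_text, corrected_text).ratio()


def changed_words(ocr_text: str, corrected_text: str) -> list[tuple[str, str]]:
    """List (ocr, corrected) word runs that differ between the two texts."""
    ocr_words = ocr_text.split()
    fixed_words = corrected_text.split()
    matcher = difflib.SequenceMatcher(None, ocr_words, fixed_words)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Anything not tagged "equal" is a replacement, insertion, or deletion.
        if tag != "equal":
            changes.append((" ".join(ocr_words[i1:i2]), " ".join(fixed_words[j1:j2])))
    return changes


if __name__ == "__main__":
    # Made-up sample strings standing in for exported page text.
    ocr = "The quick brovvn fox jumps ovcr the lazy dog"
    fixed = "The quick brown fox jumps over the lazy dog"
    print(f"similarity: {similarity(ocr, fixed):.2f}")
    for before, after in changed_words(ocr, fixed):
        print(f"{before!r} -> {after!r}")
```

A low similarity score or a long list of changed words suggests the scan quality or the recognition language may be worth revisiting before you correct the rest of the document by hand.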

Scanned Documents Options

The Scanned Documents box -- located near the bottom of the right panel -- is where you can change a few of your OCR options. The Settings tool allows you to change the language in which your text is recognized (English (US) is the default) and will rerun the OCR process with your selected language. You can also choose to use an available system font instead of the default synthesized font, which attempts to match your original image. Finally, you can make all the pages editable, which is helpful if you need to perform OCR on a longer document.

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 8:30am-6:00pm
Phone: 217-244-1331