
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs: ABBYY FineReader Activities

Learn OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract with this guide.

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 8:30am-6:00pm
Phone: 217-244-1331

Activity #1: PDF -> Excel in ABBYY FineReader

This activity will help you familiarize yourself with importing PDFs, provide an introduction to correcting areas, and teach you to export a document as a table.

Step 1: Import PDF Document

  1. Open ABBYY FineReader (it should be on your desktop, or you can look through the programs under the Windows button in the lower left-hand corner).
  2. Once ABBYY FineReader is open, choose a task from the task manager that appears when you first open the program, then locate the document that you would like to work with.
    1. HINT: You’re planning on outputting this document into Excel.
  3. The file name is Activity #1- On-Campus Student Enrollment, which you can download from the Documents box in this LibGuide.
  4. Once you’ve selected your document, the software should import it and begin analyzing.

Step 2: Ensure ABBYY Is Recognizing Tables

Helpful tool and reminder for checking areas:

  1. Check the areas in the document to ensure that ABBYY FineReader imported the document properly.
    1. Did ABBYY recognize the information as a table?
      1. Hint: is the table within a blue box?
  2. If you change any areas, “Read” the page or pages again so that the program recognizes the change.
  3. If ABBYY recognized the table, the document is likely ready to be saved as an Excel document.

Step 3: Output in Excel

  1. Save the document as an Excel document.
    1. Hint: The option to save or convert a document to a different format can be found in the toolbar.
    2. Hint: You can also change the output format with the drop-down arrow to the right of the Save icon.

  2. Open the document in Excel and check your work.
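
Checking the exported table by eye can miss character confusions that OCR commonly makes in numeric cells, such as the letter O for a zero or a lowercase l for a one. As a minimal sketch, assuming the table was also exported as CSV and that every column after the first holds whole numbers, the Python below flags cells that do not read as numbers. The sample rows and the function name `find_suspect_cells` are hypothetical and are not taken from the activity file.

```python
import csv
import io


def find_suspect_cells(csv_text, label_columns=1):
    """Return (row, column, value) for cells that should be numeric but are not.

    Assumes the first row is a header and the first `label_columns` columns
    hold text labels; both are assumptions made for this sketch.
    """
    suspects = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header row
    for row_number, row in enumerate(reader, start=2):
        numeric_cells = row[label_columns:]
        for col_number, cell in enumerate(numeric_cells, start=label_columns + 1):
            cleaned = cell.strip().replace(",", "")  # allow thousands separators
            if cleaned and not cleaned.isdigit():
                suspects.append((row_number, col_number, cell))
    return suspects


# Hypothetical sample rows, not the real enrollment data.
sample = (
    "College,Undergraduate,Graduate\n"
    'Example College A,"1,204",310\n'
    "Example College B,9O5,2l7\n"
)

for row_number, col_number, value in find_suspect_cells(sample):
    print(f"Row {row_number}, column {col_number}: check '{value}'")
```

Row and column numbers are 1-based so they line up with what you see in the spreadsheet. Any cell the sketch flags still needs to be compared against the original PDF by a person.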

Activity #2: Verification in ABBYY FineReader

The purpose of this activity is to gain practice with area identification and text verification. This process is a way of correcting ABBYY FineReader's output so that the text in your document is accurate and findable.

Step 1: Area Identification

  1. Click “New Task” under the File menu in the left-hand corner.
  2. Choose a Quick Task of your choice (as long as it works with a document that has already been scanned).
  3. Import the document Activity #2- Complete Writings of Nathaniel Hawthorne, found in the Documents box of this LibGuide.
  4. Verify that all areas are correct.

Step 2: Text Verification

  1. Check the results of the document (verify that the text is correct).
  2. Edit the words that the software's grammar or spell check has flagged. Edit anything else that seems amiss.
    1. You should do all your edits in a screen that looks like the image below.
    2. If a correct word is marked as wrong, you can add it to the dictionary.
    3. Helpful word suggestions are found in the box on the right.
  3. If you make an edit in the text box, the “skip” button will turn into a “confirm” button. Click “confirm” to verify your change.
  4. When verification is complete, you will see a screen confirming that it is complete.
  5. If time permits: Redact text in the document for “sensitive information.”
    1. This document does not have sensitive information, but redaction can be helpful if you are presenting or sharing data or a report that does.
  6. Save the document in a format of your choosing.
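
The text verification in Step 2 amounts to comparing each recognized word against a dictionary, reviewing the words that are not found, and adding legitimate words to the dictionary so they stop being flagged. The Python below is a minimal sketch of that idea only. It is not how ABBYY FineReader is implemented, and the word list, the sample text, and the function name `flag_unknown_words` are hypothetical.

```python
import re


def flag_unknown_words(text, dictionary):
    """Return words from `text` that are not in `dictionary`, in order, once each."""
    flagged = []
    for word in re.findall(r"[A-Za-z0-9']+", text):
        key = word.lower()  # compare case-insensitively
        if key not in dictionary and word not in flagged:
            flagged.append(word)
    return flagged


# Tiny hypothetical dictionary; a real check would use a full word list.
dictionary = {"the", "old", "house", "stood", "near", "river"}

# Hypothetical OCR output: "tbe" and "r1ver" are misreads, "Salem" is a real name.
ocr_text = "The old house stood near tbe r1ver Salem"

print(flag_unknown_words(ocr_text, dictionary))

# Adding a correct word to the dictionary stops it from being flagged again.
dictionary.add("salem")
print(flag_unknown_words(ocr_text, dictionary))
```

The first call flags both misreads and the proper name. After the name is added to the dictionary, only the two misreads remain, which mirrors adding a correct word to the dictionary during verification.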