
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs

This guide covers OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract.

What Is ABBYY FineReader?


ABBYY FineReader is an optical character recognition (OCR) system. It converts scanned documents, PDF documents, and image files (including digital photos) into editable and searchable documents. ABBYY FineReader 16 can automatically recognize and process documents in any combination of 198 languages and provides full dictionary support for 53 of them. The software is available for Mac and Windows.

ABBYY FineReader can analyze documents in two ways:

  • As images or documents are scanned into the program
  • From existing image or PDF files

Basic OCR Operations in ABBYY:

  • Before performing OCR, the program analyzes the structure of the entire document and detects the areas that contain text, images, tables, and/or barcodes
  • Recognition results are then displayed in the text window
  • Uncertain characters are highlighted in this window, so the user can locate possible errors and correct them without leaving ABBYY FineReader
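The review step above depends on the engine knowing which characters it is unsure of. Many OCR engines attach a confidence score to each recognized character, and the sketch below assumes such scores are available and flags those that fall below a cutoff. It is a minimal, generic illustration, not ABBYY FineReader's internal method: the `RecognizedChar` type, the 0.0 to 1.0 score scale, the `flag_uncertain` and `render_with_flags` helpers, and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecognizedChar:
    """One recognized character (hypothetical structure for illustration)."""
    char: str
    confidence: float  # assumed scale: 0.0 (no confidence) to 1.0 (certain)


def flag_uncertain(chars: List[RecognizedChar], threshold: float = 0.8) -> List[int]:
    """Return the positions of characters whose confidence is below the threshold."""
    return [i for i, c in enumerate(chars) if c.confidence < threshold]


def render_with_flags(chars: List[RecognizedChar], threshold: float = 0.8) -> str:
    """Rebuild the text, wrapping uncertain characters in brackets for a reviewer."""
    flagged = set(flag_uncertain(chars, threshold))
    return "".join(
        f"[{c.char}]" if i in flagged else c.char
        for i, c in enumerate(chars)
    )
```

For example, a word recognized as `c`, `a`, `t` with confidences 0.99, 0.95, and 0.42 renders as `ca[t]`, pointing the reviewer to the last character. Raising the threshold sends more characters to review at the cost of more manual checking.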

Because the resulting files are editable and searchable, researchers can:

  • Copy and paste passages of text
  • Search the text in PDF readers or word processing programs
  • Ingest the text into analysis programs like ATLAS.ti or NVivo
  • Make information easier to find on the web by creating searchable documents
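Once a document has an OCR text layer, finding a passage is an ordinary text search. As a minimal sketch, assume the recognized text has already been exported page by page into a list of strings. The hypothetical `find_passages` function below returns each case-insensitive match with its page number and a little surrounding context, roughly what a PDF reader's search panel shows.

```python
import re
from typing import List, Tuple


def find_passages(pages: List[str], term: str, context: int = 30) -> List[Tuple[int, str]]:
    """Return (page_number, snippet) pairs for every case-insensitive match of term.

    pages: OCR text exported one string per page (assumed input format).
    context: number of characters to keep on each side of a match.
    """
    # re.escape treats the search term literally, even if it contains
    # characters such as "." or "(" that have special meaning in regexes.
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    results = []
    for page_no, text in enumerate(pages, start=1):  # pages numbered from 1
        for match in pattern.finditer(text):
            lo = max(0, match.start() - context)
            hi = min(len(text), match.end() + context)
            results.append((page_no, text[lo:hi]))
    return results
```

A call such as `find_passages(pages, "board")` on two exported pages returns one `(page_number, snippet)` pair per match. On an image-only scan with no OCR layer, the same search finds nothing, because there is no text to search.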