
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs: An Introduction to OCR

This guide covers OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract.


What is OCR?

Are you curious about optical character recognition (OCR) software? Interested in learning how OCR software might enhance your research project? Or perhaps you want to know how OCR can aid textual comparison. This guide will help you explore the features of several OCR programs.

Optical character recognition (OCR) is the electronic identification and digital encoding of typed or printed text by means of an optical scanner and specialized software. OCR software allows a computer to read static images of text and convert them into editable, searchable data. OCR typically involves three steps: opening or scanning a document in the OCR software, running recognition on the document, and saving the recognized output in a format of your choosing.
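For readers who prefer scripting to a graphical tool, the three steps above can be sketched in Python with Tesseract. This is a minimal sketch, not an official workflow from this guide: it assumes the Tesseract engine is installed and on your PATH, along with the third-party Python packages pytesseract and Pillow. The function name `ocr_to_searchable_pdf` and the file names are hypothetical.

```python
"""Sketch of the three OCR steps (open, recognize, save) using Tesseract.

Assumes the Tesseract engine is installed and on PATH, and that the
third-party packages pytesseract and Pillow are available.
"""
from pathlib import Path


def ocr_to_searchable_pdf(image_path: str, pdf_path: str) -> str:
    """Open a scanned page image, recognize its text, and save a searchable PDF.

    Returns the recognized plain text so it can also be searched or mined.
    """
    # Imported inside the function so this module loads even where the OCR
    # dependencies are missing; calling the function still requires them.
    import pytesseract
    from PIL import Image

    # Step 1: open the scanned document image.
    with Image.open(image_path) as page:
        # Step 2: recognize the text; image_to_string returns plain text.
        text = pytesseract.image_to_string(page, lang="eng")
        # Step 3: save in a chosen format. With extension="pdf", Tesseract
        # returns PDF bytes holding the page image plus an invisible text layer.
        pdf_bytes = pytesseract.image_to_pdf_or_hocr(page, extension="pdf")

    Path(pdf_path).write_bytes(pdf_bytes)
    return text
```

For example, `ocr_to_searchable_pdf("scan.png", "scan_searchable.pdf")` would write a PDF whose text can be selected and searched, and return the same text as a string for text-mining or comparison work. Recognition quality depends heavily on scan resolution and cleanliness, so results should be proofread.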

OCR has a variety of applications. In academic settings, it is often useful for text and data mining projects, as well as for textual comparisons. OCR is also an important tool for creating accessible documents, especially PDFs, for people who are blind or visually impaired.

This Guide

This guide is meant to serve as an introduction to OCR, explaining what OCR is, how to use it, which software options are available, and best practices for using it. It gives in-depth instructions for ABBYY FineReader, Adobe Acrobat Pro, and Tesseract, three popular OCR programs. If you have questions after reading this guide, or would like some guidance on using OCR software, please contact the Scholarly Commons.

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 9am-6pm
Phone: 217-244-1331