
University Library, University of Illinois at Urbana-Champaign

Introduction to OCR and Searchable PDFs: Adobe Acrobat Pro Activities

Learn OCR best practices and how to begin an OCR project using ABBYY FineReader, Adobe Acrobat Pro, or Tesseract with this guide.

Activity Documents

Scholarly Commons

Contact:
306 Main Library
Drop-ins welcome
Monday-Friday 8:30am-6:00pm
Phone: 217-244-1331

Activity #1: JPEG -> PDF -> Microsoft Word in Adobe Acrobat Pro

This activity will help you familiarize yourself with the Adobe Acrobat Pro interface. The goal of this exercise is to convert a scanned image into a PDF file, run OCR on it, and then export the file as a Microsoft Word document.

Step 1: Import JPEG file

  1. Open Adobe Acrobat Pro (it should be on your desktop, or you can find it in the list of programs under the Windows Start button in the lower left-hand corner).
  2. Once Adobe Acrobat Pro is open, the next step is to locate the document you would like to work with. For this activity, we will use the document titled 'Activity #1- Pride & Prejudice', found in the Activity Documents section of this page. Download the document.
  3. Return to Acrobat, and from the 'File' menu select 'Create' -> 'PDF from File'. Import the document you just downloaded.
  4. The image will be immediately converted into a non-editable PDF.

Step 2: Using OCR

  1. On the right-hand toolbar, click 'Edit PDF'.
  2. Wait while Acrobat runs OCR on the page. This can take a few moments.
  3. You should now be able to highlight text, and use the edit tools. Play around with the 'Edit Text & Images' tools to familiarize yourself with them.

Step 3: Exporting as a Microsoft Word Document

  1. Once you have familiarized yourself with the 'Edit Text & Images' tools, go to 'File' -> 'Export to'.
  2. From the drop-down list of options, choose Microsoft Word Document.
  3. Give your document a name and save.
  4. Give the program a few seconds to load, then go to your desktop and open up the document. Notice what did/did not translate correctly, and how time-intensive it would be to fix every oddity or mistake. Let's discuss this as a group.
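
For a rough, quantitative view of what did and did not translate correctly, the exported text can be compared against a clean copy of the same passage. The sketch below is only an illustration and is not part of Acrobat: it assumes you have pasted both the exported text and a trusted reference copy into Python strings yourself, and the function names `similarity` and `changed_words` are hypothetical helpers built on Python's standard `difflib` module.

```python
import difflib


def similarity(reference, ocr_text):
    """Return a ratio between 0 and 1; 1.0 means the two strings are identical."""
    matcher = difflib.SequenceMatcher(None, reference, ocr_text, autojunk=False)
    return matcher.ratio()


def changed_words(reference, ocr_text):
    """List (expected, got) word runs where the OCR output differs from the reference."""
    ref_words = reference.split()
    ocr_words = ocr_text.split()
    matcher = difflib.SequenceMatcher(None, ref_words, ocr_words, autojunk=False)
    return [
        (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"  # keep only replaced, inserted, or deleted runs
    ]
```

A low ratio or a long list of changed words suggests that manual correction would take considerable time, which is a useful starting point for the group discussion.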

Activity #2: Using OCR on Multiple Files at Once in Adobe Acrobat Pro

This activity will show you one of Adobe Acrobat Pro's most useful features: the ability to use OCR on multiple files at once.

Step 1: Downloading Files

  1. Adobe Acrobat Pro can use this feature on multiple file types, including PDFs, JPEGs, PNGs, etc. For this activity, download the 'Activity #2- Sense & Sensibility' zip file from the Activity Documents section at the top of this page.
  2. Once you have downloaded the file, unzip it by opening it. Inside should be three PNG files, named 1, 2, and 3.
  3. Create a new folder in your Documents folder with the name Activity #2.
  4. Move the three PNG files to the Activity #2 folder.
    1. HINT: This is an important step because Adobe Acrobat Pro cannot use files located in zip files.
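
If you would rather script the unzip-and-move steps above than do them by hand, the following minimal Python sketch does the same thing using only the standard library. The function name `extract_pngs` is a hypothetical helper, and the paths you pass to it are assumptions about where your download and Documents folder live.

```python
import zipfile
from pathlib import Path


def extract_pngs(zip_path, dest_dir):
    """Copy every PNG inside zip_path into dest_dir and return the new file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)  # same as creating the folder by hand
    extracted = []
    with zipfile.ZipFile(zip_path) as archive:
        for member in archive.infolist():
            name = Path(member.filename).name  # drop any folder prefix inside the zip
            if member.is_dir() or not name.lower().endswith(".png"):
                continue  # skip folders and non-PNG files
            target = dest / name
            target.write_bytes(archive.read(member))
            extracted.append(target)
    return sorted(extracted)
```

For example, `extract_pngs(downloaded_zip, Path.home() / "Documents" / "Activity #2")` would place the PNG files in the Activity #2 folder, where `downloaded_zip` stands for wherever your browser saved the zip file.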

Step 2: Importing Files into Adobe Acrobat Pro

  1. Open Adobe Acrobat Pro and go to the 'Tools' tab.
  2. Choose the 'Create PDF' option, then click 'Combine Files into a Single PDF'.
  3. Add the three PNG files, then press 'Combine'. Once the files have been combined, select 'Edit Text' so that Acrobat runs OCR on the combined pages.
  4. Save the combined file as a PDF in the same folder.
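
For comparison, Tesseract (one of the other tools this guide covers) can process a folder of images from a script instead of through the Acrobat interface. The sketch below is not the Acrobat workflow described above. It assumes the `tesseract` command-line program is installed and on your PATH, and it produces one searchable PDF per image rather than a single combined file. The function names `tesseract_commands` and `run_all` are hypothetical helpers.

```python
import shutil
import subprocess
from pathlib import Path


def tesseract_commands(folder, pattern="*.png"):
    """Build one `tesseract IMAGE OUTPUT_BASE pdf` command per matching image."""
    commands = []
    for image in sorted(Path(folder).glob(pattern)):
        output_base = image.with_suffix("")  # tesseract adds the ".pdf" extension itself
        commands.append(["tesseract", str(image), str(output_base), "pdf"])
    return commands


def run_all(folder):
    """Run each command in turn; stop with an error if tesseract is unavailable."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract is not installed or not on PATH")
    for command in tesseract_commands(folder):
        subprocess.run(command, check=True)
```

Calling `run_all` on the Activity #2 folder would write 1.pdf, 2.pdf, and 3.pdf next to the original PNG files.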