The purpose of this activity is to practice correcting the results of OCR (optical character recognition) performed by ABBYY FineReader.
Step 1: Opening a document
- Download Activity #2 – Franklin Roosevelt Letter.jpg from the Documents box in this LibGuide.
- When you first open ABBYY FineReader, click “Open in OCR Editor”. This will open a file dialog. Navigate to the FDR letter you just downloaded and click Open.
- ABBYY will then perform recognition on the image, which should take only a few minutes.
- The left side of the screen shows the original image with boxes drawn on top of it. The right side shows the text that ABBYY recognized. Any text highlighted in blue on the right side is text that ABBYY is less sure it recognized correctly.
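The blue highlighting can be thought of as a confidence threshold applied to each recognized character. The short Python sketch below illustrates that idea only; it is not ABBYY's actual method. The function name, the 0.80 threshold, and the confidence values are all hypothetical, made up for illustration.

```python
def flag_uncertain(chars, threshold=0.80):
    """Wrap runs of low-confidence characters in [brackets].

    chars is a list of (character, confidence) pairs. The confidence
    scale and the threshold are assumptions, not ABBYY values.
    """
    out = []
    in_run = False
    for ch, conf in chars:
        uncertain = conf < threshold
        if uncertain and not in_run:
            out.append("[")   # an uncertain run starts here
        elif not uncertain and in_run:
            out.append("]")   # the uncertain run ended before this character
        out.append(ch)
        in_run = uncertain
    if in_run:
        out.append("]")       # close a run that reaches the end of the text
    return "".join(out)


# Hypothetical recognition result for the word "White"
sample = [("W", 0.99), ("h", 0.97), ("i", 0.55), ("t", 0.60), ("e", 0.98)]
print(flag_uncertain(sample))  # Wh[it]e
```

The bracketed characters play the same role as the blue highlights: they mark where a human should look first when correcting the text.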
Step 2: Text Correction and Training
- To adjust the recognized text, click in the text boxes on the right side and type in the corrections.
- To perform training on the text of the image, click Tools > Options > OCR. Select the “Use training to recognize new characters and ligatures” radio button.
- Now, if you right-click any of the green boxes and click Recognize, ABBYY will ask you whether it is recognizing a particular character correctly.
In this screenshot, only part of the letter M is selected. Click the >> button several times to expand the box until it covers the entire letter. Once it does, ABBYY may correctly identify the letter as an M; if it does not, type M into the text input. Then click Train.
- You can go through several letters in the image until you either run out of letters or you get tired of training the OCR. If you want to stop training it, click Close and then Yes to save changes to the trained OCR pattern.
Step 3: Correcting Boxes
On the left, you’ll notice that Franklin Roosevelt’s signature is in a red box. This means that it is being recognized as an image, not text. If you were to save the document to Microsoft Word, this element would remain an image.
Since this area contains text, we need to mark it as text instead.
- Click on the red box and delete it. Next, click the text button in the toolbar at the top. Then draw two green rectangles, one around “Very sincerely yours,” and another around FDR’s signature. Then right-click each box and click Recognize.
- This will generate the text on the right, but you will likely have to type out the name for the signature yourself, as OCR does not do a good job of recognizing handwriting.
Step 4: Output as PDF
- Save the document as a PDF.
- Hint: The option to save or convert a document to a different format is in the toolbar.
- Open the document in a PDF viewer and check your work.
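If you would like a more systematic check than reading the PDF by eye, one option is to copy the text out of the PDF viewer and compare it with a version you typed by hand. The Python sketch below uses only the standard library's difflib module to list the words that differ. It is a minimal illustration, not part of ABBYY FineReader; the function name and the sample strings are placeholders for your own text.

```python
import difflib


def word_differences(ocr_text, expected_text):
    """Return (ocr_words, expected_words) pairs wherever the two texts differ.

    Both texts are split on whitespace, so differences in line breaks
    and spacing are ignored.
    """
    a = ocr_text.split()
    b = expected_text.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (" ".join(a[i1:i2]), " ".join(b[j1:j2]))
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Placeholder strings: paste the text copied from your PDF and your own transcription.
from_pdf = "Very sincerely yours, Frank1in D. Roosevelt"
typed_by_hand = "Very sincerely yours, Franklin D. Roosevelt"

for wrong, right in word_differences(from_pdf, typed_by_hand):
    print(f"PDF has {wrong!r}, expected {right!r}")
```

An empty result means the two texts match word for word; anything printed points to a spot worth correcting back in the OCR Editor.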