Book Magic

Book Scanning: Step 2, Pre-process the captured content using Scan Tailor

In a previous post, I broke down how to capture the physical book content using a camera. In this post, we’ll go over how to pre-process the captured content to maximize the chances of successful optical character recognition (OCR).

Getting a physical book onto your Kindle requires 3 basic steps:

  1. Capture the book content using a camera.
  2. Pre-process the captured content using Scan Tailor.
  3. Use ABBYY FineReader for optical character recognition (OCR) to produce an eBook.

In this post, I’ll be covering step 2.

Materials

Only 2 items are needed to pre-process the captured book content from step 1.

  1. Scan Tailor. The Scan Tailor software is an “interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be…” fed into optical character recognition (OCR) software. Scan Tailor is Free Software.
  2. Book Image Files. In step 1, you captured a book chapter as image files. These image files should all reside in a single directory.
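
Before opening Scan Tailor, it can help to confirm that the directory really contains only your page images and that they sort in page order. Here is a minimal sketch using only the Python standard library. The extension list and the function names (list_book_images, natural_key) are my own assumptions for illustration and are not part of Scan Tailor.

```python
import re
from pathlib import Path

# Assumed set of image extensions; adjust to whatever your camera produces.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def natural_key(path):
    """Sort key that orders IMG_2.jpg before IMG_10.jpg."""
    parts = re.split(r"(\d+)", path.name)
    return [int(part) if part.isdigit() else part.lower() for part in parts]


def list_book_images(directory):
    """Return (images, others) for the files directly inside `directory`.

    images: image files in natural (page) order.
    others: any other files, which you may want to move elsewhere.
    """
    entries = [p for p in Path(directory).iterdir() if p.is_file()]
    images = sorted(
        (p for p in entries if p.suffix.lower() in IMAGE_EXTENSIONS),
        key=natural_key,
    )
    others = sorted(
        (p for p in entries if p.suffix.lower() not in IMAGE_EXTENSIONS),
        key=natural_key,
    )
    return images, others
```

Call list_book_images with the path to your chapter directory and print both lists. If the second list is not empty, consider moving those files out before creating the project.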

Creating a new Scan Tailor project

  1. Launch Scan Tailor.
  2. Create a New Project.
  3. When prompted for the input directory, choose the directory containing your image files.
  4. Scan Tailor will now load your image files.

Let Scan Tailor do its magic

  1. At the time of this blog post, Scan Tailor’s processing consists of 6 steps: Fix Orientation, Split Pages, Deskew, Select Content, Margins, and Output.
  2. Click the Play button for each of these steps and let Scan Tailor do its magic. 🙂
  3. The Output step takes the longest and will generate *.tiff image files in a sub-directory named out.
  4. You can save your Scan Tailor project if you think you’ll make tweaks in the future.

Navigate to your out sub-directory and check out the newly generated *.tiff image files.
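
If you would rather check the results from a script than click through the folder, here is a small sketch that lists the generated files and flags any that are empty. It assumes the out sub-directory sits inside your image directory, as described above, and it accepts both the .tif and .tiff spellings since I am not relying on one exact extension. The function names are hypothetical helpers, not anything Scan Tailor provides.

```python
from pathlib import Path

# Accept both common spellings of the TIFF extension.
TIFF_EXTENSIONS = {".tif", ".tiff"}


def find_output_pages(image_dir, out_name="out"):
    """Return the TIFF files directly inside <image_dir>/<out_name>, sorted by name.

    Returns an empty list if the output directory does not exist yet.
    """
    out_dir = Path(image_dir) / out_name
    if not out_dir.is_dir():
        return []
    return sorted(
        p for p in out_dir.iterdir()
        if p.is_file() and p.suffix.lower() in TIFF_EXTENSIONS
    )


def empty_pages(pages):
    """Return the output files that are zero bytes long (likely failed writes)."""
    return [p for p in pages if p.stat().st_size == 0]
```

Keep in mind that the Split Pages step can turn one two-page capture into two output pages, so the number of output files may be larger than the number of input images.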

These pre-processed image files should be a huge improvement over your original image files from step 1. It’s possible to use this pre-processed content to generate quality PDFs, but we’ll take it a step further in the next post.

Congratulations! You’ve just pre-processed your first chapter!


About the Author

Ray Li

Ray is a software engineer and data enthusiast who has been blogging for over a decade. He loves to learn, teach and grow. You’ll usually find him wrangling data, programming and lifehacking.
