In a previous post, I broke down how to capture the physical book content using a camera. Here, we'll go over how to pre-process the captured content to maximize the chances of successful optical character recognition (OCR).
Getting a physical book onto your Kindle requires 3 basic steps:
- Capture the book content using a camera.
- Pre-process the captured content using Scan Tailor.
- Use ABBYY FineReader for optical character recognition (OCR) to produce an eBook.
In this post, I'll be covering step 2.
Only 2 items are needed to pre-process the captured book content from step 1:
- Scan Tailor. The Scan Tailor software is an "interactive post-processing tool for scanned pages. It performs operations such as page splitting, deskewing, adding/removing borders, and others. You give it raw scans, and you get pages ready to be..." fed into optical character recognition (OCR) software. Scan Tailor is Free Software.
- Book Image Files. In step 1, you captured a book chapter as image files. These image files should all reside in a single directory.
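Before launching Scan Tailor, it can help to confirm that every captured image really does sit in that single directory. The following is a minimal Python sketch, not part of Scan Tailor itself. The function names and the list of extensions are my own assumptions, so adjust them to whatever your camera produces.

```python
from pathlib import Path

# Assumed set of extensions for captured pages; adjust to match your camera.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff"}


def list_page_images(directory):
    """Return image files directly inside `directory`, sorted by file name."""
    folder = Path(directory)
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def stray_images(directory):
    """Return image files that sit in sub-directories instead of the top level."""
    folder = Path(directory)
    return sorted(
        p for p in folder.rglob("*")
        if p.is_file()
        and p.suffix.lower() in IMAGE_EXTENSIONS
        and p.parent != folder
    )
```

Calling `list_page_images("chapter01")` on a hypothetical `chapter01` directory returns the pages in file-name order, and `stray_images("chapter01")` flags any images that ended up in a nested folder and would be missed when you pick the input directory.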
Creating a new Scan Tailor project
- Launch Scan Tailor.
- Create a New Project.
- When prompted for the input directory, choose the directory containing your image files.
- Scan Tailor will now load your image files.
Let Scan Tailor do its magic
- At the time of this blog post, Scan Tailor's processing pipeline consists of 6 stages: Fix Orientation, Split Pages, Deskew, Select Content, Margins, and Output.
- Click the Play button for each of these stages and let Scan Tailor do its magic. 🙂
- The Output step takes the longest and will generate *.tiff image files in a sub-directory named out.
- You can save your Scan Tailor project if you think you'll make tweaks in the future.
Navigate to your out sub-directory and check out the newly generated *.tiff image files.
These pre-processed image files should be a huge improvement over your original image files from step 1. It's possible to use this pre-processed content to generate quality PDFs, but we'll take it a step further in the next post.
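If you'd rather not eyeball the out sub-directory by hand, a short Python sketch like the one below can list the generated files and flag any that are empty. This is only an illustration: the function names are hypothetical, and it accepts both the .tif and .tiff extensions on the assumption that the exact spelling may vary between Scan Tailor versions.

```python
from pathlib import Path

# Both spellings are accepted as an assumption; the exact extension may vary.
TIFF_EXTENSIONS = {".tif", ".tiff"}


def list_output_pages(project_dir, out_name="out"):
    """Return the TIFF files in the project's output sub-directory, sorted by name."""
    out_dir = Path(project_dir) / out_name
    if not out_dir.is_dir():
        # The Output stage has not been run yet (or ran in a different directory).
        return []
    return sorted(
        p for p in out_dir.iterdir()
        if p.is_file() and p.suffix.lower() in TIFF_EXTENSIONS
    )


def empty_outputs(pages):
    """Return output files that are zero bytes long and worth regenerating."""
    return [p for p in pages if p.stat().st_size == 0]
```

Keep in mind that the number of output pages will not necessarily match the number of photos you took, since the Split Pages stage can turn one photographed spread into two pages.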
Congratulations! You've just pre-processed your first chapter!