Digital Book Conversion Pt. 2 – Post Processing

If you followed Pt. 1 of this little series you should have lots of scanned images in the TIFF format. What is needed now is a little post processing. This will tidy up a lot of the minor issues, such as deskewing the text and rotating the pages so they are perfectly straight. To do this we need to install a piece of software called ScanTailor. The good news is it's free; the bad news is it can be a little fiddly to install on OS X (it is a simple install on Windows though).

Installing ScanTailor on OS X (10.8+ – Mountain Lion)

The first step is to install Xcode. It can be found in the App Store, so it is a simple click to install. Once installed you need to change a setting within Xcode: open Xcode, go to Preferences > Downloads and select Command Line Tools to install them.

This will install the software needed for the next step. Installing ScanTailor from source code can be a little fiddly, so there is an easier way: MacPorts. Download the latest version of MacPorts and install it. After installing, open a Terminal window and enter the following commands:

sudo port selfupdate
sudo port install scantailor

This will take a while, but once complete ScanTailor will be fully installed and ready to run.
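If you want to double check the install from the same Terminal window, something like the following should work. It is only a sketch: `have_cmd` is a helper name I made up for illustration, and the check assumes MacPorts has put the `port` command on your PATH (you may need to open a new Terminal window first).

```shell
# Hypothetical helper: succeeds if the named command is available on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

if have_cmd port; then
    # "port installed <name>" lists the installed versions of that port.
    port installed scantailor
else
    echo "MacPorts not found on PATH; open a new Terminal window or check the install."
fi
```

If the port is listed as installed and active, you are ready to go.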

Windows users

People on Windows have it easy: simply download the latest version of ScanTailor and install it. If only it was this easy on OS X!

Utilising ScanTailor

The first thing to note with ScanTailor is that it can take quite a long time to process the files. On my quad-core i7 a 500 page book can take over 5 hours. It does depend on the complexity of the content within the scans. You don’t have to sit in front of the computer while everything is running though.
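To get a rough feel for how long your own book might take, you can scale from that figure: 500 pages in 5 hours works out at about 36 seconds per page. The little function below is only a back-of-the-envelope sketch. `estimate_minutes` is a name I made up, and the 36 second default is an assumption taken from my machine, so treat the result as a ballpark at best.

```shell
# Estimate ScanTailor processing time in whole minutes (rounded up).
# Usage: estimate_minutes PAGES [SECONDS_PER_PAGE]
# The default of 36 s/page is an assumption: 500 pages in roughly 5 hours.
estimate_minutes() {
    pages=$1
    secs_per_page=${2:-36}
    echo $(( (pages * secs_per_page + 59) / 60 ))
}

estimate_minutes 500      # prints 300 (5 hours)
estimate_minutes 320      # prints 192
```

Pass your own seconds-per-page figure as the second argument once you have timed a few pages on your hardware.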

Another important note: when you make a change in ScanTailor you will generally want to apply it to all pages, so make sure you click “apply to” and choose “all pages”.

**IMPORTANT** – think about what you want to do with your scanned images. Do you need the preface? Do you only need the bibliography? Only the main body? Import only what you want to use for this process. Especially if you want page numbers to match, only import the main body!
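One way to keep the import clean is to copy just the pages you want into their own folder and point ScanTailor at that. The sketch below assumes your scans are named like `page_0001.tif` (zero-padded to four digits) inside a `scans` folder. Those names, the page range and the `copy_range` function are all made up for illustration, so adjust them to whatever your scanner software produced.

```shell
# Copy a contiguous range of scans into a separate folder for import.
# Usage: copy_range SRC_DIR DST_DIR FIRST LAST  (plain numbers, e.g. 13 412)
# Assumes hypothetical file names of the form page_0001.tif.
copy_range() {
    src=$1; dst=$2; first=$3; last=$4
    mkdir -p "$dst"
    n=$first
    while [ "$n" -le "$last" ]; do
        f=$(printf '%s/page_%04d.tif' "$src" "$n")
        if [ -f "$f" ]; then
            cp "$f" "$dst/"
        else
            # Report gaps rather than stopping, so one missing scan is easy to spot.
            echo "missing: $f" >&2
        fi
        n=$((n + 1))
    done
}

# Example: keep only the main body, here assumed to be scans 13 to 412.
# copy_range scans main_body 13 412
```

The originals are left untouched, so you can always go back and pull in the preface or bibliography later.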

Once your TIFFs have been imported we can begin processing them. The first step, “fix orientation”, should not apply if you used a document-fed scanner. If you scanned on a flatbed this may apply, so configure accordingly.

Split Pages – again this depends on your scanning method. If you used a document-fed scanner make sure to select the page icon on the far left. Note, however, that in order to apply this to all pages you must first select one of the other icons and then the icon on the far left. Click “apply to” and apply to all pages. Click the start button next to Split Pages and it will apply the settings to all pages. If you used a flatbed scanner these settings need to be changed accordingly.

Deskew – Leave on auto and apply to all pages, then click the start button next to Deskew.

Select Content – if you used a document-fed scanner this is simple: drag the selection box to its maximum size on the X and Y axes and apply to all pages. The built-in auto select can be a little shaky, so if you didn’t use a document-fed scanner this step will prove more difficult. Click the start arrow next to Select Content.

Margins – As you made the content selection box as large as it can be, there is no need for a margin. Set all margins to 0 and apply to all pages. Click the start arrow next to Margins.

Output – This really is the tricky one. I would highly recommend you avoid the black and white setting unless you are confident your book has no colour and no images. The safest bet is to stick to mixed and apply to all pages. Now click the start arrow next to Output.

This step will take a long time, but when it’s complete all the post processing will be done and it is finally time to begin converting to readable formats. The next parts of the HOWTO will cover how you can convert the book to searchable PDFs, RTF and HTML. All these formats work well with screen readers, so they will allow the visually impaired to read the scanned book.
