Digital Book Conversion Pt. 1 – Scanning

Perhaps the most important step in converting any book to a digital format is the initial scanning stage. The options for scanning the physical book fall into two categories: destroying the book or keeping it intact.

Keeping it intact takes by far the longest, but it is definitely the choice to make if you only want a specific chapter of a book. The best way to do this is with a PlusTek OpticBook scanner; I highly recommend the PlusTek OpticBook 3600. This is the scanner I started out with, but converting an entire book with it quickly became far too time consuming.

Destroying the book by cutting off the spine allows for a far quicker and more efficient scanning process. Any friendly print shop can cut the spine off; just tell them you are visually impaired and are allowed to convert the book to any accessible format you like.

Once the spine is cut, it is simply a case of feeding the pages into a document-fed scanner. I highly recommend the Canon DR2010M. There are lots of different versions of this scanner; the M simply means it is the Apple Macintosh version. However, the software can be downloaded from Canon’s website, so all the models are practically the same.

Optimum Scanner Settings (these apply to both methods of scanning)

The most important scanning setting is the DPI. I generally choose the DPI depending on the font I will be scanning. For paperbacks that use a serif font I like to use a higher DPI, preferably 600DPI. This is because cheap printing can cause a lot of bleed, so it's best to have the book at the highest possible resolution for later Optical Character Recognition (OCR, the bit that converts the scanned images to text).

If the book uses a sans serif font and is something like a textbook, there is generally less bleed, so you can utilise a lower DPI. I usually use 400DPI, but you could certainly go lower.
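To get a feel for what these DPI values mean in practice, here is a minimal Python sketch of the arithmetic. The 5 x 8 inch page size and the function name are assumptions of mine for illustration, not measurements from any particular book.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Return the (width, height) in pixels of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


# Hypothetical 5 x 8 inch paperback page (an assumed size; measure your own book).
for dpi in (400, 600):
    width_px, height_px = pixel_dimensions(5, 8, dpi)
    megapixels = width_px * height_px / 1e6
    print(f"{dpi} DPI: {width_px} x {height_px} pixels ({megapixels:.1f} megapixels)")
```

Because the pixel count grows with the square of the DPI, going from 400DPI to 600DPI gives 2.25 times as many pixels per page, which is extra detail for the OCR to work with on badly printed text.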

A little about process (applies to destroying the book)

Once the spine is cut off, it's very important to have a process for scanning the pages. If you start making mistakes the book may come out of order, and putting it back into the correct order is a nightmare (speaking from experience)! So this is how I divide the book.

I break it down into the following sections:
1. Front Cover, Preface, Contents Page
2. Main Body of book
3. Bibliography, Index, Glossary etc

I do this to give the book 3 unique naming systems. Let's say the book is called “Textbook”.

Step 1
Scan section 1
Naming structure “aTextbook_001” using incremental numbering

Step 2
Scan section 2
Naming structure “Textbook_001” using incremental numbering

Step 3
Scan section 3
Naming structure “xTextbook_001” using incremental numbering
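The three naming structures above can be sketched as a small Python helper. The function name and the section numbering (1, 2, 3) are my own choices for illustration; the three-digit padding is taken from the "_001" in the examples.

```python
def scan_name(title, section, number):
    """Build a scan file name such as 'aTextbook_001'.

    section 1 = front matter (prefix 'a'), 2 = main body (no prefix),
    3 = back matter (prefix 'x'); number is the incremental count.
    """
    prefixes = {1: "a", 2: "", 3: "x"}
    return f"{prefixes[section]}{title}_{number:03d}"


print(scan_name("Textbook", 1, 1))    # aTextbook_001
print(scan_name("Textbook", 2, 345))  # Textbook_345
print(scan_name("Textbook", 3, 12))   # xTextbook_012
```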

I use this naming system so that the numbers in the file names match the page numbers. With a textbook you may, for example, have to read pages 345-400; using this naming method you would simply look for Textbook_345 to Textbook_400. Getting the naming structure correct at this early stage really helps in the later stages and when reading the book through accessibility tools.
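As a rough illustration of looking up a reading assignment, the sketch below lists the scan names for a page range. It assumes the first main-body image is printed page 1, so that image numbers and page numbers coincide; the function name is hypothetical.

```python
def names_for_pages(title, first_page, last_page):
    """List the main-body scan names covering first_page..last_page inclusive."""
    return [f"{title}_{page:03d}" for page in range(first_page, last_page + 1)]


wanted = names_for_pages("Textbook", 345, 400)
print(wanted[0], wanted[-1], len(wanted))  # Textbook_345 Textbook_400 56
```

If the book's main body does not start on page 1, the numbers will be offset by a fixed amount, which is worth noting down when you scan.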

I will assume this naming structure has been used in all subsequent posts about converting the books.

Software-specific settings (applies to destroying the book)

If you are using the Canon DR2010M these settings will apply. Other scanning software should offer similar settings, and these should roughly carry over.

Scanning settings
Scan Duplex (scans both sides of each page)
Auto Select Colour
Auto Select Size
DPI = dependent on original book (cheap paperbacks 600DPI, Textbooks 400DPI)

Saving settings
Name the files using the structure described above
Save as TIFF
Make sure to save each scanned image as a separate page.
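Since every image is saved as its own numbered file, it is easy to check that nothing was skipped before moving on. The following is a minimal sketch, not part of any scanner software: the function name is hypothetical, and the ".tif" extension in the example is an assumption about how your software names TIFF files.

```python
import re
from pathlib import Path


def missing_numbers(file_names, title):
    """Return the scan numbers missing from 1..highest for the given title.

    Only names of the exact form '<title>_<digits>' (extension ignored) are
    counted, so 'aTextbook_001' is not confused with 'Textbook_001'.
    """
    pattern = re.compile(rf"^{re.escape(title)}_(\d+)$")
    present = set()
    for name in file_names:
        match = pattern.match(Path(name).stem)  # stem drops the extension
        if match:
            present.add(int(match.group(1)))
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]


# Hypothetical folder listing with scan 003 missing from the main body.
names = ["aTextbook_001.tif", "Textbook_001.tif", "Textbook_002.tif", "Textbook_004.tif"]
print(missing_numbers(names, "Textbook"))  # [3]
```

To run it against a real folder you could pass in something like `[p.name for p in Path("scans").iterdir()]`, where "scans" is whatever folder you saved the images to.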

Following the above steps should result in some high quality scanned images. The resulting files will be very large, but this will be tackled by some post-processing. I will go into detail on post-processing in the next update, including how to install ScanTailor on OS X.
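To put a rough number on "very large", here is a back-of-the-envelope estimate in Python. It assumes uncompressed 24-bit colour (3 bytes per pixel), a hypothetical 5 x 8 inch page and 400 scanned sides; real TIFF files may be smaller if your software compresses them or picks greyscale for some pages.

```python
def uncompressed_size_mb(width_in, height_in, dpi, sides, bytes_per_pixel=3):
    """Estimate the total size in megabytes of uncompressed scans.

    sides is the number of scanned images (one per page side when scanning duplex).
    """
    pixels_per_side = (width_in * dpi) * (height_in * dpi)
    return pixels_per_side * bytes_per_pixel * sides / 1_000_000


# Assumed example: 400 sides of a 5 x 8 inch book.
for dpi in (400, 600):
    size_gb = uncompressed_size_mb(5, 8, dpi, sides=400) / 1000
    print(f"{dpi} DPI: about {size_gb:.1f} GB uncompressed")
```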
