techteach
Scanning Text -- details
Options:
Scan: You can scan a printed page 1) as an image, or 2) with OCR
(Optical Character Recognition) software that converts it to text.
A single sheet is easy to scan. A page in a thin magazine or
paperback is fairly easy to place correctly: break the spine and hold
down the scanner cover so the page touches the scanner glass as
evenly as possible.
A page in a hardcover book is hard to scan because you may not want to
damage the book. Have the pages copied at a copy shop, one sheet per
page; you can then scan those sheets successfully.
Edit the images: they may need cropping (to get rid of the dark margin
where the copy did not touch the scanner glass), straightening,
sharpening, better contrast, spot removal, and so on.
OR, if a page contains only text, convert it to text with OCR
software. The resulting text will need careful editing, especially when
the copy is of poor quality, the font is very small, or the text
contains italics or accents. Remember, for example, that 1, !, l, and i
look very similar, as do g and q. To get a sense of how difficult this
can be, use your imagination or try to decipher an old handwritten
document.
Then: For use on Bb or elsewhere on the web, you have two alternatives:
you can use each page as an image, OR you can use the converted and
edited text.
In either case, add the images or text to a Word document or,
preferably, to an HTML document that opens directly in a web browser.
Needed:
A scanner: these are very inexpensive.
OCR software, needed to convert printed text to digital text.
Most scanners include basic software for image editing and text
conversion. For professional OCR (Optical Character Recognition)
software, TextBridge or OmniPage is recommended, especially for
foreign-language text. The latest version of OmniPage includes more
than 100 languages and dictionaries.
The scanner makes an image of a page; the software then tries to
recognize the text in that image. This is hard for the software and
leads to mistakes. While we “read” words in context, the poor, stupid
software must recognize each combination of dots individually.
Think of a handwritten or printed 1, l, i, I, !, and |, or of g, p, q,
and j. They look very much alike, especially when the print is tiny or
the copy is bad.
Or visualize them in italics (1liI!| gpqj), especially when the print
is very small, or underlined (1liI!| gpqj).
Be ready to proofread all scanned text very carefully!
Make sure that what you post is accurate and easily legible!
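Software can help a little with this proofreading. The Python sketch below is a hypothetical helper (not a feature of TextBridge, OmniPage, or any scanner software): it flags words that mix letters with 1, ! or |, on the assumption that those characters are likely misreadings of l, i, or I. It is only a rough first pass: it cannot catch a g read as q, and it will also flag legitimate words such as "1st".

```python
# Non-letter characters that OCR often substitutes for l, i, or I.
NON_LETTER_LOOKALIKES = set("1!|")


def flag_suspect_words(text):
    """Return (line_number, word) pairs worth a second look when proofreading.

    A word is flagged when it mixes letters with 1, ! or | inside the word,
    e.g. "he1p" or "sti!l". Punctuation at the end of a word (".", "!", ...)
    is ignored, so an ordinary "legible!" is not flagged.
    """
    suspects = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for word in line.split():
            core = word.rstrip(".,;:!?")  # drop trailing sentence punctuation
            has_letter = any(ch.isalpha() for ch in core)
            has_lookalike = any(ch in NON_LETTER_LOOKALIKES for ch in core)
            if has_letter and has_lookalike:
                suspects.append((line_no, word))
    return suspects


sample = "P1ease proofread a11 scanned text\nvery careful|y. Make it legible!"
for line_no, word in flag_suspect_words(sample):
    print(line_no, word)
```

The flagged words still have to be checked by a person against the printed page; the helper only points at likely trouble spots.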
Tips for OCR:
- It is faster to type a short text than to scan and proofread it.
- It is important to have good, clean pages for scanning.
- Computer-generated pages scan very accurately.
- To scan pages or articles from a book or magazine, have a good copy
of each page made at a copy shop, so that the copies are straight and
free of the very dark shadows that appear where the spine keeps the
page from touching the scanner glass.
- On a page with an illustration or
printed in multiple columns, select the text areas before scanning.
- Usually, scan text at 100%, but when a page or part of a page has
very small print, enlarge it to, say, 200% to get better character
recognition results.
- If your copy is dark or crooked, scan it as an image and edit the
image before you run character recognition.
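For a dark copy, the contrast fix is normally done in the imaging software that came with the scanner. As a rough illustration of what a simple contrast adjustment does, the Python sketch below assumes the scanned image is a list of rows of grayscale values (0 = black, 255 = white); `stretch_contrast` is a hypothetical name, and real image editors offer more refined methods than this plain min-max stretch.

```python
def stretch_contrast(pixels):
    """Linearly stretch grayscale values (0 = black, 255 = white) so that
    the darkest pixel becomes 0 and the lightest becomes 255."""
    flat = [p for row in pixels for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # A uniform image has no contrast to stretch; return a copy.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    # Shift every value down by the darkest one, then spread the range out.
    return [[round((p - lo) * scale) for p in row] for row in pixels]


# A small, muddy "scan": all values squeezed between 40 and 130.
dark_scan = [
    [40, 60, 110],
    [45, 120, 130],
]
print(stretch_contrast(dark_scan))
```

Stretching spreads the values of a muddy scan across the full range, so the letters stand out more clearly from the paper before character recognition.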
Finally:
If you want to use an illustrated page as is, scan the whole page as an
image. If it is in a book, get a professional copy of each page from a
copy shop: that way it is straight and free of dark shadows. Scan the
copy, then crop the image to get rid of any remaining shadows and, if
needed, edit it for contrast and sharpness.
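Cropping can likewise be sketched in a few lines. The Python example below is a hypothetical illustration, not how any particular imaging program works: it assumes the scan is a list of rows of grayscale values (0 = black, 255 = white) and trims edge rows and columns whose average brightness falls below an arbitrary threshold of 80, that is, the dark shadow where the page did not touch the glass. Inner rows and columns are kept even when they contain dark letters, because text covers only a small part of each line and the average stays high.

```python
def crop_dark_margins(pixels, threshold=80):
    """Trim edge rows and columns whose average brightness is below
    `threshold` (0 = black, 255 = white). Trimming stops at the first
    bright row or column found from each edge."""
    def mean(values):
        return sum(values) / len(values)

    # Trim dark rows from the top and bottom.
    top, bottom = 0, len(pixels)
    while top < bottom and mean(pixels[top]) < threshold:
        top += 1
    while bottom > top and mean(pixels[bottom - 1]) < threshold:
        bottom -= 1
    rows = pixels[top:bottom]
    if not rows:
        return []  # the whole image was dark

    # Trim dark columns from the left and right of the remaining rows.
    left, right = 0, len(rows[0])
    while left < right and mean([row[left] for row in rows]) < threshold:
        left += 1
    while right > left and mean([row[right - 1] for row in rows]) < threshold:
        right -= 1
    return [row[left:right] for row in rows]


# Dark left column (spine shadow) and dark bottom row; 10 and 15 are
# dark letter strokes inside the page and must survive the crop.
scan = [
    [20, 230, 240, 235, 250],
    [25, 220, 10, 240, 245],
    [30, 235, 245, 15, 250],
    [15, 20, 25, 20, 30],
]
print(crop_dark_margins(scan))
```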
If you want to scan handwritten text, such as a letter or an ancestor’s
birth certificate, scan it as an image.
If you want to scan just an image from an illustrated page, select it,
scan it, then crop and edit it with the imaging software.
My preference is for posting pages as images, in Acrobat PDF format.
Hoffmann, Nov. 2006