Making the contents of printed material available to students

by Ursula Hoffmann

The old way is still easiest: Put the material on reserve in the library. This works fine for students living near the campus but not for those far away. If you have students with no access to the library, you need to put the material online.
Or you can write a lecture about the material and post it.

Or you can put a scanned digitized version of the material online:

Be sure to respect the rules on copyright and fair use.

ABSOLUTE REQUIREMENTS for scanned material:

Clean and easily legible and readable pages -- small file sizes so each file will load quickly and reliably.
All right, you may need to spend a bit more time and money, but it will work and not turn the students off, electronically or psychologically.

Recommended format: Acrobat .pdf.
Yes, the files tend to be big, as they include printer codes etc., but they also let the user zoom in to make the page easier to read and scroll through text and pages.

Large files may time out while downloading: many students do not have fast internet connections, or the server may be busy. When that happens, they get no access to the material at all.
Create smaller files containing just a few pages each.
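To get a feel for how long a file takes to load on a slow line, here is a rough back-of-the-envelope sketch. The 56 kbit/s dial-up speed, the 10 MB example and the function name are assumptions chosen for illustration, not measurements; real connections are usually slower than their nominal speed.

```python
def download_seconds(size_kb: float, speed_kbit_per_s: float) -> float:
    """Rough transfer time: kilobytes converted to kilobits, divided by line speed."""
    if speed_kbit_per_s <= 0:
        raise ValueError("speed must be positive")
    return size_kb * 8 / speed_kbit_per_s


# Assumed line speed: a nominal 56 kbit/s dial-up modem (illustrative only).
MODEM_KBIT_PER_S = 56

# 217 KB and 2 MB (2048 KB) are sizes discussed in this article;
# 10 MB (10240 KB) is an assumed "too large" file for comparison.
for size_kb in (217, 2048, 10240):
    minutes = download_seconds(size_kb, MODEM_KBIT_PER_S) / 60
    print(f"{size_kb:>6} KB -> about {minutes:.1f} min")
```

The estimate ignores server load and protocol overhead, so treat it as a lower bound on the waiting time.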

Materials suitable for scanning:
Cleanly printed individual pages -- but they must have excellent quality.
Printed images, photos, postcards, slides  -- but they may require careful editing.

Materials not worth scanning:
Do not bother to scan short text passages, memos with coffee stains, faxes that are fuzzy or too dark, italics in a very small font, etc. -- retyping the text is faster than scanning and proofreading it.

Long text:
Assuming you have an old printed article you want to revise but you have no copy of it on your computer (you used a typewriter or your hard disk died), you might want to scan it using OCR (Optical Character Recognition) -- be aware that this takes careful proofing.

For reading or viewing online, I recommend scanning all text and images to image files, and then combining them in .pdf files. See below.

Materials that require some time and money to scan successfully:
A page that has text superimposed on -- not separated from -- an image: though you can separate the text from the image, you cannot separate the image from the text.
Bound volumes, such as books or magazines.

Scanning pages from a hardcover book or paperback or magazine:

This does not work when you use a flatbed scanner – even when you break the spine of the volume, light will get in from the edges, creating black areas on the sides and in the center that obscure the text. Moreover, the text will be curved if the volume is fat and will thus be hard to read.

Take the volume to a copy shop – there, an opaque flexible cover (rather than the rigid lid of a flatbed scanner) can be used to minimize black areas created by external light creeping in.

Now you can scan the individual pages:
The scanning software gives you two options:
  1. Scan each page as a graphic and save it as an image file, tif/tiff or jpg/jpeg format – these formats, esp. jpg, create much smaller files than those saved as bitmap .bmp
  2. Scan each page as text (with or without the images): here, the scanning software uses OCR (Optical Character Recognition) that converts the scanner image to text, selecting characters or words that it is not sure of for you to correct, and then produces a Word .doc. See OCR about this.

Use option 1: it is easier and faster and better.
If your scanner permits it, select the web as your destination: this will produce the image in 72 dpi resolution, which is sufficient for viewing on screen (rather than for printing), and with a much smaller file size, so it loads much faster. But the image file names will just be numbered.
Then: Use your image editor to view each image, rotate, crop, edit each if needed -- and rename it, with File, Save as.
Finally, create an Acrobat .pdf file.
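To see why the 72 dpi web setting mentioned above gives so much smaller files, it helps to count pixels. The sketch below is only arithmetic; the US letter page size (8.5 x 11 inches) and the 300 dpi comparison value are assumptions for illustration, and actual file sizes also depend on compression.

```python
def pixel_size(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a scanned page at a given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed page size: US letter, 8.5 x 11 inches.
PAGE_IN = (8.5, 11.0)

# 72 dpi is the web setting; 300 dpi is an assumed print-quality setting.
for dpi in (72, 300):
    width, height = pixel_size(*PAGE_IN, dpi)
    print(f"{dpi:>3} dpi: {width} x {height} = {width * height:,} pixels")
```

Under these assumptions a 300 dpi scan holds roughly 17 times as many pixels as a 72 dpi scan of the same page, which is where most of the saving comes from.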

Acrobat .pdf files are large as they include printer codes etc. but, as a very desirable feature, they allow the user to zoom in to make a page easier to read, to scroll text and pages, as mentioned above. If the .pdf files are composed of image files, you cannot search for text -- but if they are created from digitized text, you can.

Do not combine more than about 20 imaged pages in a single file, which gives a file size of 2 or 3 MB or so. Otherwise the user will have to wait too long to see the file, or the download may even time out. Have your students test larger file sizes, and inform John Dono of the results.
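If you have a long run of scanned pages, the 20-page limit means planning several .pdf files. A minimal sketch of that grouping follows; the file names (page001.jpg, part01.pdf) and the function name are hypothetical, and the code only plans the groups, it does not create any .pdf files.

```python
def batch_pages(page_files: list[str], pages_per_file: int = 20) -> list[list[str]]:
    """Split a list of page image names into groups of at most pages_per_file."""
    if pages_per_file < 1:
        raise ValueError("pages_per_file must be at least 1")
    return [
        page_files[start:start + pages_per_file]
        for start in range(0, len(page_files), pages_per_file)
    ]


# Hypothetical example: 47 scanned pages named page001.jpg .. page047.jpg.
pages = [f"page{n:03d}.jpg" for n in range(1, 48)]

for number, group in enumerate(batch_pages(pages), start=1):
    print(f"part{number:02d}.pdf: {group[0]} .. {group[-1]} ({len(group)} pages)")
```

With 47 pages this plans three files of 20, 20 and 7 pages.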

How to create an Acrobat .pdf file from a series of jpg image files:

Open Adobe Acrobat.
Click File, Open, and go to the folder that has your image files. Under Files of type in the dialog window, select All Files.
Open image1 (or whatever name you chose for the first page).
Now, click Document, Insert Pages, select image2, and choose After in the dialog window that follows.
Repeat this step for image3 etc.

Adobe Acrobat will create a .pdf file containing your, say, 20 images. It will allow the user to zoom in to make the page more visible and to scroll text and pages. Because the pages are images, the text cannot be searched.
The file name will be that of your first image but with the extension .pdf. Rename as desired -- no spaces in the file name, please.
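The naming rule above (first image name, .pdf extension, no spaces) can be sketched as a small helper. This is only an illustration: the function name is hypothetical, and replacing spaces with underscores is one possible convention, not something Acrobat does for you.

```python
import re
from pathlib import Path


def pdf_name_for(first_image: str) -> str:
    """Derive a .pdf file name from the first image name, without spaces."""
    stem = Path(first_image).stem            # drop the .jpg / .tif extension
    stem = re.sub(r"\s+", "_", stem.strip())  # assumed convention: spaces -> underscores
    return stem + ".pdf"


print(pdf_name_for("image1.jpg"))
print(pdf_name_for("chapter 1 page 1.jpg"))
```

The first call keeps the name as it is apart from the extension; the second shows a name with spaces being cleaned up.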

.pdf file size examples:

  1. Three sheets (actually, 6 pages in a paperback photocopied side by side), with text and images: each page is a .tif file of 300-350 KB – but the .pdf file including all three pages is only 307 KB.
  2. The same three sheets: each page is a .jpg file 70% compressed and 61-87 KB, much smaller than the .tif file – the .pdf file including all three pages is only 217 KB. This is as easy to read as the .tif version but loads faster. Recommended format.
  3. A fifteen-page Word .doc with both text and images and its .pdf file are both about 650 KB, with the .pdf file slightly smaller than the .doc file.

Note: I am using Acrobat version 5. Later versions might produce smaller file sizes.

Software that claims to compress .pdf files: CVista PdfCompressor 3.1 (http://www.cvisiontech.com/pdf_compressor_31_g.html?jpeg_to_pdf_converter)
Desktop edition (without OCR) is $349 for .pdf files including up to 100 pages. This may be worth purchasing if you plan to put many long texts online.

A few formats of digitized text on the web -- check them out:
Google (books.google.com) uses single-page images, but with arrows for previous and next page. The French and Germans (www.gutenberg.org) use OCR to put whole works online, as does The Literature Network, online-literature.com – html format, searchable but not zoomable text.

Ursula Hoffmann, November 2006