Building the Digital Collection

Scanning

Because Newspaper Pictorials: World War I Rotogravures was intended to showcase the rotogravure printing process, high-quality images were required. This meant working from the original pages rather than from microfilm surrogates. To minimize risk to the brittle pages, all scanning was done on overhead scanners.

Uncompressed TIFF images were captured at 300 pixels per inch relative to the original material. Many of the pages were originally printed in a single colored ink to create a sepia-tone effect. To preserve this design information, those pages were scanned in full color. Pages originally printed using only black ink were captured in grayscale.
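As a rough illustration of what these capture settings imply for storage, the following sketch estimates the pixel dimensions and uncompressed size of a master image. The page dimensions in the example are assumed for illustration only and are not measurements from the collection; TIFF header overhead is ignored.

```python
def master_image_size(width_in, height_in, ppi=300, bits_per_pixel=24):
    """Estimate pixel dimensions and uncompressed size of a scanned page.

    width_in and height_in are page dimensions in inches (hypothetical
    inputs). bits_per_pixel is 24 for full color and 8 for grayscale.
    File header and tag overhead is not included.
    """
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    size_bytes = width_px * height_px * bits_per_pixel // 8
    return width_px, height_px, size_bytes


# Assumed example page of 16 x 22 inches, not a measured value.
color_master = master_image_size(16, 22, bits_per_pixel=24)
gray_master = master_image_size(16, 22, bits_per_pixel=8)
print(color_master)
print(gray_master)
```

Under these assumptions a full-color master is three times the size of a grayscale one, which is one practical reason to reserve color capture for the pages that need it.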

Image Processing

To create manageable service images for web delivery, the TIFF files were compressed with MrSID software to a ratio of 22:1. A single small Graphics Interchange Format (GIF) file was created for each page image, to be used as a navigator image. The GIFs were created in a batch process using Image Alchemy software. The originals were first blurred, the image height was then reduced to 300 pixels while retaining the aspect ratio, and the images were finally sharpened and saved as GIF files according to the GIF89a specification.
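The arithmetic behind these two derivatives can be sketched as follows. This is only an illustration of the 22:1 target ratio and the aspect-preserving resize; it does not reproduce the MrSID or Image Alchemy processing, and the function names and the sample dimensions are hypothetical.

```python
def compressed_size(uncompressed_bytes, ratio=22):
    """Approximate size in bytes of a file compressed to ratio:1."""
    return uncompressed_bytes / ratio


def navigator_size(width_px, height_px, target_height=300):
    """Scale dimensions so the height equals target_height pixels,
    keeping the original aspect ratio (width is rounded)."""
    new_width = max(1, round(width_px * target_height / height_px))
    return new_width, target_height


# Assumed master image of 4800 x 6600 pixels in 24-bit color.
master_bytes = 4800 * 6600 * 3
print(compressed_size(master_bytes))
print(navigator_size(4800, 6600))
```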

A database was developed to drive the calendar-browse mode and provide issue information. Each issue of the newspaper was given one record in the database, with its title, date, and enumeration included to allow citation information to be displayed with each page. The viewer uses file-name information to drive its page-turning function within each issue.
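A minimal sketch of this arrangement, assuming one table row per issue and a page file name that encodes the issue identifier and page sequence. The table layout, the sample record, and the `issue0001_0003.sid` naming pattern are all hypothetical; the actual database schema and file-naming scheme are not described here.

```python
import re
import sqlite3

# One record per issue: title, date, and enumeration (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE issues ("
    " issue_id TEXT PRIMARY KEY,"
    " title TEXT, issue_date TEXT, enumeration TEXT)"
)
conn.execute(
    "INSERT INTO issues VALUES (?, ?, ?, ?)",
    ("issue0001", "Example Pictorial", "1917-04-08", "Vol. 1, no. 1"),
)

# Hypothetical naming pattern: <issue id>_<4-digit page number>.sid
PAGE_NAME = re.compile(r"^(?P<issue>[a-z]+\d+)_(?P<page>\d{4})\.sid$")


def parse_page_name(file_name):
    """Split a page file name into (issue_id, page_number)."""
    match = PAGE_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized page file name: {file_name}")
    return match["issue"], int(match["page"])


def citation_for(file_name):
    """Build the citation line shown with a page from its issue record."""
    issue_id, page = parse_page_name(file_name)
    row = conn.execute(
        "SELECT title, issue_date, enumeration FROM issues WHERE issue_id = ?",
        (issue_id,),
    ).fetchone()
    if row is None:
        raise KeyError(issue_id)
    title, issue_date, enumeration = row
    return f"{title}, {issue_date}, {enumeration}, page {page}"


def neighbor_pages(file_name, page_count):
    """Return (previous, next) file names within the same issue, or None
    at either end, using only the page number in the file name."""
    issue_id, page = parse_page_name(file_name)
    previous = f"{issue_id}_{page - 1:04d}.sid" if page > 1 else None
    following = f"{issue_id}_{page + 1:04d}.sid" if page < page_count else None
    return previous, following


print(citation_for("issue0001_0003.sid"))
print(neighbor_pages("issue0001_0001.sid", page_count=8))
```

Keeping the page sequence in the file name means the database needs only issue-level records; page turning never requires a per-page row.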

Image Specifications

Master Image
Image resolution: 300 dpi
Tonal resolution: 24 bit (color) and 8 bit (grayscale)
File format: TIFF
Compression: uncompressed

Zoom Window Image
Image resolution: 300 dpi
Tonal resolution: 24 bit (color) and 8 bit (grayscale)
File format: MrSID
Compression: Wavelet

Navigator Image
Image size: 300 pixels on the long side
Tonal resolution: 24 bit (color) and 8 bit (grayscale)
File format: GIF89a
Compression: Lempel-Ziv-Welch (LZW)

Text Processing

To permit text searching, the images were processed with PrimeOCR optical-character-recognition software. Automatic zoning settings were used to recognize columns and ignore pictorial elements containing no text. Output was in PrimeOCR's proprietary ".pro" format, which included the text recognized in the images along with information about each character's location and confidence level. One file was created for each page image. The separate files were then merged into one large file in a batch process so that the text could be searched more efficiently. A full-text search first accesses the textual information. Once a specific record is selected for display, the character-location information is accessed to highlight the location of the best match on the page.
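The two-stage lookup described above can be sketched as follows. This is a simplified illustration and does not reflect the actual ".pro" file layout, which is proprietary: the `Word` record, the word-level (rather than character-level) boxes, and the rule that the "best match" is the hit with the highest confidence are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    box: tuple        # (left, top, right, bottom) in pixels, assumed layout
    confidence: int   # higher means more reliable, assumed scale


def merge_pages(pages):
    """Merge per-page OCR output into one index holding searchable text
    and the location data separately, keyed by page identifier."""
    index = {"text": {}, "words": {}}
    for page_id, words in pages.items():
        index["text"][page_id] = [w.text.lower() for w in words]
        index["words"][page_id] = words
    return index


def search(index, term):
    """Stage 1: consult only the text to find pages containing the term."""
    term = term.lower()
    return [pid for pid, tokens in index["text"].items() if term in tokens]


def best_match_box(index, page_id, term):
    """Stage 2: for a selected page, return the box to highlight, chosen
    here as the matching word with the highest confidence."""
    term = term.lower()
    hits = [w for w in index["words"][page_id] if w.text.lower() == term]
    if not hits:
        return None
    return max(hits, key=lambda w: w.confidence).box


# Hypothetical OCR results for two pages.
pages = {
    "p1": [Word("Troops", (10, 10, 90, 30), 92), Word("march", (100, 10, 170, 30), 88)],
    "p2": [Word("troops", (20, 50, 95, 70), 70), Word("Troops", (20, 200, 95, 220), 95)],
}
index = merge_pages(pages)
print(search(index, "troops"))
print(best_match_box(index, "p2", "troops"))
```

Separating the text from the location data mirrors the description: the search itself touches only the text, and coordinates are read only for the page that is actually displayed.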
