The Thomas Jefferson Papers

Building the Digital Collection

Digitizing Microfilm

The Thomas Jefferson Papers at the Library of Congress is one of several manuscript collections being digitized in their entirety from the extensive microfilm produced by the Library of Congress Photoduplication Service. All available documents and text captured on the microfilm appear in this digital collection, with two exceptions: volume 10 of Series 7, Clippings, and volume 3 of Series 8, Virginia Records. Because the original documents in these volumes are of poor quality, their microfilm cannot usefully be scanned.

The Thomas Jefferson Papers was microfilmed in 1974 as part of the Presidential Papers Project, which Congress instituted in 1957 to process and microfilm the presidential papers held by the Library of Congress. The sixty-five-reel Jefferson collection, captured on 35-millimeter roll microfilm, was the final product of this program. Preservation Resources scanned the microfilm at its facility in Bethlehem, Pennsylvania, in 1998 and early 1999.

Microfilm collections of historical documents are challenging to digitize because the film being scanned is often of uneven quality. Further difficulties arise from the condition of the original documents and from the wide range of tonal values, document sizes, and document orientations on the film. For optimal capture of detail, the Jefferson Papers microfilm was raster-scanned from a duplicate negative copy, which was printed directly from the master microfilm and produced for scanning by Preservation Resources. Scanning from negative microfilm can reduce the appearance of flaws in the microfilm images.

The digital images scanned from the microfilm were produced as 8-bit grayscale images in JPEG File Interchange Format (JFIF). Compressed grayscale images of this kind are often used in digitizing historical manuscripts because they capture and display a wide range of tonal variation, from the tones of the paper itself to the diverse qualities of pencil and ink. This 8-bit grayscale capture was also found to suppress the bleedthrough typical of the handwritten documents in the microfilm collection. Preservation Resources also created grayscale GIF (Graphics Interchange Format) images for access. These 4-bit grayscale images load quickly, whereas the larger JPEG archival images take longer to download.
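The practical difference between the 8-bit archival images and the 4-bit access images is the number of gray levels each pixel can take: 256 versus 16. The Python sketch below illustrates that reduction under a simplifying assumption, namely that each 8-bit value is truncated to its upper four bits. This is not necessarily how Preservation Resources produced the GIF images; real converters may dither or build an adaptive palette instead. The function names and sample values are hypothetical.

```python
def quantize_8bit_to_4bit(pixels: list[int]) -> list[int]:
    """Map 8-bit gray values (0-255) onto 16 levels (0-15) by keeping the top 4 bits."""
    for value in pixels:
        if not 0 <= value <= 255:
            raise ValueError(f"not an 8-bit gray value: {value}")
    return [value >> 4 for value in pixels]


def expand_4bit_to_8bit(levels: list[int]) -> list[int]:
    """Spread the 16 levels back over 0-255 for display (0 -> 0, 15 -> 255)."""
    return [level * 17 for level in levels]


# Hypothetical sample values: paper, faint pencil, medium ink, dark ink.
sample = [240, 200, 130, 30]
levels = quantize_8bit_to_4bit(sample)
print(levels)                             # [15, 12, 8, 1]
print(expand_4bit_to_8bit(levels))        # [255, 204, 136, 17]
print(quantize_8bit_to_4bit([200, 207]))  # [12, 12]: two distinct tones merge
```

Because 200 and 207 both fall into level 12, fine tonal distinctions can be lost in a 4-bit access copy while remaining in the 8-bit image.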

In the Thomas Jefferson Papers, microfilm frames of individual manuscript leaves that were originally folded to make two to four pages, or writing surfaces, have not been split. Large and oversized bound volumes in Series 8, Virginia Records, were filmed with one page per frame, while smaller volumes were often filmed in open-book format with two pages to a frame. Frames of the latter kind were split during digitization into single-page images to improve visual access. Oversized individual manuscripts were sometimes microfilmed in sections across two or more frames; to increase legibility, National Digital Library Program staff stitched many of these sections together electronically using Adobe Photoshop. Preservation Resources also used Photoshop to remove some cosmetic defects inherent in the microfilm of all series in the Thomas Jefferson Papers. In Series 8, Virginia Records, Photoshop's Unsharp Mask filter was used to enhance the contrast between ink and background in the images of the manuscript volume pages.
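Unsharp masking raises edge contrast by adding back the difference between an image and a blurred copy of it: sharpened = original + amount × (original − blurred). The sketch below applies that idea to a grayscale image stored as a list of rows of 0 to 255 values. It is a simplified illustration, not Photoshop's implementation: it assumes a 3×3 box blur where Photoshop uses a Gaussian blur with an adjustable radius, and its hypothetical `amount` and `threshold` parameters only loosely correspond to Photoshop's Amount and Threshold settings. The settings actually applied to the Virginia Records are not stated.

```python
def box_blur(image: list[list[int]]) -> list[list[float]]:
    """3x3 mean filter; edge pixels average only the neighbors that exist."""
    height, width = len(image), len(image[0])
    blurred = []
    for y in range(height):
        row = []
        for x in range(width):
            total, count = 0.0, 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < height and 0 <= nx < width:
                        total += image[ny][nx]
                        count += 1
            row.append(total / count)
        blurred.append(row)
    return blurred


def unsharp_mask(image: list[list[int]], amount: float = 1.0,
                 threshold: float = 0) -> list[list[int]]:
    """Return original + amount * (original - blurred), clipped to 0..255.

    Pixels whose difference from the blur is at most `threshold` are left
    unchanged, so flat paper tone is not amplified.
    """
    blurred = box_blur(image)
    result = []
    for orig_row, blur_row in zip(image, blurred):
        out_row = []
        for original, blur in zip(orig_row, blur_row):
            detail = original - blur
            if abs(detail) <= threshold:
                out_row.append(original)
            else:
                out_row.append(max(0, min(255, round(original + amount * detail))))
        result.append(out_row)
    return result


# One row crossing an edge from light paper (200) to dark ink (50).
print(unsharp_mask([[200, 200, 50, 50]]))  # [[200, 250, 0, 50]]
```

On this one-row example, the pixels on either side of the ink edge move apart (200 becomes 250 and 50 becomes 0), while the flat regions at each end are left unchanged.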

Because the formats in the Jefferson Papers range from individual manuscripts to commonplace books, account books, and other kinds of manuscript volumes, the images received custom cropping. Manuscript leaves or bound volume pages whose text was not oriented for reading on the microfilm were re-oriented for reading as digital images. Pages containing multiple texts written in different directions were left in their original orientation.
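Custom cropping and re-orientation come down to two simple operations on the pixel grid: selecting a rectangle and rotating by quarter turns. The sketch below shows both on a grayscale image stored as a list of rows. It assumes the crop box and the number of quarter turns for each page are supplied by an operator; the function names are hypothetical and do not describe the tools actually used.

```python
Image = list[list[int]]  # rows of 0-255 gray values


def crop(image: Image, top: int, left: int, height: int, width: int) -> Image:
    """Return the height x width rectangle whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def rotate_clockwise(image: Image) -> Image:
    """Rotate the pixel grid 90 degrees clockwise."""
    # Reversing the row order and then transposing is a clockwise quarter turn.
    return [list(column) for column in zip(*image[::-1])]


def reorient(image: Image, quarter_turns: int) -> Image:
    """Apply quarter_turns clockwise rotations; negative values turn counterclockwise."""
    for _ in range(quarter_turns % 4):
        image = rotate_clockwise(image)
    return image


page = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
print(crop(page, top=1, left=1, height=2, width=2))  # [[5, 6], [8, 9]]
print(reorient(page, 1))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]
```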

Digitizing Text

The text transcriptions accompanying the images were converted to electronic text at an accuracy rate of 99.95 percent and encoded in Standard Generalized Markup Language (SGML) according to the American Memory document type definition (DTD). An OmniMark 5.1 program translated the SGML text to HTML 3.2 for indexing and for viewing with Web browsers. A unique identifier in each document's database record links the text to the corresponding manuscript images. The tables of contents, prefatory material, and indices from the published Records of the Virginia Company were also transcribed and encoded, but the Early Modern English texts in these volumes have not been transcribed. Those text pages are presented as GIF (Graphics Interchange Format) images for access and as TIFF (Tagged Image File Format) images for archival reference.
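An accuracy rate of 99.95 percent is easiest to read as an error budget. Assuming the rate is measured per character (a common convention for keyed transcription, but an assumption here), it allows at most 5 errors in every 10,000 characters. The hypothetical helper below does that arithmetic.

```python
def error_budget(character_count: int, accuracy_percent: float = 99.95) -> float:
    """Maximum number of erroneous characters permitted at the given accuracy.

    Assumes accuracy is measured per character; that basis is an assumption.
    """
    error_rate = (100.0 - accuracy_percent) / 100.0  # 0.0005 for 99.95 percent
    return character_count * error_rate


# Floating-point results are rounded for display.
print(round(error_budget(10_000), 2))  # 5.0
print(round(error_budget(2_000), 2))   # 1.0
```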

Digitizing Original Manuscripts

Most of the items in Series 10, Addenda to the Thomas Jefferson Papers, were scanned on an i2S Digibook scanner in the Information Technology Services Digital Scan Center at the Library of Congress. Oversize materials were captured with an overhead Phase One camera. The original items were digitized as 300-dpi grayscale images and saved with JPEG compression in the JPEG File Interchange Format (JFIF). GIF images were also created.
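Scanning resolution fixes the pixel dimensions, and therefore the raw data size, of each image before JPEG compression. The sketch below works through that arithmetic for a hypothetical 8 by 10 inch leaf scanned at 300 dpi; the compressed file size depends on the JPEG quality setting, which is not given here. The function names are illustrative only.

```python
def pixel_dimensions(width_in: float, height_in: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel width and height of a page scanned at the given resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int, bits_per_pixel: int = 8) -> int:
    """Raw size before compression; the default assumes one 8-bit gray sample per pixel."""
    return width_px * height_px * bits_per_pixel // 8


# Hypothetical 8 x 10 inch manuscript leaf.
w, h = pixel_dimensions(8, 10)
print(w, h)                      # 2400 3000
print(uncompressed_bytes(w, h))  # 7200000
```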

The digital images reflect the original physical condition of the Addenda items. Some of the manuscripts are discolored or have extremely faded ink; others have tears, folds, or other markings. Several documents received conservation treatment before digitization, and the Digital Scan Center staff took great care in handling the manuscripts.

Database Access

Access to this collection is through search and browse pages that link to a database created from the guide to the microfilm edition, Index to the Thomas Jefferson Papers (Washington: Library of Congress, 1976), and through searchable text transcriptions of some of Jefferson's correspondence and volumes. Every record in the database contains the name of the document's author, the date, and a link to the associated set of document images. Other fields give the recipient's name, brief explanatory notes, and a link to any available transcription. Some text transcriptions have been taken from The Works of Thomas Jefferson in Twelve Volumes, ed. Paul Leicester Ford (New York and London: G. P. Putnam's Sons, 1904), Thomas Jefferson and the National Capital, ed. Saul K. Padover (Washington, D.C.: Government Printing Office, 1946), and The Writings of George Washington from the Original Manuscript Sources, 1745-1799, ed. John C. Fitzpatrick (Washington, D.C.: Government Printing Office, 1931-1944).
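Each database record pairs required fields (author, date, link to the image set) with optional ones (recipient, notes, transcription link). The dataclass below models that shape, together with a simple name search. The field names, the example.org URLs, the placeholder names, and the search behavior are illustrative assumptions, not the Library of Congress schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentRecord:
    """Hypothetical model of one record in the collection database."""
    author: str                              # always present
    date: str                                # kept as text; historical dates may be partial
    image_set_url: str                       # link to the set of page images
    recipient: Optional[str] = None          # the remaining fields are optional
    notes: Optional[str] = None
    transcription_url: Optional[str] = None

    def has_transcription(self) -> bool:
        return self.transcription_url is not None


def search_by_name(records: list[DocumentRecord], name: str) -> list[DocumentRecord]:
    """Case-insensitive substring match on the author or recipient field."""
    needle = name.casefold()
    return [
        r for r in records
        if needle in r.author.casefold()
        or (r.recipient is not None and needle in r.recipient.casefold())
    ]


# Placeholder records; names, dates, and URLs are not real catalog data.
records = [
    DocumentRecord("Thomas Jefferson", "1790", "https://example.org/images/1",
                   recipient="Example Recipient"),
    DocumentRecord("Example Correspondent", "1801", "https://example.org/images/2",
                   recipient="Thomas Jefferson",
                   transcription_url="https://example.org/text/2"),
]

matches = search_by_name(records, "jefferson")
print(len(matches))                              # 2
print([r.has_transcription() for r in matches])  # [False, True]
```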
