The Chinese in California, 1850-1925

Building the Digital Collection

The Bancroft Library received an award in the 1998/99 round of The Library of Congress/Ameritech National Library Competition to support the digitization of materials related to The Chinese in California, 1850-1925. Additional partners in this project were The California Historical Society, San Francisco, and The Ethnic Studies Library, University of California, Berkeley. The materials proposed for digitization reflect a broad range of formats and topics illustrating the complex history of the Chinese and Chinese American experience in California.

Important aspects of building this digital collection are described below.

Selecting the Source Materials

The Chinese in California, 1850-1925 is a virtual archive of materials selected from a number of diverse collections at The University of California at Berkeley's Bancroft Library and Ethnic Studies Library and from collections held by The California Historical Society. The selected materials document nineteenth- and early twentieth-century Chinese immigration to and settlement in California. Because the social and political history of the period is complex, the materials are presented thematically.

In selecting the material and constructing this digital archive, it became apparent that much of it reflected an outsider's view of the Chinese communities. Information on the Chinese in the nineteenth and early twentieth centuries comes to us through books, periodicals, newspapers, and other published records, as well as through original sources such as manuscripts, photographs, drawings, and other pictorial materials. These materials are often filled with caricatures and derogatory designations. Yet they are still used today because written documentation on certain aspects of Chinese American history is scarce. They record what immigrants faced on coming to the American West and the inter-ethnic tensions of the time, but they can also document specific Chinese contributions to commerce, architecture, and cultural and social life.

Moreover, the survey of materials at the three repositories revealed a visual and textual trail reflecting the Chinese point of view. This was especially true of materials from the Chinese American Archives at the Ethnic Studies Library, which reflect the political, economic, social, and cultural life of the Chinese communities in California. But much rich resource material at both The Bancroft Library and The California Historical Society also sheds light on the Chinese and Chinese American point of view on the many challenges and struggles they encountered in California, as well as on their reflections on their achievements and accomplishments.

Intellectual Access to the Collection

The UC Berkeley Library's Project Control Database was designed as a reusable tool with which curators can build digitized collections of diverse archival materials such as photographs, drawings, letters, manuscripts, and books. It has been used in numerous digitization projects, including the NEH-funded Making of America II, LSTA-funded projects to digitize cased photographs, and the IMLS-funded Museums in the Online Archive of California (MOAC) project, which is described in an April 2002 article in First Monday. The database is also in use for the California Cultures project, funded through the Library of Congress.

The project control database manages the process of creating digital objects. It records intellectual description and access information (the database is designed to accommodate all major descriptive standards currently in use in digital projects) and correlates it with text and image files, provides the necessary structural metadata, and records image capture data. Recorded information includes fields for identification of the item (collection name; call number; series name or sub-collection number; shelving location; item identification by volume, container, folder, and item numbers; and caption) and for the digital file (format, resolution and dimensions, and file location). At the end of the process, Perl scripts automatically generate EAD container listings for the finding aids from information in the database. The database also automatically binds multi-part digital objects together into XML objects encoded in the Metadata Encoding and Transmission Standard (METS). It tracks all of the administrative metadata for the images, recording how, when, and with what methods the archival images and their derivatives were processed. The database accommodates different image-processing workflows (flatbed scanner, multipage scanner, and digital camera) as well as the workflow for reformatting and marking up electronic texts. It is currently the standard tool in OAC digital projects.
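
The METS binding step might look roughly like the following Python sketch. It is not the project's code, which the text does not show: it assumes a multi-part object is simply an ordered list of page-image files, and the object ID, label, and file names are hypothetical. The element names (mets, fileSec, fileGrp, file, FLocat, structMap, div, fptr) come from the METS schema; a real METS record would also carry descriptive and administrative sections (dmdSec, amdSec), which are omitted here.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def bind_object(obj_id: str, label: str, page_files: list[str]) -> ET.Element:
    """Bind the page images of one multi-part object into a METS document.

    Each file is listed once in <fileSec>; <structMap> gives the page order
    and points back to the files through <fptr FILEID="...">.
    """
    mets = ET.Element(m("mets"), {"OBJID": obj_id, "LABEL": label})
    file_sec = ET.SubElement(mets, m("fileSec"))
    file_grp = ET.SubElement(file_sec, m("fileGrp"), {"USE": "master"})
    struct = ET.SubElement(mets, m("structMap"), {"TYPE": "physical"})
    whole = ET.SubElement(struct, m("div"), {"TYPE": "item", "LABEL": label})

    for n, href in enumerate(page_files, start=1):
        file_id = f"{obj_id}-f{n:03d}"  # hypothetical ID scheme
        f = ET.SubElement(file_grp, m("file"), {"ID": file_id, "MIMETYPE": "image/tiff"})
        ET.SubElement(f, m("FLocat"), {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": href})
        page = ET.SubElement(whole, m("div"), {"TYPE": "page", "ORDER": str(n)})
        ET.SubElement(page, m("fptr"), {"FILEID": file_id})
    return mets


doc = bind_object("item042", "Letter, 3 pages", ["item042_p1.tif", "item042_p2.tif", "item042_p3.tif"])
print(ET.tostring(doc, encoding="unicode"))
```

Listing each file once in the file section and referring to it from the structural map keeps the file inventory separate from the page order, which is what lets one object be presented as an ordered sequence of pages.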

Project staff entered a brief descriptive record for each object, keying descriptive and administrative metadata into separate project control databases set up at the partner institutions: one at The California Historical Society, and a second for material from both the Ethnic Studies Library and The Bancroft Library. Descriptions in the finding aids follow the Anglo-American Cataloguing Rules, 2nd edition (AACR2). Where necessary, local data conventions and guidelines were developed to keep data entry consistent across institutions.

The database is hierarchical, so entries could be made for collections, with all related item entries placed under their collection record. Records contained identifying call numbers, project batch numbers used to route and track material through digitization, physical descriptions, and other cataloging information, including contextual notes.
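
A minimal sketch of the collection/item hierarchy, and of generating an EAD container listing from it as the Perl scripts described earlier do at the end of the process, follows. It is written in Python rather than the project's Perl; the class and field names, batch codes, and call numbers are hypothetical, and the output uses only a small subset of EAD (dsc, c01, did, container, unitid, unittitle).

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Item:
    call_number: str
    caption: str
    batch: str          # project batch number used for routing and tracking
    box: str = ""       # container numbers; empty when not applicable
    folder: str = ""


@dataclass
class Collection:
    name: str
    call_number: str
    items: list[Item] = field(default_factory=list)  # item records filed under the collection


def items_in_batch(coll: Collection, batch: str) -> list[Item]:
    """Return the items routed together in one digitization batch."""
    return [i for i in coll.items if i.batch == batch]


def container_list(coll: Collection) -> ET.Element:
    """Build an EAD <dsc> container listing from one collection record."""
    dsc = ET.Element("dsc")
    for item in coll.items:
        c01 = ET.SubElement(dsc, "c01", {"level": "item"})
        did = ET.SubElement(c01, "did")
        for kind, number in (("box", item.box), ("folder", item.folder)):
            if number:  # emit only the containers this item actually has
                ET.SubElement(did, "container", {"type": kind}).text = number
        ET.SubElement(did, "unitid").text = item.call_number
        ET.SubElement(did, "unittitle").text = item.caption
    return dsc


coll = Collection("Hypothetical Collection", "COLL-001", [
    Item("COLL-001:1", "Letter", "B01", box="1", folder="2"),
    Item("COLL-001:2", "Photograph", "B02"),
])
print(ET.tostring(container_list(coll), encoding="unicode"))
```

Because items hang off their collection record, the same data can drive both batch tracking during production and the container listing afterward.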


UC Berkeley's Digital Imaging Lab (DIL) served as a service bureau for digital image capture. As in other California Digital Library (CDL) collaborative imaging projects, DIL and the project partners met the standards described in the CDL's Digital Image Collections Standards. The image production process DIL used in this project was originally designed for the NEH-funded California Heritage Digital Image Access Project and has since been used to produce the images and detailed descriptive information for a number of projects, including the projects mentioned above and the LSTA-funded Japanese American Relocation Digital Archive (JARDA) project.

DIL performed digital image capture using an Agfa Arcus II flatbed scanner and a PhaseOne Powerphase digital camera scanning back mounted on a Hasselblad with a 120mm Makro-Planar CF lens, on a copystand with Kaiser daylight fluorescent illumination. The flatbed scanner was used primarily for loose (unmounted) originals; originals mounted in bound volumes, framed originals, and originals larger than the scanner's platen (8 x 13 inches) were captured with the camera.
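
The choice between the two capture devices can be expressed as a simple routing rule. This sketch assumes the only factors are whether the original is bound or framed and whether it fits the 8 x 13 inch platen in either orientation. Since the flatbed was used only primarily for loose originals, the hypothetical capture_device function simplifies the judgment staff actually made.

```python
from dataclasses import dataclass

PLATEN_IN = (8.0, 13.0)  # flatbed platen size given in the text (inches)


@dataclass
class Original:
    width_in: float
    height_in: float
    bound: bool = False    # mounted in a bound volume
    framed: bool = False


def fits_platen(o: Original) -> bool:
    """True if the original fits the platen in either orientation."""
    short_side, long_side = sorted((o.width_in, o.height_in))
    return short_side <= PLATEN_IN[0] and long_side <= PLATEN_IN[1]


def capture_device(o: Original) -> str:
    """Hypothetical rule: flatbed for loose originals that fit, camera otherwise."""
    if o.bound or o.framed or not fits_platen(o):
        return "camera"
    return "flatbed"


print(capture_device(Original(5, 7)))              # loose and small
print(capture_device(Original(5, 7, bound=True)))  # same size, but bound
print(capture_device(Original(11, 14)))            # larger than the platen
```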

As part of the initial capture, images were balanced for brightness, contrast, and color using the proprietary software supplied by the equipment manufacturers. A compact target with a gray scale, a centimeter scale, and color patches was included in each scan for reference. Capture resolution was typically between 300 and 600 dpi, with 600 dpi used whenever practical. The digital master files were archived on writable CD (CD-R) media as 24-bit RGB TIFF files. Derivative (viewing) files were created from the digital masters in batch mode using Photoshop and DeBabelizer to produce JPEG (JFIF) and GIF files at reduced resolutions appropriate for viewing. Work was reviewed for quality at several points in each production workflow: first at the point of capture on the flatbed scanner and digital camera (since each capture is a one-off operation), and finally just before web presentation, when both images and text were viewed in a browser to confirm their accessibility.
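
The resolution figures above translate directly into file sizes, which matters when planning storage for uncompressed masters. Pixel dimensions are physical size times dpi, and a 24-bit RGB image needs 3 bytes per pixel. The sketch below assumes a 650 MB (650 x 1024^2 bytes) recordable CD and a hypothetical 640-pixel long edge for viewing derivatives; neither figure comes from the text, and real TIFF files add a small header.

```python
BYTES_PER_PIXEL_RGB24 = 3           # 24-bit RGB = 8 bits x 3 channels
CD_CAPACITY_BYTES = 650 * 1024**2   # assumption: a 650 MB recordable CD


def pixel_dimensions(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel size of a capture: physical size (inches) times resolution."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_bytes(width_px: int, height_px: int) -> int:
    """Approximate size of an uncompressed 24-bit RGB TIFF, ignoring headers."""
    return width_px * height_px * BYTES_PER_PIXEL_RGB24


def masters_per_cd(master_bytes: int) -> int:
    """How many masters of this size fit on one disc."""
    return CD_CAPACITY_BYTES // master_bytes


def derivative_size(width_px: int, height_px: int, max_edge: int) -> tuple[int, int]:
    """Scale down (never up) so the longer edge is at most max_edge pixels."""
    scale = min(1.0, max_edge / max(width_px, height_px))
    return round(width_px * scale), round(height_px * scale)


w, h = pixel_dimensions(8, 10, 600)
size = uncompressed_bytes(w, h)
print(w, h, size, masters_per_cd(size), derivative_size(w, h, 640))
```

For an 8 x 10 inch print captured at 600 dpi, this gives 4800 x 6000 pixels and an uncompressed master of 86,400,000 bytes, so about seven such masters fit on one disc; the viewing derivative comes out at 512 x 640 pixels.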

Integrating the Collection into American Memory

Once the data had been entered and the selected materials digitized, the two databases were exported to a single "virtual" EAD XML finding aid by a program written in Perl. Each database had originally been divided at the highest level into nine broad topical areas, e.g., "Chinese and Westward Expansion," "San Francisco's Chinatown -- Architectural Space," "Agriculture, Fishing and Related Industries," etc. These nine areas were preserved in the exported finding aid, with material from each repository merged under each topic. LC was extremely accommodating in allowing us to choose the format in which to submit data to the American Memory project. We chose EAD because it fit the data most naturally as it had been created.
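
The merge of the two databases into one finding aid can be sketched as grouping records by topical area and emitting one series per topic. This is Python rather than the project's Perl program, and the record layout (dictionaries with topic, repository, and title keys) is an assumption. Only the three topic names given in the text are listed; the remaining six are omitted rather than invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Topical areas in display order. Only the three named in the text are
# listed here; the other six are left out rather than guessed.
TOPICS = [
    "Chinese and Westward Expansion",
    "San Francisco's Chinatown -- Architectural Space",
    "Agriculture, Fishing and Related Industries",
]


def merge_to_dsc(*databases: list[dict]) -> ET.Element:
    """Merge records from several databases under shared topical series."""
    by_topic = defaultdict(list)
    for db in databases:
        for rec in db:
            if rec["topic"] not in TOPICS:
                raise ValueError(f"unknown topic: {rec['topic']!r}")
            by_topic[rec["topic"]].append(rec)

    dsc = ET.Element("dsc")
    for topic in TOPICS:                      # keep the original topic order
        records = by_topic.get(topic, [])
        if not records:
            continue                          # skip topics with no material
        c01 = ET.SubElement(dsc, "c01", {"level": "series"})
        ET.SubElement(ET.SubElement(c01, "did"), "unittitle").text = topic
        for rec in records:                   # records keep the order of the databases passed in
            c02 = ET.SubElement(c01, "c02", {"level": "item"})
            did = ET.SubElement(c02, "did")
            ET.SubElement(did, "repository").text = rec["repository"]
            ET.SubElement(did, "unittitle").text = rec["title"]
    return dsc
```

Raising an error for an unrecognized topic, rather than silently dropping the record, makes a data-entry inconsistency between the two databases visible at export time.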

The data represented a mix of item-level and collection-level cataloging. Collection-level cataloging is useful for presenting users with a contextual view of an archival collection, showing each item in the context of its series, subseries, and neighboring items. An item-level view, such as an interface for browsing, searching, and sorting individual images, necessarily presents items out of context. Adapting collection-level cataloging to an item-level interface therefore presented its own challenges.
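
One way to bridge the two views is to flatten the hierarchy into item-level records while carrying each item's ancestry along as context. This sketch assumes the hierarchy is available as nested dictionaries with hypothetical title, level, and children keys. It also emits any node cataloged only at the collection or series level (one with no children), so that such material does not vanish from an item-level interface. It illustrates the idea and is not necessarily the approach the project took.

```python
from typing import Iterator


def flatten(node: dict, path: tuple[str, ...] = ()) -> Iterator[dict]:
    """Yield one item-level record per leaf, carrying its ancestors as context.

    A node with level "item" is a leaf. A collection- or series-level node
    with no children (cataloged only at that level) is also emitted, so it
    is not lost from the item-level view.
    """
    children = node.get("children", [])
    if node.get("level") == "item" or not children:
        yield {
            "title": node["title"],
            "level": node.get("level", "item"),
            "context": " > ".join(path),   # e.g. "Collection > Series"
        }
        return
    for child in children:
        yield from flatten(child, path + (node["title"],))


tree = {"title": "Collection X", "level": "collection", "children": [
    {"title": "Series 1", "level": "series", "children": [
        {"title": "Street scene", "level": "item"},
        {"title": "Store front", "level": "item"},
    ]},
    {"title": "Pamphlets", "level": "series"},
]}
for rec in flatten(tree):
    print(rec)
```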
