METS Creation Tools Comparison

UCB: GenDB
RLG: METSBuilder
CCS: docWorks

Type of Tool

UCB: GenDB

Non-commercial system developed in-house.

RLG: METSBuilder

Non-commercial system developed in-house using Open Source software.

CCS: docWorks

Commercial system developed in conjunction with the EU-sponsored MetaE project.

Materials Targeted

UCB: GenDB

Initially targeted pictorial materials and manuscript text materials. Now also used for printed materials.

RLG: METSBuilder

Various pictorial and cultural materials, including text-based materials, audio and video.

CCS: docWorks

Printed materials: books and journals.

Components

UCB: GenDB

Relational DB (SQL Server); Java Servlet-driven Web interface; Java batch load interface and input schema; digitization work order generation engine; Java METS creation program.

RLG: METSBuilder

Variety of ad hoc metadata conversion tools; batch load input schema; Java METS creation program built over Castor (an XML binding framework for Java).

CCS: docWorks

Analytical digitization and OCR engine; rules files that guide structural analysis; XML schema (ALTO) for expressing structured text/layout analysis; METS and PDF generation engines.

Main Goals

UCB: GenDB

1) Facilitate manual input of structural, descriptive, and administrative metadata.

2) Alternatively, facilitate batch input of existing metadata once these have been normalized to a locally defined XML format.

3) Guide the digitization process via work orders that instruct digitization vendors in how to digitize the material.

4) Generate METS objects once the metadata gathering process is complete.

5) Facilitate ongoing maintenance of metadata.

RLG: METSBuilder

1) Accommodate existing metadata coming into RLG in a variety of formats from a variety of sources.

2) Normalize diverse incoming metadata to a locally defined XML format.

3) Generate METS objects using Open Source tools (Castor).

CCS: docWorks

1) Automate as fully as possible the metadata gathering process: structural, descriptive and administrative.

2) Automatically analyze the physical and logical structure of material being digitized (or already digitized) based on rules files. Separate out the various logical and physical layers of the materials. Allow the user to choose the level of analysis desired.

3) Allow user intervention and quality control as needed.

4) Generate METS (and/or PDF) objects.

Work Flow

UCB: GenDB

1a) Keyer inputs structural and descriptive metadata manually using configurable user interface.

1b) Alternatively, ad hoc Perl scripts gather metadata from dispersed existing sources and write to an XML-based input file that is automatically loaded into the system.

2) Project Manager generates Work Orders and sends them with source materials to digitization vendor(s).

3) Digitization vendor digitizes materials and returns digital content files along with spreadsheets containing the content filenames and technical metadata.

4) Technical metadata and content filenames are imported into the database from the spreadsheets, completing the metadata gathering process.

5) METS objects generated.
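
Step 4 of the GenDB work flow, loading vendor spreadsheets into the database, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the spreadsheet is taken to be a CSV export, the column names (`filename`, `width`, `height`, `format`) and the `content_file` table are hypothetical, and SQLite stands in for the SQL Server database that GenDB actually uses.

```python
import csv
import io
import sqlite3


def import_vendor_sheet(conn, sheet):
    """Load content filenames and technical metadata from a CSV export.

    The table layout and column names are hypothetical, not GenDB's schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS content_file ("
        "filename TEXT PRIMARY KEY, width INTEGER, height INTEGER, format TEXT)"
    )
    rows = [
        (r["filename"], int(r["width"]), int(r["height"]), r["format"])
        for r in csv.DictReader(sheet)
    ]
    # INSERT OR REPLACE lets a corrected spreadsheet be re-imported safely.
    conn.executemany("INSERT OR REPLACE INTO content_file VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


# Illustrative spreadsheet content as a vendor might return it.
SAMPLE_SHEET = (
    "filename,width,height,format\n"
    "page_0001.tif,2400,3600,image/tiff\n"
    "page_0002.tif,2400,3600,image/tiff\n"
)

conn = sqlite3.connect(":memory:")
count = import_vendor_sheet(conn, io.StringIO(SAMPLE_SHEET))
```

Keying the table on the content filename mirrors the role the filename plays in the work flow: it is the link between the vendor's files and the metadata already gathered.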

 

RLG: METSBuilder

1) Metadata and raw digital content files come into RLG from a variety of sources and in a variety of formats.

2) Ad hoc programs, scripts and style sheets convert incoming metadata to standard XML format.

3) Incoming content files converted to RLG format.

4) Java program processes batch load file and generates METS objects using the XML/Java binding framework provided by Castor.
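
The last step of this work flow, turning a normalized batch load record into a METS object, might look roughly like the sketch below. METSBuilder itself is a Java program built over Castor; this Python version only illustrates the mapping. The `<record>` input format, its element names, and the sample values are hypothetical, since the local RLG XML format is not described here, and the minimal `<amdSec>` is omitted for brevity.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
DC = "http://purl.org/dc/elements/1.1/"
for prefix, uri in (("mets", METS), ("xlink", XLINK), ("dc", DC)):
    ET.register_namespace(prefix, uri)


def q(ns, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{ns}}}{tag}"


def build_mets(record_xml):
    """Map one normalized record (hypothetical format) to a METS element."""
    record = ET.fromstring(record_xml)
    mets = ET.Element(q(METS, "mets"), {"OBJID": record.get("id")})

    # Descriptive metadata: Dublin Core wrapped inside a dmdSec.
    dmd = ET.SubElement(mets, q(METS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS, "mdWrap"), {"MDTYPE": "DC"})
    xml_data = ET.SubElement(wrap, q(METS, "xmlData"))
    for field in ("title", "creator"):
        for node in record.findall(field):
            ET.SubElement(xml_data, q(DC, field)).text = node.text

    # File inventory plus a single logical structMap pointing at the files.
    file_sec = ET.SubElement(mets, q(METS, "fileSec"))
    file_grp = ET.SubElement(file_sec, q(METS, "fileGrp"))
    struct = ET.SubElement(mets, q(METS, "structMap"), {"TYPE": "logical"})
    root_div = ET.SubElement(struct, q(METS, "div"), {"DMDID": "DMD1"})
    for n, f in enumerate(record.findall("file"), start=1):
        file_id = f"FILE{n}"
        file_el = ET.SubElement(
            file_grp, q(METS, "file"), {"ID": file_id, "MIMETYPE": f.get("mime")}
        )
        ET.SubElement(
            file_el, q(METS, "FLocat"),
            {"LOCTYPE": "URL", q(XLINK, "href"): f.get("href")},
        )
        div = ET.SubElement(root_div, q(METS, "div"), {"ORDER": str(n)})
        ET.SubElement(div, q(METS, "fptr"), {"FILEID": file_id})
    return mets


# Hypothetical normalized record; not the actual RLG batch load schema.
SAMPLE_RECORD = """
<record id="obj-001">
  <title>Sample album</title>
  <creator>Unknown photographer</creator>
  <file href="http://example.org/obj-001/0001.jpg" mime="image/jpeg"/>
  <file href="http://example.org/obj-001/0002.jpg" mime="image/jpeg"/>
</record>
"""

mets = build_mets(SAMPLE_RECORD)
```

Calling `ET.tostring(mets, encoding="unicode")` serializes the result for inspection.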

CCS: docWorks

1a) Material scanned and images made available for quality control review.

1b) Alternatively, existing digital content files imported if material has already been digitized outside docWorks.

2) Page images automatically analyzed based on rules files. The results of this analysis are recorded in XML format (ALTO files) along with structured text captured via OCR. Descriptive metadata imported (on request) from external source(s).

3) Structural analysis made available for a quality control review.

4) METS and/or PDF objects generated.

Content File Output

UCB: GenDB

System does not itself produce content files. Supports the production of image and structured text (TEI) content files by digitization vendors.

RLG: METSBuilder

System does not itself produce original content files, but converts existing content files to RLG format.

CCS: docWorks

Engine supports the production of image content files. Produces structured text content in the form of ALTO (Analyzed Layout and Text Object) files.

METS Output

UCB: GenDB

<fileSec> referencing image and structured text (TEI) content files.

Single <structMap> of physical or mixed type. Every <structMap> represents a physical structure at the lowest level of the hierarchy.

<dmdSec> with <mdRef> to external finding aid.

<dmdSec> with MODS encoded descriptive metadata.

<techMD> elements with MIX encoded technical metadata about images
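
A `<dmdSec>` that points to an external finding aid through `<mdRef>`, as listed above, could be built as in the following sketch. The URL is a placeholder, and the choice of `MDTYPE="EAD"` is an assumption: the comparison does not say which format the finding aids use.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def finding_aid_dmdsec(dmd_id, finding_aid_url):
    """Build a dmdSec that references, rather than embeds, a finding aid."""
    dmd = ET.Element(f"{{{METS}}}dmdSec", {"ID": dmd_id})
    ET.SubElement(
        dmd,
        f"{{{METS}}}mdRef",
        {
            "LOCTYPE": "URL",
            "MDTYPE": "EAD",  # assumed format of the finding aid
            f"{{{XLINK}}}href": finding_aid_url,
        },
    )
    return dmd


# Placeholder URL for illustration only.
dmd = finding_aid_dmdsec("DMD1", "http://example.org/findaid/collection.xml")
serialized = ET.tostring(dmd, encoding="unicode")
```

Referencing the finding aid keeps the METS object small and leaves the finding aid as the single maintained copy of the collection-level description.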

RLG: METSBuilder

<fileSec> referencing image, text, audio and video content files.

Single <structMap> of logical type.

<dmdSec> with DC encoded descriptive metadata.

<amdSec> with minimal administrative metadata.

CCS: docWorks

<fileSec> referencing image and structured text (ALTO) content files.

Two <structMap>s: one of physical type and one of logical type.

<dmdSec> with <mdRef> to external descriptive metadata (if desired)

<dmdSec> with DC encoded descriptive metadata. (MODS output under consideration)

<techMD> elements with MIX encoded technical metadata about images.
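
The pairing of a physical and a logical `<structMap>` over the same content files can be sketched as follows. This is only an illustration of the idea, not docWorks output: the file ID scheme (`IMG0001`, `ALTO0001`), the `TYPE` values on the `<div>` elements, and the section list are all hypothetical.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
ET.register_namespace("mets", METS)


def m(tag):
    """Return a METS-qualified tag name for ElementTree."""
    return f"{{{METS}}}{tag}"


def add_struct_maps(mets, page_count, sections):
    """Append a physical and a logical structMap to a METS element.

    sections is a list of (label, first_page, last_page) tuples.
    """
    # Physical view: one div per page, each pointing at its image and ALTO file.
    physical = ET.SubElement(mets, m("structMap"), {"TYPE": "physical"})
    volume = ET.SubElement(physical, m("div"), {"TYPE": "volume"})
    for page in range(1, page_count + 1):
        div = ET.SubElement(volume, m("div"), {"TYPE": "page", "ORDER": str(page)})
        ET.SubElement(div, m("fptr"), {"FILEID": f"IMG{page:04d}"})
        ET.SubElement(div, m("fptr"), {"FILEID": f"ALTO{page:04d}"})

    # Logical view: one div per section, pointing at the ALTO files it spans.
    logical = ET.SubElement(mets, m("structMap"), {"TYPE": "logical"})
    book = ET.SubElement(logical, m("div"), {"TYPE": "book"})
    for label, first, last in sections:
        div = ET.SubElement(book, m("div"), {"TYPE": "section", "LABEL": label})
        for page in range(first, last + 1):
            ET.SubElement(div, m("fptr"), {"FILEID": f"ALTO{page:04d}"})
    return physical, logical


mets = ET.Element(m("mets"))
physical, logical = add_struct_maps(
    mets, 4, [("Preface", 1, 1), ("Chapter 1", 2, 4)]
)
```

Because both maps point at the same file IDs, a viewer can page through the volume in physical order or jump by section without duplicating any content files.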