MODS as Data Hub

A presentation given by Tod A. Olson, Systems Librarian, University of Chicago, at the ALA Annual 2007 Program on Using Metadata Standards in Digital Libraries

The problem as I see it is this: we create descriptive metadata in specific contexts, and then we want to share that metadata and reuse it in other contexts. These metadata are created according to different standards and are stored and maintained in different formats. What we want is one metadata format that we can distribute to our peers, one that serves as a common platform for reuse (in new discovery tools and in archiving, for example) but that also preserves the project-specific aspects of the metadata.

Let me back up a bit. Like many places, our first uses of MODS were in repurposing MARC catalog records. In one case we had a collection of 19th-century classical piano scores that we were making available to music scholars. The descriptive metadata were very granular, even recording the dedicatee as an access point. We mapped the MARC into MODS and put that in a METS file for each score. So now we had METS objects with embedded MODS, and it was easy to transform them into something we could load into Greenstone. We were also looking toward a future where we would archive those METS objects or exchange them with our OAI partners. In another case, we used MODS in creating a New Acquisitions list. Since we could already harvest MARC records into MODS, we transformed the MODS into the old Unix refer format and loaded it into a locally created tool that we use for a number of small datasets.
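
The MODS-to-refer step can be sketched in Python as follows. This is a minimal illustration, not the script used at Chicago: it assumes one standalone MODS version 3 record, maps only the top-level title, personal names, publisher, and date issued to the refer fields %T, %A, %I, and %D, and the record values are placeholders.

```python
import xml.etree.ElementTree as ET

# MODS version 3 namespace; "mods" is only a local alias used by find()/findall().
MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}


def mods_to_refer(mods_xml: str) -> str:
    """Render a small subset of a MODS record as a Unix refer entry.

    Only the top-level title, personal names, publisher, and date issued
    are mapped; every other MODS element is ignored.
    """
    root = ET.fromstring(mods_xml)
    lines = []

    # Top-level title only; titles nested inside <relatedItem> are not matched.
    title = root.findtext("mods:titleInfo/mods:title", default="", namespaces=MODS_NS)
    if title.strip():
        lines.append("%T " + title.strip())

    # refer uses one %A line per author.
    for name in root.findall("mods:name", MODS_NS):
        parts = [p.text.strip() for p in name.findall("mods:namePart", MODS_NS) if p.text]
        if parts:
            lines.append("%A " + " ".join(parts))

    publisher = root.findtext("mods:originInfo/mods:publisher", default="", namespaces=MODS_NS)
    if publisher.strip():
        lines.append("%I " + publisher.strip())

    date = root.findtext("mods:originInfo/mods:dateIssued", default="", namespaces=MODS_NS)
    if date.strip():
        lines.append("%D " + date.strip())

    return "\n".join(lines)


# Hypothetical record; the values are placeholders, not real catalog data.
record = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Nocturnes</title></titleInfo>
  <name type="personal"><namePart>Composer, Example</namePart></name>
  <originInfo>
    <publisher>Example Music Publisher</publisher>
    <dateIssued>1850</dateIssued>
  </originInfo>
</mods>"""

print(mods_to_refer(record))
```

Because the MODS elements already separate title, name, publisher, and date, each refer field is a direct lookup rather than a parse of display-oriented text.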

So why bother with MODS, why not just use MARC or MARCXML? Well, we could have, but MODS offers some advantages from the programmer's perspective. Because MODS is an XML schema, a large number of tools are available for developing, debugging, and validating MODS records. The relationships between the elements of a MODS record are made explicit by the hierarchical structure of the record, rather than relying entirely on an understanding of the semantics of the fields. The tags are also more consistent, in that a tag always has the same semantics regardless of where it occurs in the record. The <titleInfo> element is always title information, no matter where it is in the hierarchy, whereas the meaning of a subfield "b" depends entirely on which field it is in. The <titleInfo> element always has the same granularity, even under a <relatedItem>; MARC data tends to make more granular distinctions at the top level but loses some of that granularity in related items. The structure of MODS seems to be based more on the logical relationships between the data elements, with less influence from the display conventions of catalog cards and print catalogs. And there is no ISBD punctuation in the data! These factors are all advantages for the programmer who needs to index the metadata and build different displays for it.
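
To illustrate the point about consistent semantics, the following sketch (Python's standard xml.etree.ElementTree, with a hypothetical record) applies one function to the title information at the top level of a record and to the title information inside a <relatedItem>. No field-specific logic is needed, because <titleInfo> has the same structure and meaning in both places.

```python
import xml.etree.ElementTree as ET

# Clark-notation prefix for the MODS version 3 namespace.
M = "{http://www.loc.gov/mods/v3}"


def titles(element):
    """Return (title, subTitle) pairs for the direct <titleInfo> children of element.

    The same code serves the record itself and any <relatedItem>, because
    <titleInfo> has the same structure and meaning wherever it appears.
    """
    return [
        (info.findtext(M + "title", default=""), info.findtext(M + "subTitle", default=""))
        for info in element.findall(M + "titleInfo")
    ]


# Hypothetical record with a host item; values are placeholders.
record = ET.fromstring("""<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example Sonata</title><subTitle>for piano</subTitle></titleInfo>
  <relatedItem type="host">
    <titleInfo><title>Example Collected Works</title><subTitle>second series</subTitle></titleInfo>
  </relatedItem>
</mods>""")

print(titles(record))  # top-level title only; findall() does not descend into relatedItem
for related in record.findall(M + "relatedItem"):
    print(related.get("type"), titles(related))
```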

At the University of Chicago, our digitization projects tend to have specific audiences and goals; we have not been a mass digitization shop. Because each project has a specific audience and goal, the metadata needs of the projects, while similar, include project-specific requirements related to each project's primary use. These requirements might be driven by preservation and archiving needs, specific user interface needs, or the like. But we also want to reuse that metadata for secondary purposes, such as promoting broader discovery of the resources, both on our campus and elsewhere. We may do this through OAI or by loading the records into local discovery tools. We want a common data format that we can use to distribute metadata for all of these projects, and that will preserve the granularity we consider relevant.

Most of our projects do not have pre-existing metadata. The metadata for projects are recorded according to different specifications and with somewhat different structure.

Metadata for (most of) our digitization projects are created locally in a relational database. The database schemas for the projects are kept as simple as possible given the project goals, and are mostly consistent, though there are project-specific variations. We crosswalk that metadata into both Dublin Core and MODS, and insert both into a METS record (along with the structural, technical, and administrative metadata). The MODS (and Dublin Core) records are loaded into our OAI repository. (By the end of the year, we should be able to submit the whole METS object to our digital archive.)
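
The database-to-MODS crosswalk might look roughly like the sketch below. The row layout (a title, creators as name/role pairs, subjects as kind/value pairs) is a hypothetical stand-in for a project schema, the role terms are assumed to come from the MARC relator list, and only the MODS side is shown; the Dublin Core output and the METS wrapper are omitted.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", NS)  # serialize with a "mods:" prefix


def q(tag):
    """Qualify a MODS element name with the MODS namespace (Clark notation)."""
    return f"{{{NS}}}{tag}"


def row_to_mods(row):
    """Crosswalk one hypothetical project-database row into a MODS element.

    Assumed row layout (illustrative only):
      title:    str
      creators: list of (name, role) pairs
      subjects: list of (kind, value) pairs, kind in {"topic", "geographic", "temporal"}
    """
    mods = ET.Element(q("mods"))

    title_info = ET.SubElement(mods, q("titleInfo"))
    ET.SubElement(title_info, q("title")).text = row["title"]

    # Keep each creator's role instead of flattening everyone to "creator".
    for name_text, role_text in row.get("creators", []):
        name = ET.SubElement(mods, q("name"), type="personal")
        ET.SubElement(name, q("namePart")).text = name_text
        role = ET.SubElement(name, q("role"))
        term = ET.SubElement(role, q("roleTerm"), type="text", authority="marcrelator")
        term.text = role_text

    # One <subject> per value; the child element records what kind of subject it is.
    for kind, value in row.get("subjects", []):
        if kind not in {"topic", "geographic", "temporal"}:
            raise ValueError(f"unexpected subject kind: {kind}")
        subject = ET.SubElement(mods, q("subject"))
        ET.SubElement(subject, q(kind)).text = value

    return mods


# Hypothetical row; names and values are placeholders.
row = {
    "title": "Example photograph",
    "creators": [("Doe, Jane", "photographer"), ("Roe, Richard", "editor")],
    "subjects": [("topic", "Social History"), ("geographic", "Germany"), ("temporal", "19th Century")],
}
print(ET.tostring(row_to_mods(row), encoding="unicode"))
```

The point of the design is that each project-specific distinction in the database (creator roles, subject kinds) lands in a distinct, standard MODS element rather than being merged away.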

So why not just use Dublin Core, why use MODS in addition? The issue is one of metadata granularity. We have projects with different kinds of creators, or different kinds of subjects. These distinctions are made because they are relevant to the goals of the projects, and so we want to make those distinctions available for those who harvest the records, for archiving purposes, and for our own reuse.

For example, we will want to load these digital library records into our next generation discovery tools. Thanks to the work at NCSU and elsewhere, libraries everywhere are eager to put up new faceted search engines to better expose their collections. These faceted engines can leverage the more granular metadata to the advantage of the end user.

As a specific example, let's look at subjects. Many of our projects record different kinds of subjects, presumably so that the different kinds can be treated in different ways. It seems natural that subjects like Constitutional Law or Social History should fall along a different axis from subjects like United States, United Kingdom, or Germany, which in turn seem to lie on yet another axis from something like 19th Century.
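
A faceted index can read these axes directly from MODS, where each <subject> child element (<topic>, <geographic>, <temporal>) names the kind of subject. The sketch below counts values per facet across records; the facet labels ("Topic", "Place", "Period") and the sample records are assumptions for illustration, not part of any particular discovery tool.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Assumed mapping from MODS subject child elements to facet labels.
FACETS = {"topic": "Topic", "geographic": "Place", "temporal": "Period"}


def subject_facets(records):
    """Count subject values per facet across MODS records given as XML strings."""
    counts = {label: Counter() for label in FACETS.values()}
    for xml_text in records:
        root = ET.fromstring(xml_text)
        for subject in root.findall("mods:subject", NS):
            # A single <subject> may hold several children, e.g. a topic and a place.
            for child in subject:
                local_name = child.tag.rsplit("}", 1)[-1]  # strip "{namespace}"
                if local_name in FACETS and child.text and child.text.strip():
                    counts[FACETS[local_name]][child.text.strip()] += 1
    return counts


# Hypothetical records built from the example subjects above.
records = [
    """<mods xmlns="http://www.loc.gov/mods/v3">
         <subject><topic>Constitutional Law</topic></subject>
         <subject><geographic>United States</geographic></subject>
         <subject><temporal>19th Century</temporal></subject>
       </mods>""",
    """<mods xmlns="http://www.loc.gov/mods/v3">
         <subject><topic>Social History</topic></subject>
         <subject><geographic>Germany</geographic></subject>
         <subject><temporal>19th Century</temporal></subject>
       </mods>""",
]

for label, counter in subject_facets(records).items():
    print(label, counter.most_common())
```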

That's just one example of where we think a faceted environment will leverage the granularity of our metadata. In cases where the granularity proves not to be useful, we can downgrade if we want, but you can never upgrade to more granular metadata.
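
The asymmetry between downgrading and upgrading can be shown with a small mapping. The sketch assumes a simple MODS-to-Dublin Core rule in which topics become dc:subject and both places and periods become dc:coverage (Dublin Core's element for the spatial or temporal topic of a resource); the rule is an illustrative choice, not a published crosswalk.

```python
# Assumed mapping from MODS subject kinds to Dublin Core elements:
# topics become dc:subject; places and periods both become dc:coverage.
DC_FOR_KIND = {"topic": "subject", "geographic": "coverage", "temporal": "coverage"}


def downgrade_subjects(subjects):
    """Map (MODS subject kind, value) pairs to (Dublin Core element, value) pairs."""
    return [(DC_FOR_KIND[kind], value) for kind, value in subjects if kind in DC_FOR_KIND]


granular = [("topic", "Social History"), ("geographic", "Germany"), ("temporal", "19th Century")]
flat = downgrade_subjects(granular)
print(flat)

# Three distinct kinds go in, but only two Dublin Core elements come out.
print(len({kind for kind, _ in granular}), len({element for element, _ in flat}))  # 3 2

# Swapping which value is the place and which is the period yields the same
# Dublin Core output, so the original kinds cannot be recovered from it.
a = downgrade_subjects([("geographic", "Germany"), ("temporal", "19th Century")])
b = downgrade_subjects([("temporal", "Germany"), ("geographic", "19th Century")])
print(a == b)  # True
```

Downgrading is a many-to-one mapping, so it is always possible; going back the other way would require information the flat record no longer carries.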

In conclusion, MODS has proven to be granular and flexible enough to preserve our project-specific metadata in a common format. We have been able to crosswalk descriptive metadata from different formats into MODS, and it has preserved the granularity of the metadata in a way that supports the indexing and display needs of different projects and different target environments. MODS is common enough among our peers that it is useful for sharing metadata. For these reasons, it also plays an important role in our digital archiving strategy. And finally, we expect that MODS will allow the metadata we are recording today to crosswalk gracefully into future metadata schemas.

So that's where we've been and where we're going with MODS. Thank you.


October 18, 2010