Where are we and where are we going? A brief status update

 

Over the past two months I’ve shared information with you about the copyright records and the plans, challenges, and visions for preserving them in digital form and making them widely available online. Today’s post is a brief update on recent progress.

Copyright Card Catalog

 

First, I’m happy to report that digitization of the records is continuing. In the two months since this blog was started, nearly 2.2 million more catalog cards have been scanned, and the images have been inspected and placed in archival storage. This brings the total to more than 16.6 million cards completed, which is more than a third of the entire catalog.

1928 Catalog of Copyright Entries

 

Progress has also been made in scanning and making available the Catalog of Copyright Entries (CCE). Since late December, 36 more volumes have been processed, bringing the total to 456 of the 660 volumes. We’re nearing 70% completion, with some registrations from as far back as 1928 and including all classes of works.

The volumes scanned so far are available at the link below.

http://www.archive.org/details/copyrightrecords/

 

We also have a small group of subject matter experts and information technologists studying how the records can best be indexed and made available to users. A test database is being used to assess the efficacy of different approaches to indexing and displaying these older records.

Integration with the post-1977 records is a goal, but accomplishing that is not without its challenges. Records since 1978 have been collected in a database containing multiple indexes for titles, authors, claimants, and other related index terms. This granularity allows flexible searching by title only, by name only, or by combinations of different indexes. The card catalog, by contrast, consists largely of a single alphabetical index within each time period, with names, titles, and other index terms interfiled.

The plan is to capture all of the text on a card and to programmatically parse as much of the data as possible, placing it in the appropriate fields in the data record. We are studying the data patterns found in the cards to see if specific types of index terms can be distinguished based on position on the card or on unique characters such as the copyright notice symbol ©. For the most part, however, the data in the registration cards is not labeled, making it difficult and perhaps impossible to programmatically distinguish all names and titles.
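To make the parsing idea a little more concrete, here is a minimal sketch in Python of how position on the card and the © symbol might be used to sort card text into fields. It is only an illustration, not the approach our team has settled on. The `CardRecord` fields, the `parse_card` function, the assumption that the first line is the filing heading, and the assumption that the claimant follows the © and ends at the first semicolon are all hypothetical, and the sample card text is invented.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CardRecord:
    """Hypothetical data record for one catalog card."""
    heading: str = ""          # first line: a name OR a title (the card does not say which)
    claimant: str = ""         # assumed to follow the © symbol, up to the first semicolon
    notice_details: str = ""   # whatever follows the claimant on the © line
    unparsed: List[str] = field(default_factory=list)  # text we could not classify


def parse_card(text: str) -> CardRecord:
    """Split card text into fields using position and the © symbol only."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    record = CardRecord()
    if not lines:
        return record

    # Position heuristic: treat the first non-empty line as the filing heading.
    record.heading = lines[0]

    for line in lines[1:]:
        before, symbol, after = line.partition("©")
        if symbol and not record.claimant:
            # Symbol heuristic: text after © is assumed to start with the claimant.
            claimant, _, details = after.partition(";")
            record.claimant = claimant.strip()
            record.notice_details = details.strip()
            if before.strip():
                record.unparsed.append(before.strip())
        else:
            # No label and no distinguishing symbol: keep the text, unclassified.
            record.unparsed.append(line)
    return record


# Invented sample text, not taken from a real card.
sample = "DOE, JOHN\nAn example title. © John Doe; 12Mar45; A000000.\nNew York."
card = parse_card(sample)
print(card.heading, "|", card.claimant, "|", card.unparsed)
```

Notice that the sketch cannot say whether the heading is a name or a title, and that everything without a distinguishing mark lands in `unparsed`. That is the labeling problem described above.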

As we work through these challenges I’ll keep you posted on findings and continue to seek your input.

3 Comments

  1. John Mark Ockerbloom
    February 11, 2012 at 9:14 am

    The 1978 CCE volumes, while mostly duplicating what’s in the online database, also contain a small number of pages of registrations under the old system that weren’t represented in previous CCE volumes. Those registrations do not appear in the database.

    Do you plan to digitize those as well (either the volumes as a whole, or at least the pages with the old-system registrations)? I hope so.

    To see what I’m talking about, I have some of the relevant renewal pages online here:

    http://onlinebooks.library.upenn.edu/cce/1978r.html

    but there are additional pages for original old-system registrations that haven’t been scanned, to my knowledge.

    (It may also be worth checking the 1979 volumes to make sure there aren’t any further leftovers. There don’t appear to have been for renewals, but I didn’t check to see whether they contain any delayed old-system original registrations.)

  2. John Mark Ockerbloom
    February 11, 2012 at 9:24 am

    Also: What’s the best way to report errors in the scans? I noticed a couple of pages (1856-1857) missing in the renewals section of the music volume for July-December 1963. From the numbering, it appears to be a page spread that didn’t get included in the scan, rather than a leaf missing in the source volume.

    Unfortunately, I don’t know of any other scan of these two pages (the relevant volume doesn’t appear to be in HathiTrust as of yet).

  3. Mike Burke
    February 15, 2012 at 4:08 pm

    John,

    Thank you for bringing the missing page images to my attention. Both the Internet Archive and the Copyright Office routinely check the images for quality and completeness, but this one got by us. I’ll have those pages scanned and inserted. Flaws found in the digital copies of the CCEs can be reported through the “Report a problem” link under “About this book” by clicking on the italicized “i” at the top right of the book reader. We will also respond to any flaws reported in a comment.

    About the pre-1978 records published in the 1978 CCEs, we are aware of them and we are planning to scan those sections from the 1978 volumes and more recent years if needed.

    Mike
