A New Copyright Blog — and a Challenge

One of the largest card catalogs in the world, the U.S. Copyright Office card catalog comprises approximately 46 million cards. Photo by Cecelia Rogers, 2010.

The following is a guest post by Maria A. Pallante, Register of Copyrights and Director of the U.S. Copyright Office. See the new U.S. Copyright Office blog at http://blogs.loc.gov/copyrightdigitization/

Help Wanted: Have you ever attempted to build an electronic index and searchable database of a complex and diverse collection of 70 million imaged historical records? Neither have we.

Current records dating back to 1978 are available online and searchable at www.copyright.gov/records. The office’s records date back to 1870, however, and many pertain to works still under copyright protection. These records are the focus of our current digitization efforts. This is an ambitious project that I announced recently as one of several priorities and special projects the U.S. Copyright Office is undertaking. To date, nearly 13 million index cards from our card catalog and over half of the 660-volume Catalog of Copyright Entries have been scanned, and the images have been processed through quality assurance and moved to long-term managed storage.

So, back to the earlier question: How do we go about creating a searchable database comprising 70 million digital objects? For that matter, how do we create metadata for such a large volume of records? Assuming we would like to achieve full-level indexing, how do we do so on a rudimentary indexing budget? What technologies and creative approaches can we profitably employ to get this work done? We welcome your ideas and suggestions on these and many other questions related to this project.

The Copyright Office historical catalog serves as the mint record of American creativity, and there are great benefits to making the collection accessible online. We know that working collaboratively will ensure that the final product best meets the needs of the widest audience of users. I hope you will subscribe to our project blog at http://blogs.loc.gov/copyrightdigitization/ and visit our project web page at www.copyright.gov/digitization from time to time. Most of all, I hope that you will be an active partner in this important effort.

 

3 Comments

  1. hooshang danesh
    December 1, 2011 at 2:14 pm

    The article notes: "...1870...and many pertain to works still under copyright protection."

    Well, how does a work lose its copyright protection to begin with?

    Is copyright protection time-limited?

    thanks if anyone knows the answers..

  2. Taylor Kendal
    December 1, 2011 at 5:09 pm

    I created an electronic index of roughly 800 images from a trip to New Zealand (unsearchable mind you) and was exhausted and nauseous at completion. 70 million? What does that number even mean?

  3. Matt Powell
    May 2, 2012 at 12:20 am

    This is a big but important project. With digital technology and OCR capabilities, I would imagine that this process will develop momentum and be completed faster than we might expect. It reminds me of the DNA sequencing problem. It was thought that the sequencing of DNA would take years; however, with technology and computers, the time frame was dramatically shortened.
