A virtual Copyright card catalog? Tell us what you think.

March 22, 2012 by Mike Burke

Of the 25,723 drawers in the Copyright Card Catalog, more than 12,000 have already been scanned, resulting in more than 17 million card images safely tucked away in Library storage. The long-term plan is to capture index terms from the card images using OCR and keyboarding and to build indexes for online searching. But this will require significant time and money to achieve. Must we wait to share these images with you? Maybe not.

As an interim step, the Copyright Office is considering making the images of the cards in the catalog available online through a hierarchical structure that would mimic the way a researcher would approach and use the physical card catalog. We’re calling this a virtual card catalog. While it would not provide the full record-level indexing that remains a principal goal, it would make the information available while the scanning is still under way, and it would be as searchable as the actual cards.

The card images have been organized by drawer, each drawer in its own folder. Each image file name contains the time period, the drawer label, a sequential four-digit number starting with 0001, and occasionally an alphabetic suffix when information exists on a verso or there are multiple card images for a single entry. So there’s already a hierarchical organization of the images that could enable a virtual card catalog.
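To make the naming scheme concrete, here is a minimal Python sketch of how such file names might be parsed and put in drawer order. The exact layout of the real file names is not given here, so the pattern below (period, drawer label, four-digit sequence, optional letter, separated by underscores) and the names CARD_NAME, CardImage, parse_card_name and drawer_order are assumptions made up for illustration.

```python
import re
from typing import NamedTuple

# HYPOTHETICAL layout, invented for this sketch (not the real file names):
#   <period>_<drawer label>_<4-digit sequence><optional letter>.<extension>
CARD_NAME = re.compile(
    r"^(?P<period>[^_]+)_(?P<drawer>.+)_(?P<seq>\d{4})(?P<suffix>[a-z]?)\.\w+$"
)


class CardImage(NamedTuple):
    period: str   # catalog time period, e.g. "1971-1977"
    drawer: str   # drawer label
    seq: int      # sequential number within the drawer, starting at 1
    suffix: str   # "" for a plain card, a letter for a verso or extra image


def parse_card_name(name: str) -> CardImage:
    """Split one image file name into its parts, or raise ValueError."""
    m = CARD_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized card image name: {name!r}")
    return CardImage(m["period"], m["drawer"], int(m["seq"]), m["suffix"])


def drawer_order(names):
    """Sort names by sequence number, then suffix, so 0001 precedes 0001a."""
    def key(name):
        card = parse_card_name(name)
        return (card.seq, card.suffix)
    return sorted(names, key=key)
```

Sorting on the sequence number and then the suffix keeps an image with a letter suffix immediately after the unsuffixed image that shares its number.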

But how would this virtual card catalog look and operate? A search would probably begin at the top of the hierarchy with the selection of a catalog segment (e.g., Registrations from 1971 to 1977), perhaps from a drop-down list. This would be followed by the entry of a search term (i.e., a name or a title). The software would step down to the next level of the hierarchy within the selected catalog segment and find the “virtual drawer” folder that, alphabetically, should contain the term. It would then display that drawer label along with labels for some number of drawers immediately preceding it and some number of drawers immediately following it. The researcher could select any one of the drawers displayed or return to the initial search screen.

For a selected drawer, the software could display small scrollable images in one panel and a full card image in another panel. One could scroll through the smaller images to different points in the virtual drawer, select and display a specific card, and navigate to the next card, the previous card, the beginning of the drawer, or the end of the drawer. The software could support a return to the list of drawers, forward and backward navigation at the drawer level, and a return to the initial search screen. The following is a mock-up of how the card images might be displayed.

Mock-up of a virtual card catalog display
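The drawer lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it supposes that each drawer is labeled with the first entry it holds, that the labels are already in filing order, and that a plain case-insensitive string comparison stands in for the catalog's real filing rules. The function names and the sample labels are hypothetical.

```python
from bisect import bisect_right


def find_drawer(first_entries, term):
    """Return the index of the drawer that should contain `term`.

    ASSUMPTION: first_entries[i] is the first entry filed in drawer i,
    and the list is sorted the same way the comparison below sorts.
    """
    keys = [entry.casefold() for entry in first_entries]
    # The right drawer is the last one whose first entry is <= the term.
    # A term filed before every label falls back to the first drawer.
    return max(bisect_right(keys, term.casefold()) - 1, 0)


def drawer_window(first_entries, term, before=2, after=2):
    """Return (index of the matching drawer, labels to display).

    The displayed labels include up to `before` drawers preceding the
    match and up to `after` drawers following it.
    """
    i = find_drawer(first_entries, term)
    lo = max(i - before, 0)
    hi = min(i + after + 1, len(first_entries))
    return i, first_entries[lo:hi]


# Example with made-up drawer labels:
drawers = ["Aaron", "Baker", "Carter", "Dawson", "Evans", "Foster"]
match, shown = drawer_window(drawers, "Copland", before=1, after=1)
```

A binary search suits this step because only the drawer labels need to be consulted, not the millions of card images inside the drawers.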

We are exploring multiple ways of making the Copyright records available online sooner rather than later. The notion of a virtual card catalog is an example and one that could probably be done at a modest cost. It sounds good to us but we want to hear what you think of it. While not the optimal solution, would it nevertheless be useful to you as an interim step? Do you know of other organizations that have done something similar and done it well? Please take a moment to consider this option and let us know what you think.

26 Comments

Judy G. Russell CG
March 22, 2012 at 4:40 pm
>> While not the optimal solution, would it nevertheless be useful to you as an interim step?<< Yes, yes, yes. And oh yeah… yes.
jo wable
March 22, 2012 at 5:54 pm
This is a wonderful, practical idea and I love it.
Susie
March 22, 2012 at 7:11 pm
Interim availability is a welcomed option.
Cath Madden Trindle, CG
March 22, 2012 at 7:26 pm
Absolutely. Having access to earlier copyright records will make all the difference to the genealogical world, and if they are as easy to use as a physical card catalog we should all be happy! The only problem with the physical card catalog is that it was 3000 miles away.
Elias Savada
March 22, 2012 at 10:50 pm
I’d suggest the ability to skip through a certain number of cards in any drawer. Based on the image in this post, it looks like you can go one ahead or back or all the way ahead or back, but nothing in between.
Elias Savada
March 22, 2012 at 10:51 pm
Or is that what the slider in the left portion will do?
2010 library school graduate, librarian at heart
March 23, 2012 at 9:09 am
1.5 years out of library school and not currently working in the field, I’m perhaps in the dark about this issue. I’m curious, however, to know if work has begun on the long term plan (building indexes for online searching). I definitely understand the need for a quick fix and a virtual card catalog would surely suffice. I wonder though, is it worth the time? It seems to me, in some sense, that in implementing this plan we would just be wasting the time of valuable professionals who could be focusing their efforts towards new and original projects, or towards the long term goal. The only way I can see the completed creation of a virtual card catalog being truly beneficial is if students/interns were involved in the project. In this case, the creation of a virtual catalog would provide an interim solution while also presenting a valuable learning experience. Does this make any sense? I could be missing something…
Mike Burke
March 23, 2012 at 9:25 am
Elias,

The vision right now is that the slider would span the entire virtual drawer and that the smaller images would be readable. There may be some technical challenges with making that happen. But you are correct that some means of skipping to points in the virtual drawer would be required.

Mike
Mike Burke
March 23, 2012 at 9:48 am
I’m pleased to say that we have involved graduate students, interns and experts from academia in seeking viable solutions to making the Copyright records available online. They have brought new and creative ideas to the table and we will continue to reach out and involve them as we move forward. This is a massive project and we are exploring and testing all options. Some of the recent posts provide an update on where we are and where we’re going. Your input and ideas are most appreciated and most welcome.

Mike
Diane Davison Esquire
March 23, 2012 at 10:11 am
First, this is a wonderful idea and would be quite useful to legal practitioners. I would hope it would be made available at NO cost, as these are public records and many people cannot just drive to the Library of Congress to perform free card catalog searches. It should be free, just as the post-1976 Act registration records are online.
Sarah
March 23, 2012 at 10:27 am
Better than $165/hr!

I’m wondering, is crowdsourcing the indexing entirely out of the question? I wouldn’t necessarily let just anyone with computer access transcribe a card, but maybe work with professional organizations to invite their members to log in and transcribe a few things here or there?

I feel like if you’ve got a reasonable person looking for “Forgotten” by Clair Call, but having found it, he or she can do nothing to make it more findable in the future, that’s a lost opportunity.
Mike Burke
March 23, 2012 at 10:52 am
Diane,

The Copyright Office has no plans to charge a fee for access to the pre-1978 records online. Searching and retrieval will be free just as it is for the post-1977 records. In fact we hope to eventually make the searching of the pre-1978 and post-1977 records as seamless as possible.

Mike
Mike Burke
March 23, 2012 at 11:03 am
Sarah,

Crowdsourcing is very much on the table. We’ve already had conversations with companies and organizations that are using it or hosting it as a service and we’ve taken steps to gather more information from the industry and are planning a small test in the near future. We believe that we can define data capture tasks that would be effective through crowdsourcing particularly when combined with double blind data capture for verification.

Mike
Marci Frederick
March 23, 2012 at 11:24 am
This seems like a good interim measure to get this information out.

In the long term, links between the Library of Congress catalog records and the copyright registration records would be a handy way for people to get this information.
Kevin Marcou
March 23, 2012 at 11:46 am
As a library school student (and one of the contractors working on scanning all of these cards. 17 million down, 13 million to go!), I’ve been wondering as to how all those thousands of images I’ve watched flitting across my computer screen would end up being organized and brought to the public.
The mock-up looks great, but I am curious as to how you envision indicating and displaying multi-card sets and verso information in that sort of system.
Mike Burke
March 23, 2012 at 12:52 pm
Kevin,

Thank you for your question and for the excellent quality card images that you and your colleagues are creating for the Copyright Office.

We envision the card images being delivered by virtual drawer, that is a set of images corresponding to the cards in a physical drawer in the catalog. The virtual drawer would be found alphabetically within one of the catalog’s time periods. As would be done if one were working with a physical drawer, a researcher would peruse the images alphabetically moving one or more images at a time, forwards or backwards, through the virtual drawer. A verso image would immediately follow the corresponding recto image and images for sets of cards would be displayed sequentially just as one would have found them in a physical drawer.

Mike
David Hayes
March 23, 2012 at 3:56 pm
To make searches faster and more effective for the user, and less resource-intensive for the Copyright Office, the program could initially present users with an image of just the top 1/2 inch of a card. This would be similar to what a user of the physical cards sees when he thumbs through a drawer, quickly casting aside from further consideration the cards which don’t have the title(s) he wants to see. The user could also be provided with a multi-card skip button, so that ten, fifty or a hundred cards could be skipped at a time, in cases where the first letters of the titles being shown are far afield of what the user is seeking.

Once the user locates a card top which interests him, he might click that image and be presented with the full-card image. At that point, he should have the opportunity to click next-card and previous-card buttons, and thereby see full-card images of the adjacent cards. (The presumption should be that if the user now wants to see one full card rather than just the card top, the subsequent cards he wants to see are also of interest to him, such as additional cards with the same title [different versions of the same work, unrelated works by different authors but the same title, and additional works by the same author in the case of cards filed by author rather than title or copyright owner].)
Ray Beere Johnson II
March 24, 2012 at 1:58 pm
A good solution on hand is worth two optimal ones that will take forever to implement. As a former professional genealogist, and someone who has registered my own work, and someone with family members who have also copyrighted works, this has me drooling. Why would anyone not like this idea? What benefit is there to anyone in waiting when a decent solution already exists?
Dorothy Compton
March 25, 2012 at 7:14 pm
I like the virtual drawer idea and the skip options. Interim availability . . . the sooner the better!
Rene Hohls
March 26, 2012 at 12:33 pm
This would be great – useful for education and practical research.
Please do!
Bobbie Mercy Oliver
March 26, 2012 at 4:05 pm
Sounds great to me, go for it!
Ed Summers
March 28, 2012 at 10:09 am
Great to see you blog about this work Mike! Please let me know if there’s anything I can do to help you make it happen.
Geoff Hull
March 28, 2012 at 6:09 pm
Sounds and looks like a great idea. Definitely a boon with all the terminations of transfers coming about.
Carl Malamud
April 2, 2012 at 6:17 pm
Yes, yes, yes … please do the interim step. It would also be extremely helpful if you could make the raw scans available for bulk download so other services such as the Internet Archive could work with the data. The Library of Congress FTP server would be an ideal place to put the raw data.
Mike J. Brown
April 2, 2012 at 11:22 pm
Re: Sarah’s wariness about tasking “anyone with a computer” with data entry, rather than a sufficiently vetted, privileged few: well, it may be counterintuitive, but setting the bar of editorship fairly low (e.g. registration and no uncorrected bad behavior) is exactly what works the best in the crowdsourced systems I’m familiar with. The colossal failure of Nupedia and the rapid rise of Wikipedia is a classic example, but closer to home for me is Discogs. A few years back, the powers that be realized that the ever-growing backlog was undermining the quality of the data, all of which is supplied by users. They realized that they needed to get the data (all of it, be it new submissions or edits) in front of the greatest number of eyeballs, and to make it very easy for people to edit. When they switched to the wiki-like system of all submissions and edits going live immediately, and everyone having the same editing privileges (except those whose edits are voted poor by other contributors), it was controversial, to say the least…but it has proven to be successful. I see similarly low thresholds for editorship over at MusicBrainz, Goodreads, and Project Gutenberg. Of course, there are occasional vandals who have to be dealt with, but it’s not that big of a problem. In your case, you have it easy with your finite set of card scans and limited ways in which the OCR/data entry could be screwed up. The scans also would always be available alongside the crowdsourced data; people would always be free to compare the two, and if they spotted errors or vandalism, hopefully they’d have the power to correct it without delay. Combined with a system for rating the data quality and each other’s contributions, some kind of rollback ability, a monitored forum, and some basic guidelines for dealing with conflicts, it should work out just fine.
Plus, you could still have some vetted super-editors who could review and lock the data for particular cards once they’re finished, if you want…it would be the LoC’s seal of approval.
Geoff
April 4, 2012 at 9:25 pm
Put the images up – right away! And coordinate with the Internet Archive to host them, and Distributed Proofreaders to turn them into metadata. I imagine both groups would have great interest in such a project. And I wonder if the Internet Archive wouldn’t be able to figure out a way to improve the scanning speed – from what I’ve seen, they excel at that sort of thing.
