Collecting and Preserving Websites | Inside Adams: Science, Technology & Business

Library of Congress catalogers at their desks, Library of Congress, Washington, D.C. (circa 1920)

As librarians, we identify, evaluate, select, collect, describe, preserve and provide access to materials to facilitate use. As librarians of the 21st century, we have integrated digital collections such as ebooks, databases, datasets, and other digital objects into our traditional analog collections.

What about websites?

Do libraries collect websites?

Back in January, I presented on the topic of preserving websites at the 15th Annual Conference of Atmospheric Science Librarians International (ASLI), which was held in conjunction with the 92nd American Meteorological Society (AMS) Annual Meeting in New Orleans, Louisiana. My presentation focused on web archiving from the perspective of a reference librarian. In my presentation, If It Is Not Archived, It Maybe Lost in the Future: Collecting and Preserving Websites at the Library of Congress, I wanted to make my ASLI colleagues aware of website archiving and encourage them to get involved with web archiving projects of their own.

The Library of Congress has been preserving websites since 2000. Our first projects involved U.S. Elections and September 11. In 2003 the Library of Congress and the national libraries of Canada, Australia, Denmark, Finland, France, Iceland, Italy, Norway, Sweden, and the UK, along with the Internet Archive, formed the International Internet Preservation Consortium (IIPC). In 2004 the Library established an official web archiving team.

Snapshot of Hurricane Katrina and Rita Web Archive (Retrieved 2012)

My first introduction to web archiving was in 2005, when I participated in the Hurricane Katrina and Rita web archive collaboration. In this collaboration, the Library of Congress, along with the Internet Archive, the California Digital Library, and other similar institutions, nominated news, personal, relief, and government websites to be captured and archived.

Typically, the Library of Congress collects and archives websites based on themes and events, which makes sense. However, in 2006 the Library experimented with a Single Sites project that archived websites without a unifying theme. I was excited to have another opportunity to preserve websites and work with our web archiving team. While working on the Single Sites project, I was able to select websites in my subject areas: mathematics and meteorology. I nominated a variety of websites that would supplement the Library’s analog collection. For example, I nominated websites devoted to the search for Mersenne prime numbers and the plight of the endangered polar bear.

As of January 2012, the Library has collected about 285 terabytes of web archive data, and it keeps growing! You can view our public web archive collections here.

If you want to read more about archiving websites at the Library, our web archiving team leader Abbie Grotke has written extensively on this topic for The Signal: Digital Preservation blog.

One Comment

mariateresa
February 13, 2012 at 2:34 pm
My team is in charge of the Environment Agency (England and Wales) web mapping application.
http://maps.environment-agency.gov.uk/wiyby/wiybyController?x=357683.0&y=355134.0&scale=1&layerGroups=default&ep=map&textonly=off&lang=_e&topic=floodmap Last year we upgraded our infrastructure (hardware and software) and investigated how we could preserve the previous versions, but we could not find any help. The web is a combination of content and applications. It seems that preserving web applications is not yet an issue for anybody, even though they are an essential part of the Internet experience. Videos might help, but I hope that more research can be carried out on this topic.
