Opting In to Preservation

The following is a guest post from Abbie Grotke, Web Archiving Team Lead and National Digital Stewardship Alliance Content Working Group Co-Chair.

In November 2010, the Library of Congress convened a Citizen Journalism workshop that brought together researchers, bloggers, journalists, academics and archivists to address selection and preservation issues for hyper-local community news on the web.

During that workshop, participants identified the need for an easy way for content creators to “opt in” to preservation, by way of a plugin or other mechanism built into blogging tools. As participant Dan Gillmor described soon after the meeting:

The tech industry has a vital role to play in preserving the material we create ourselves, e.g. blogs, at the edges of the networks. It can work with the archiving institutions to ensure that we, the creators of media, can play a role in our own archiving. What do I mean by this? Here’s an example. I use WordPress to create my personal website, and the website that accompanies my soon-to-arrive new book, “Mediactive.” I wish there was a plug-in for WordPress that would let me save my site to the wonderful Internet Archive, the nonprofit that is trying to archive as much online material (among other things) as possible. All blogging software vendors should have features like this, assuming the Internet Archive wants the material, which I’m fairly sure it does.

Others in the meeting, as well as staff in the NDIIPP program, were intrigued by this concept, though the idea remained only wishful thinking until July 2011. It was then that the Content Working Group of the National Digital Stewardship Alliance held a workshop to discuss categories of at-risk digital content. Blogs were on that list, and the idea of a preservation plugin, integrated within blogging software, was again on people’s minds.

With renewed interest and now a working group to sponsor a project, a Content Working Group Action Team was formed, and we are working to make Dan’s dream a reality. We’ve held some engaging meetings in the last few months to discuss policy issues and the inevitable “how exactly do we make this happen?” We’ve recently expanded our thinking beyond blogs, recognizing that many content owners are using popular blogging software to manage not just blogs, but entire websites. We’re hoping that any site owner using these platforms could easily opt in.

Easily is a key word. We want to allow content publishers to say “yes, I want to be preserved” with little more than a plugin install, a simple form to fill out, and a click of a button, and we want that opt-in data to be readily available to the preserving organizations. Later, we want to allow content publishers to view which NDSA institutions have selected their site for inclusion in their collections.

NDSA members will be able to view publisher submissions and mark those matching their selection criteria, also indicating when the site will be or was captured.

Internet Archive, an NDSA member, has committed to collecting all submissions at least once per year, so site publishers are guaranteed inclusion in at least one archive. Because all NDSA member organizations will review the list of “available” sites and blogs, more than one institution may elect to preserve the same site. We do not see potential duplication as a bad thing; organizations collect sites for a variety of reasons and at different times.
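
To make the workflow described above a little more concrete, here is a minimal sketch in Python. It is purely illustrative and is not the API the Action Team will build: the class names, field names, and methods (OptInSubmission, OptInRegistry, opt_in, select, institutions_for) are all assumptions made up for this example. It only models the three ideas in the text: a publisher opts in, member institutions review the available sites and mark the ones they select (optionally with a capture date), and the publisher can later see which institutions selected the site.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class OptInSubmission:
    """One publisher's opt-in record. All field names are hypothetical."""
    site_url: str
    contact_email: str
    platform: str
    # Institution name -> capture date (None if selected but not yet scheduled).
    selections: Dict[str, Optional[date]] = field(default_factory=dict)


class OptInRegistry:
    """In-memory stand-in for a shared list of opted-in sites (an assumption)."""

    def __init__(self) -> None:
        self._submissions: Dict[str, OptInSubmission] = {}

    def opt_in(self, site_url: str, contact_email: str, platform: str) -> OptInSubmission:
        """Record a publisher saying "yes, I want to be preserved"."""
        submission = OptInSubmission(site_url, contact_email, platform)
        self._submissions[site_url] = submission
        return submission

    def available_sites(self) -> List[OptInSubmission]:
        """What a member institution would review against its selection criteria."""
        return list(self._submissions.values())

    def select(self, institution: str, site_url: str,
               capture_date: Optional[date] = None) -> None:
        """Mark a site as selected by an institution, optionally with a capture date."""
        # Raises KeyError if the site never opted in.
        self._submissions[site_url].selections[institution] = capture_date

    def institutions_for(self, site_url: str) -> List[str]:
        """What a publisher would see: which institutions selected the site."""
        return sorted(self._submissions[site_url].selections)
```

In this sketch, two institutions selecting the same site simply produce two entries in the same record, which matches the point that duplication across archives is acceptable.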

A first step is to build a matrix identifying key publishing platforms to approach and to set up a pilot project. The API that results from this work could be connected with any number of open source or proprietary blogging platforms and content management systems.

With many of the Content Working Group members actively involved in the web archiving community, we are also staying apprised of other blog preservation projects such as the EU-funded BlogForever project, and work by our colleagues in the International Internet Preservation Consortium.

Watch this blog for future announcements about the pilot program, and visit the Content Working Group’s page to learn more about all of our activities.

One Comment

  1. Tony Brooke
    December 17, 2011 at 12:13 pm

    It looks like a module for Drupal exists that might fit this need,
    http://drupal.org/project/internet_archive

    It seems to be primarily for offsite media storage but perhaps it would work to archive a whole site. (Or could be updated to do so.)

    “The Internet Archive module allows sites to automatically transfer media files (audio & video) as well as associated metadata to and from Archive.org. Once files have been transferred, they can be displayed using Archive.org’s embedded player, and derivatives can be retrieved.”

    And this page mentions the origins of it,
    http://groups.drupal.org/node/91624
