Read All About It! An Update on the National Digital Newspaper Program

Deb Thomas

Here at the Library of Congress, there are many projects underway to digitize and make available vast amounts of historic, archival material.  One such project is the National Digital Newspaper Program, providing access to millions of pages from historic newspapers (a previous blog post provides an introduction).  Deb Thomas, NDNP program coordinator here at the Library, answers some questions about this amazing project – what’s been accomplished over the past year, as well as goals for the future.

Susan:  Could you give us a quick summary of this project?

Deb:  The National Digital Newspaper Program supports digitization of, and enhanced access to, millions of historic newspaper pages selected and produced by state libraries, universities and historical societies across the United States. Jointly supported by the National Endowment for the Humanities and the Library of Congress, the program is a collaboration of national proportions, currently providing access to 5.2 million pages from more than 800 newspaper titles, published in 25 states between 1836 and 1922 (and growing: we add newspapers regularly). The collection is made freely available through the Chronicling America website hosted by the Library of Congress. These newspapers provide the first glimpse into historic events, as well as the diverse voices and differing perspectives on the life, times, and activities of nineteenth- and early-twentieth-century America.

Susan:  Tell us about some highlights from your recent partners meeting.

Deb:  Each year, awardees in the program gather in Washington, DC, to share their experiences and activities in the program. In late September, 58 representatives of 27 active projects and 2 “alumni” projects joined in two days of meetings, including presentations from LC and the NEH on progress in the program, new developments and outreach activities. Throughout the meeting, state participants presented on their own activities, sharing their experiences with the challenges of mass digitization and their own plans for promoting the program, collections and Chronicling America.

Highlights of the agenda this year included discussions on the addition of non-English ethnic newspapers to the Chronicling America site by awardees in New Mexico, Arizona and Louisiana, and presentations by several outside scholars on their use of the site’s open access protocols to develop new research approaches through data mining. In addition, a half-day of workshops helped awardees brainstorm ideas for connecting with general users and the educational community, and offered technical guidance in working with the LC Newspaper Viewer, the core software application that supports the Chronicling America site, published by LC as open-source software.

Susan:  Describe some of the ways you can search this collection.

Deb:  We have full-text search for newspapers from all across the country covering almost a hundred years – you can find first-hand reporting on the battles of the Civil War, diverse voices during the years of Reconstruction, life events in families going back generations, and the scandals and crimes that riveted the reading public during these decades. You can explore land disputes, crop reports, society news in both cities and small communities, American perspectives on events across the world, fact and fiction in technological advances, poetry, serialized literature by such classic writers as Charles Dickens and Arthur Conan Doyle, and much, much more. All digitized papers in the collection can be searched by date, location and full text, using either a simple keyword search or a more advanced search that lets users zero in on specific times and places with combinations of words and phrases.

In addition to the digitized newspapers made available through the program (limited to those published between 1836 and 1922), the site also includes a separate searchable directory of US newspaper records, describing more than 150,000 titles published between 1690 and the present and listing libraries that have physical copies in microfilm or original print. This directory, derived from data collected under the US Newspaper Program (1982-2011, a precursor to NDNP supported by NEH and LC, providing funding and guidelines to state institutions to inventory, describe and selectively microfilm their state’s newspaper collections), helps users identify what’s available and where to go to find newspapers beyond those digitized for the Chronicling America site.

Susan:  What can you tell us about any updates and/or milestones during the last year?

Deb:  2012 has been a very exciting year for NDNP. We have experienced record use of the Chronicling America site over the year and crossed the milestone of having more than 5 million pages available. (Actually, at this time, it’s 5.2 million and ever-growing with monthly contributions from awardees.) This year we added 4 new participants to the program representing Iowa, Maryland, Michigan, and North Carolina, and incorporated the technical ability to search non-English language newspapers in French, German, Italian and Spanish. (Currently we have text for French and Spanish only, but other titles are in the works.)

Susan:  What are plans for future updates, both short term and long term?

Deb:  More, more, more! Over the long term, we want more states to participate and add more content (NEH runs an annual award competition for interested applicants). All in all, over the next decade or so, we expect the collection to increase by tens of millions of pages gathered from all 54 states and territories.

Susan:  What are some of your favorite items in the collection?

Deb:  Some of my personal favorites are associated with the imaginative speculation by turn-of-the-century journalists. In an age when technology began to change lives dramatically year by year, the possibilities seemed endless. For example, a 1904 article in the Saint Paul Globe told the story of the author jumping ten years ahead via a time machine to learn how his own works had stood up to time, only to find libraries dramatically changed. Instead of being places where people came to read books, libraries had evolved (in only 10 years!) to be places where books were read to people via phonograph or transferred as sound via wires. Another favorite is a feature article from the Washington Times in 1907, describing recent seismic disruptions around the world and presenting popular theories about the cause.

From the St. Paul Globe, 1904

Susan:  What are some uses of this material that you’ve heard about from researchers?

Deb:  First, there are the genealogists who find these materials a treasure trove of unexpected details about family histories and events long past. A significant part of our user base self-identifies as “family historians.” Such folks can find articles on marriages, deaths, funerals, celebrations, scandals, day-to-day living and more. Teachers use the collection to teach narrative analysis and how perspective influenced (and still does) the news (e.g., articles during the Civil War from both Confederacy- and Union-aligned papers). Another type of use we see is downloading large parts of the collection through the application programming interface (API).

The data in Chronicling America is all in the public domain and available for open access. The software supporting the website has been designed to encourage reuse and holistic analysis of the data as a “big data” resource. This opens the site to a new kind of research, the “digital humanities,” in which historians join with technologists to analyze large amounts of historic data in new ways. We know about research on geographic and chronological visualizations of newspaper publishing changes in the US, epidemiological studies incorporating how the “news” about disease spread and influenced societal behavior, and even linguistic analysis in different regions. We’d love to hear about other studies going on as well.
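For readers curious what programmatic access might look like, here is a minimal sketch of building a search request that combines words, a place and a date range, much like the advanced search described earlier. The base URL and the parameter names (`andtext`, `state`, `date1`, `date2`, `dateFilterType`, `format`, `page`) are assumptions that do not come from this interview, and the helper `build_search_url` is hypothetical; check the site's current API documentation before relying on any of them. The sketch only builds the URL and does not contact the site.

```python
from urllib.parse import urlencode

# Assumption: this endpoint and its parameter names are not given in the
# interview; verify them against the site's current API documentation.
SEARCH_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(words, state=None, year_from=1836, year_to=1922, page=1):
    """Build a full-text search URL limited to a place and a range of years.

    Hypothetical helper for illustration; it performs no network request.
    """
    params = {
        "andtext": words,               # pages containing all of these words
        "date1": year_from,             # first year of the range
        "date2": year_to,               # last year of the range
        "dateFilterType": "yearRange",  # treat date1/date2 as years
        "format": "json",               # ask for machine-readable results
        "page": page,                   # results page to request
    }
    if state:
        params["state"] = state
    return SEARCH_URL + "?" + urlencode(params)


# Example: pages from Minnesota papers in 1904 that mention a time machine.
url = build_search_url("time machine", state="Minnesota",
                       year_from=1904, year_to=1904)
print(url)
```

A researcher doing large-scale analysis would typically loop over result pages with a polite delay between requests rather than fetching everything at once.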

Susan:  Are there any preservation capabilities built into this collection?

Deb:  Access to these newspapers is key, and is intended to be sustained over time, so preservation capabilities are built into nearly every facet of the program. The digital objects LC prescribes for the program are based on the premise of a one-time opportunity to capture this diverse material scattered around the country, so we need to get the most we can out of this data. The specifications are intended to capture as much information from the original microfilm as practical (using print master duplicate negatives for the best copy) and for the newspapers to be described by the people who know them best, the original selectors and curators.

In addition to the specifications themselves, the data transfer and handling procedures used by the digitization vendors, awardees, and the Library promote having a collection of uniform and consistent self-describing digital objects, with verifiable data values that can be checked over time to ensure enduring access. At LC the data lifecycle is supported, as much as possible, with an automated workflow system of repository services that help us manage the validity and consistency of the data over time, as well as a detailed inventory of what data we hold, how much of it there is and where it sits in our systems.
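One common way to make data values "verifiable over time" is a fixity check: record a checksum for each file when it arrives, then recompute and compare it later. The sketch below illustrates that general idea only. The interview does not say which checksum algorithm or manifest format the program uses, so SHA-256, the dictionary manifest, the file name and the functions `sha256_of` and `verify_manifest` are all assumptions made for illustration.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(root, manifest):
    """Return the relative paths that are missing or whose checksum changed.

    `manifest` maps relative path -> checksum recorded at ingest
    (a hypothetical format chosen for this sketch).
    """
    failures = []
    for rel_path, expected in manifest.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(rel_path)
    return sorted(failures)


# Demonstration with a throwaway file standing in for a digitized page.
with tempfile.TemporaryDirectory() as root:
    page = Path(root) / "page_0001.txt"
    page.write_bytes(b"hypothetical page text")
    manifest = {"page_0001.txt": sha256_of(page)}  # recorded at "ingest"

    intact = verify_manifest(root, manifest)       # nothing has changed yet

    page.write_bytes(b"hypothetical page text, silently altered")
    damaged = verify_manifest(root, manifest)      # the change is detected

print(intact, damaged)
```

Running the same check on a schedule, and keeping the results alongside an inventory of where each copy lives, is the kind of routine an automated repository workflow can take over.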

Susan:  Could you describe the importance of collaboration in this project?

Deb:  Collaboration is one of the key elements supporting success for NDNP. Within LC itself, the program is a successful partnership between the curatorial newspaper collection managers who know the ins and outs of working with historic newspapers and the repository development technologists who design ever-more sophisticated and efficient ways to approach the management of this growing digital collection.  It’s also an important collaboration between federal agencies to provide the resources supporting the development of a national-level collection of historical newspapers. Most significantly, it’s a collaboration between institutions representing their state and local history and the sponsors of the program to build the best national resource possible from this important primary source material.

 
