Read All About It! An Update on the National Digital Newspaper Program

Deb Thomas

Here at the Library of Congress, there are many projects underway to digitize and make available vast amounts of historic, archival material.  One such project is the National Digital Newspaper Program, providing access to millions of pages from historic newspapers (a previous blog post provides an introduction).  Deb Thomas, NDNP program coordinator here at the Library, answers some questions about this amazing project – what’s been accomplished over the past year, as well as goals for the future.

Susan:  Could you give us a quick summary of this project?

Deb:  The National Digital Newspaper Program supports digitization and enhanced access to millions of historic newspaper pages selected and produced by state libraries, universities and historical societies across the United States. Jointly supported by the National Endowment for the Humanities and the Library of Congress, the program is a collaboration of national proportions, currently providing access to 5.2 million pages from more than 800 newspaper titles, published in 25 states between 1836 and 1922 (and growing, as we add newspapers regularly). The collection is made freely available through the Chronicling America website hosted by the Library of Congress. These newspapers provide the first glimpse into historic events, as well as the diverse voices and differing perspectives on the life, times, and activities of nineteenth- and early-twentieth-century America.

Susan:  Tell us about some highlights from your recent partners meeting.

Deb:  Each year, awardees in the program gather in Washington, DC, to share their experiences and activities throughout the program. In late September, 58 representatives of 27 active projects and 2 “alumni” projects joined in two days of meetings, including presentations from LC and the NEH on progress in the program, new developments and outreach activities. Throughout the meeting, state participants presented on their own activities, sharing their experiences with the challenges of mass digitization and their own plans for promoting the program, collections and Chronicling America.

Highlights of the agenda this year included discussions on the addition of non-English ethnic newspapers to the Chronicling America site by awardees in New Mexico, Arizona and Louisiana, and presentations by several outside scholars on their use of the site’s open access protocols to develop new research approaches through data mining. In addition, a half-day of workshops helped awardees brainstorm ideas for connecting with general users and the educational community, as well as get technical guidance in working with the LC Newspaper Viewer, the core software application that supports the Chronicling America site, which LC publishes as open-source software.

Susan:  Describe some of the ways you can search this collection.

Deb:  We have full-text search for newspapers from all across the country covering almost a hundred years – you can find first-hand reporting on the battles of the Civil War, diverse voices during the years of Reconstruction, life events in families going back generations, and the scandals and crimes that riveted the reading public during these decades. You can explore land disputes, crop reports, society news in both cities and small communities, American perspectives on events across the world, fact and fiction in technological advances, poetry, serialized literature by such classic writers as Charles Dickens and Arthur Conan Doyle, and much, much more. All digitized papers in the collection can be searched by date, location and full text, either through a simple keyword search or through a more advanced search that lets users zero in on specific times and places with combinations of words and phrases.

In addition to the digitized newspapers made available through the program (limited to those published between 1836 and 1922), the site also includes a separate searchable directory of US newspaper records, describing more than 150,000 titles published from 1690 to the present and listing libraries that have physical copies in microfilm or original print. This directory, derived from data collected under the US Newspaper Program (1982-2011, a precursor to NDNP supported by NEH and LC, providing funding and guidelines to state institutions to inventory, describe and selectively microfilm their state’s newspaper collections), helps users identify what’s available and where to go to find newspapers beyond those digitized for the Chronicling America site.

Susan:  What can you tell us about any updates and/or milestones during the last year?

Deb:  2012 has been a very exciting year for NDNP. We have experienced record use of the Chronicling America site over the year and crossed the milestone of having more than 5 million pages available. (Actually, at this time, it’s 5.2 million and ever-growing with monthly contributions from awardees.) This year we added 4 new participants to the program representing Iowa, Maryland, Michigan, and North Carolina, and incorporated the technical ability to search non-English language newspapers in French, German, Italian and Spanish. (Currently we have text for French and Spanish only, but other titles are in the works.)

Susan:  What are plans for future updates, both short term and long term?

Deb:  More, more, more! Over the long term, we want more states to participate and add more content (NEH runs an annual award competition for interested applicants). All in all, over the next decade or so, we expect the collection to increase by tens of millions of pages gathered from all 54 states and territories.

Susan:  What are some of your favorite items in the collection?

Deb:  Some of my personal favorites are associated with the imaginative speculation by turn-of-the-century journalists. In an age when technology began to change lives dramatically year by year, the possibilities seemed endless. For example, an article in the Saint Paul Globe in 1904 told the story of the author jumping ten years ahead via a time machine to learn how his own works had stood up to time, only to find libraries dramatically changed. Instead of being places where people came to read books, libraries had evolved (in only 10 years!) to be places where books were read to people via phonograph or transferred as sound via wires. Another favorite is a feature article from the Washington Times in 1907, describing recent seismic disruptions around the world and presenting popular theories about the cause.

From the St. Paul Globe, 1904

Susan:  What are some uses of this material that you’ve heard about from researchers?

Deb:  First, there are the genealogists who find these materials a treasure trove of unexpected details about family histories and events long past. A significant part of our user base self-identifies as “family historians.” Such folks can find articles on marriages, deaths, funerals, celebrations, scandals, day-to-day living and more. Teachers use the collection to teach narrative analysis and how perspective influenced (and still does) the news (e.g., articles during the Civil War from both Confederacy- and Union-aligned papers). Another type of use we see is downloading large parts of the collection through the application programming interface (API).

The data in Chronicling America is all in the public domain and available for open access. The software supporting the Web site has been designed to encourage reuse and holistic analysis of the data as a “big data” resource. This makes the site available to a new kind of research, the “digital humanities,” in which historians join with technologists to analyze large amounts of historic data in new ways. We know about research on geographic and chronological visualizations of newspaper publishing changes in the US, epidemiological studies incorporating how the “news” about disease spread and influenced societal behavior, and even linguistic analysis in different regions. We’d love to hear about other studies going on as well.
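For readers who want to try this kind of programmatic access, here is a minimal sketch of building a search request that combines keywords, a state and a year range. It is an illustration only, not an official client: the base URL and the parameter names (`andtext`, `state`, `dateFilterType`, `date1`, `date2`, `rows`, `format`) are assumptions based on the site's public search interface and should be checked against the current documentation, and `build_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: search endpoint of the Chronicling America site; verify
# against the site's current API documentation before relying on it.
BASE_URL = "https://chroniclingamerica.loc.gov/search/pages/results/"


def build_search_url(words, state=None, year_from=None, year_to=None, rows=20):
    """Build a full-text search URL that asks for JSON results.

    words     -- keywords that must all appear on the page
    state     -- optional state name, e.g. "Minnesota"
    year_from -- optional first year of the range (inclusive)
    year_to   -- optional last year of the range (inclusive)
    rows      -- number of results per page
    """
    params = {"andtext": words, "format": "json", "rows": rows}
    if state:
        params["state"] = state
    # Only add the date filter when both ends of the range are given.
    if year_from is not None and year_to is not None:
        params["dateFilterType"] = "yearRange"
        params["date1"] = year_from
        params["date2"] = year_to
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Example: pages from Minnesota papers in 1904 mentioning "time machine".
    print(build_search_url("time machine", state="Minnesota",
                           year_from=1904, year_to=1904))
```

The sketch only constructs the URL, so it can be run without network access; fetching and paging through the results is left to whatever HTTP library a project already uses.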

Susan:  Are there any preservation capabilities built into this collection?

Deb:  Access to these newspapers is key, and is intended to be sustained over time, so preservation capabilities are built into nearly every facet of the program. The digital objects LC prescribes for the program are based on the premise of a one-time opportunity to capture this diverse material scattered around the country, so we need to get the most we can out of this data. The specifications are intended to capture as much information from the original microfilm as practical (using print master duplicate negatives for the best copy) and for the newspapers to be described by the people who know them best, the original selectors and curators.

In addition to the specifications themselves, the data transfer and handling procedures used by the digitization vendors, awardees, and the Library promote having a collection of uniform and consistent self-describing digital objects, with verifiable data values that can be checked over time to ensure enduring access. At LC the data lifecycle is supported, as much as possible, with an automated workflow system of repository services that help us manage the validity and consistency of the data over time, as well as a detailed inventory of what, how much and where the data is in our systems.
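As a rough illustration of what "verifiable data values that can be checked over time" can mean in practice, the sketch below recomputes file checksums and compares them with values recorded earlier. It is not a description of LC's actual workflow: the choice of SHA-256, the manifest layout (a mapping of relative path to expected digest) and the function names `sha256_of` and `verify_manifest` are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, root):
    """Check files under root against previously recorded checksums.

    manifest -- hypothetical mapping of relative path -> expected digest
    root     -- directory that the relative paths are resolved against

    Returns the relative paths that are missing or whose current
    checksum no longer matches the recorded one, in manifest order.
    """
    failures = []
    for relative_path, expected in manifest.items():
        target = Path(root) / relative_path
        if not target.is_file() or sha256_of(target) != expected:
            failures.append(relative_path)
    return failures
```

Running a check like this on a schedule, and keeping the manifest alongside the inventory of where each file lives, is one simple way to notice silent corruption or loss before it becomes unrecoverable.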

Susan:  Could you describe the importance of collaboration in this project?

Deb:  Collaboration is one of the key elements supporting success for NDNP. Within LC itself, the program is a successful partnership between the curatorial newspaper collection managers who know the ins and outs of working with historic newspapers and the repository development technologists who design ever-more sophisticated and efficient ways to approach the management of this growing digital collection.  It’s also an important collaboration between federal agencies to provide the resources supporting the development of a national-level collection of historical newspapers. Most significantly, it’s a collaboration between institutions representing their state and local history and the sponsors of the program to build the best national resource possible from this important primary source material.

 
