ALTO Technical Metadata for Optical Character Recognition (OCR)

About ALTO


The Analyzed Layout and Text Object (ALTO) XML Schema was initially developed by the METAe project group for use with the Library of Congress' Metadata Encoding and Transmission Standard (METS). While METS excels at describing the structure of objects, it lacked a companion schema for the content and layout information of each piece of the object. Claus Gravenhorst, who helped create ALTO for the METAe project, states that:

"During the METAe project, we learned that there is no standard to handle word positions and physical layout information (print space, margins, etc.), an essential feature for high performance repositories that are able to highlight elements within documents. Therefore, the ALTO schema has been developed. In the METS file, there are file pointers to the ALTO files that contain the text, other elements (illustrations, etc.), and word positions. We would like ALTO or a similar schema to become a standard as we do not see an alternative right now." [1]

The Library of Congress's role is to help foster and spread the adoption of ALTO as a standard, as it does for the other standards it currently promulgates.

  1. "Editors' Interview with Günter Mühlberger and Claus Gravenhorst of METAe." RLG DigiNews, October 15, 2004.

ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Standard (METS). However, an ALTO instance can also exist as a standalone document used independently of METS.
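
As a rough illustration of how a standalone ALTO instance carries both text and word positions, the following Python sketch reads a small hand-written fragment and lists each word with its bounding box. The fragment is a simplified assumption written for this example, not a schema-validated file: it omits attributes that a real ALTO document would carry, and the namespace shown is the one used by version 3 of the schema. The helper names `local_name` and `extract_words` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified ALTO-style fragment (an assumption for illustration;
# real files contain more elements and required attributes).
SAMPLE_ALTO = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1">
            <String CONTENT="Analyzed" HPOS="100" VPOS="200" WIDTH="310" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Layout" HPOS="430" VPOS="200" WIDTH="220" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(tag):
    """Drop the '{namespace}' prefix so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_words(alto_xml):
    """Return one dict per String element: its text and its bounding box."""
    root = ET.fromstring(alto_xml)
    words = []
    for element in root.iter():
        if local_name(element.tag) != "String":
            continue
        words.append({
            "content": element.get("CONTENT", ""),
            "hpos": float(element.get("HPOS", 0)),      # left edge
            "vpos": float(element.get("VPOS", 0)),      # top edge
            "width": float(element.get("WIDTH", 0)),
            "height": float(element.get("HEIGHT", 0)),
        })
    return words


if __name__ == "__main__":
    for word in extract_words(SAMPLE_ALTO):
        print(word)
```

A repository could use coordinates like these to highlight a search hit on the page image, which is the use case the METAe interview describes. Matching on the local element name rather than a fixed namespace is a design choice made here so the sketch is not tied to one schema version.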