ALTO Technical Metadata for Optical Character Recognition (OCR)

Using ALTO with METS

ALTO was created for use with METS. Wrapping ALTO instances in METS is not a requirement, but most implementers have chosen to use ALTO inside a METS wrapper. To do so, references to the ALTO file are made from the METS <area> element, which is used within the METS <structMap>.

Screenshot of a METS area element

The FILEID attribute on <area> refers to the following structure within the METS <fileGrp> element under <fileSec>:

Screenshot of a METS fileGrp element

The BEGIN attribute on <area> then points into the ALTO file itself within one of the children of the METS <amdSec>.
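
The reference chain described above can be sketched in Python with the standard library. This is a minimal illustration, not a reference implementation. The METS and ALTO fragments, the IDs (ALTO0001, TB1), and the file name page0001.alto.xml are hypothetical, and the sketch assumes the ALTO file is stored externally and referenced through a METS <FLocat> element. The helper names resolve_areas, find_by_id, and block_text are also made up for this example.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, trimmed METS fragment: IDs and file name are made up.
METS_SAMPLE = """<mets xmlns="http://www.loc.gov/METS/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <fileSec>
    <fileGrp USE="ocr">
      <file ID="ALTO0001" MIMETYPE="text/xml">
        <FLocat LOCTYPE="URL" xlink:href="page0001.alto.xml"/>
      </file>
    </fileGrp>
  </fileSec>
  <structMap>
    <div TYPE="page">
      <fptr>
        <area FILEID="ALTO0001" BETYPE="IDREF" BEGIN="TB1"/>
      </fptr>
    </div>
  </structMap>
</mets>"""

# Hypothetical, trimmed ALTO fragment (not schema-complete) holding the
# block that the BEGIN attribute refers to.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine>
            <String CONTENT="Hello"/>
            <SP/>
            <String CONTENT="world"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def resolve_areas(mets_root):
    """Return (href, begin_id) for every <area> in the METS tree."""
    # Index every <file> by its ID so FILEID can be looked up directly.
    files = {f.get("ID"): f for f in mets_root.iter(METS + "file")}
    pairs = []
    for area in mets_root.iter(METS + "area"):
        flocat = files[area.get("FILEID")].find(METS + "FLocat")
        pairs.append((flocat.get(XLINK_HREF), area.get("BEGIN")))
    return pairs


def find_by_id(root, target_id):
    """Return the first element whose ID attribute equals target_id, else None."""
    for element in root.iter():
        if element.get("ID") == target_id:
            return element
    return None


def block_text(block):
    """Join the CONTENT of every String element beneath the given block."""
    return " ".join(
        el.get("CONTENT") for el in block.iter() if el.tag.endswith("}String")
    )


pairs = resolve_areas(ET.fromstring(METS_SAMPLE))
href, begin_id = pairs[0]
block = find_by_id(ET.fromstring(ALTO_SAMPLE), begin_id)
print(href, begin_id, block_text(block))
```

In a real workflow the ALTO document would be loaded from the location given by href rather than from an inline string. The lookup by ID is deliberately namespace-agnostic, since ALTO versions use different namespace URIs.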

ALTO (Analyzed Layout and Text Object) is an XML Schema that defines technical metadata for describing the layout and content of physical text resources, such as pages of a book or a newspaper. It most commonly serves as an extension schema used within the administrative metadata section of the Metadata Encoding and Transmission Standard (METS). However, an ALTO instance can also exist as a standalone document, used independently of METS.
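
To illustrate how a standalone ALTO instance carries both layout and content, the following Python sketch reads text lines and their positions from a small document. The sample is hypothetical and trimmed (it is not schema-complete), the coordinates and text are made up, and the helper names local_name and read_lines are assumptions introduced for this example only.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed ALTO instance; coordinates and text are made up.
ALTO_SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout>
    <Page ID="P1" WIDTH="2000" HEIGHT="3000">
      <PrintSpace>
        <TextBlock ID="TB1">
          <TextLine ID="TL1" HPOS="100" VPOS="200" WIDTH="600" HEIGHT="40">
            <String CONTENT="Analyzed" HPOS="100" VPOS="200" WIDTH="250" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Layout" HPOS="370" VPOS="200" WIDTH="180" HEIGHT="40"/>
          </TextLine>
          <TextLine ID="TL2" HPOS="100" VPOS="260" WIDTH="600" HEIGHT="40">
            <String CONTENT="and" HPOS="100" VPOS="260" WIDTH="90" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Text" HPOS="210" VPOS="260" WIDTH="110" HEIGHT="40"/>
            <SP/>
            <String CONTENT="Object" HPOS="340" VPOS="260" WIDTH="170" HEIGHT="40"/>
          </TextLine>
        </TextBlock>
      </PrintSpace>
    </Page>
  </Layout>
</alto>"""


def local_name(element):
    """Strip the '{namespace}' prefix so any ALTO version can be handled."""
    return element.tag.rsplit("}", 1)[-1]


def read_lines(alto_root):
    """Return one dict per TextLine with its ID, position, and joined text."""
    lines = []
    for line in alto_root.iter():
        if local_name(line) != "TextLine":
            continue
        # Only direct String children carry words; SP elements mark spaces.
        words = [s.get("CONTENT") for s in line if local_name(s) == "String"]
        lines.append(
            {
                "id": line.get("ID"),
                "hpos": float(line.get("HPOS")),
                "vpos": float(line.get("VPOS")),
                "text": " ".join(words),
            }
        )
    return lines


lines = read_lines(ET.fromstring(ALTO_SAMPLE))
for entry in lines:
    print(entry["id"], entry["hpos"], entry["vpos"], entry["text"])
```

The layout information (HPOS, VPOS, WIDTH, HEIGHT) sits alongside the recognized text (CONTENT), so a consumer can map each word or line back to a region of the page image.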