This is a guest post by Chris Adams, technical lead for the World Digital Library in the Repository Development Center at the Library of Congress.
Like many people who work with digital imagery, I’ve been looking forward to the JPEG-2000 image format for a long time because of its solid technical advantages: superior compression performance for both lossless masters and lossy access images, progressive decoding, multiple resolutions, and tiling. Having a single format which is flexible enough to satisfy both preservation and access requirements is appealing, particularly at a time when many organizations are being forced to reconcile rising storage costs with shrinking budgets.
So, given clear technical advantages, why do many of my fellow software developers seem distinctly uneasy about using JPEG-2000? Johan van der Knijff has summarized a range of concerns but I want to focus on the last point from his guest post at the Wellcome Library’s JPEG-2000 blog:
According to [David Rosenthal], the availability of working open-source rendering software is much more important, and he explains how formats with open source renderers are, for all practical purposes, immune from format obsolescence…. Perhaps the best way to ensure sustainability of JPEG 2000 and the JP2 format would be to invest in a truly open JP2 software library, and release this under a free software license.
The most common concern I’ve heard about JPEG-2000 is the lack of high-quality tools and particularly support within the open-source world. I believe this is a critical concern for preservation.
1. Our future ability to read a file is a function of how widely it is used. Some formats achieve this through openness: it is highly unlikely that we will lose the ability to display JPEG files because they are ubiquitous and there are many high-quality implementations which are regularly used to encode and decode files produced by other implementations. (Adoption is one of the Library’s factors for assessing the sustainability of digital collections).
Some proprietary formats have achieved similar levels of confidence through pervasive market share: while it’s always possible for a single vendor to discontinue a product this would have significant repercussions and create strong demand from many organizations around the world for tools to migrate their orphaned content.
I would argue that JPEG-2000 is currently in an unfortunate position: it has limited use outside of a few niches, and most of its users depend on proprietary software yet may not represent a large enough market to support multiple high-quality implementations. How likely is it that JP2 constitutes even 1% of the new images created every day? The lack of browser support ensures that JP2 is almost non-existent on the web and thus is not a factor in most software selection decisions.
2. The format is quite complex. Complexity significantly increases the barrier to entry for new implementations, particularly given the challenge of not only implementing the entire standard but reaching competitive performance as well. (Transparency is another key sustainability factor).
As a rough estimate of the relative complexity consider the size of two open-source JPEG-2000 implementations versus the entire Python Imaging Library, which supports several dozen formats as well as a general-purpose image processing toolkit (Core lines of code, generated using David A. Wheeler’s SLOCCount):
|PIL|C: 12,493, Python: 9,229|
3. Limited resources for compliance testing implementations. The complexity of the format and the restricted specification provide many opportunities for developers to produce malformed files or fail to decode correct but obscure options.
In my primary role as the technical lead for the World Digital Library I’ve been asked to process files in most common formats created by the wide variety of software used by our many partners. JPEG-2000 has been by far the most common format requiring troubleshooting and, disturbingly, in most cases the files had already been processed by at least one other program before a problem attracted closer inspection. While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.
This is particularly disturbing when you consider the possibility that a file which is viewable today could become problematic later after a bug is fixed!
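The gap between lenient and strict tools can be illustrated with a small sketch. The Python below is a hypothetical checker, not the behavior of any particular tool: the function names are invented for illustration, and it inspects only the first two boxes of a JP2 file (the 12-byte signature box and the File Type box that the JP2 specification places immediately after it). A lenient reader might stop at the signature, while a stricter one also insists on a well-formed File Type box whose compatibility list includes the `jp2 ` brand.

```python
import struct

# The 12-byte JPEG 2000 signature box that opens every JP2 file.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def lenient_is_jp2(data: bytes) -> bool:
    """Hypothetical lenient check: accept anything with the JP2 signature."""
    return data.startswith(JP2_SIGNATURE)


def strict_problems(data: bytes) -> list:
    """Hypothetical stricter check of the first two boxes.

    Returns a list of human-readable problems (empty if none were found).
    Simplification: extended (64-bit) box lengths are reported as unexpected.
    """
    if not data.startswith(JP2_SIGNATURE):
        return ["missing JP2 signature box"]
    header = data[12:20]
    if len(header) < 8:
        return ["file ends before the File Type box"]
    length, box_type = struct.unpack(">I4s", header)
    if box_type != b"ftyp":
        return ["File Type box does not immediately follow the signature"]
    # Box layout: 8-byte header, 4-byte brand, 4-byte minor version,
    # then one or more 4-byte compatible brands.
    if length < 20 or (length - 16) % 4 != 0:
        return ["unexpected File Type box length %d" % length]
    payload = data[20:12 + length]
    if len(payload) < length - 8:
        return ["File Type box is truncated"]
    compatible = [payload[i:i + 4] for i in range(8, len(payload), 4)]
    if b"jp2 " not in compatible:
        return ["compatibility list does not include 'jp2 '"]
    return []


# Two synthetic headers: one well formed, one with a wrong compatibility list.
good = JP2_SIGNATURE + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jp2 ")
sloppy = JP2_SIGNATURE + struct.pack(">I4s4sI4s", 20, b"ftyp", b"jp2 ", 0, b"jpx ")

print(lenient_is_jp2(sloppy), strict_problems(sloppy))
print(lenient_is_jp2(good), strict_problems(good))
```

Both synthetic headers pass the lenient check, but only the stricter one reports the second as malformed, which is the pattern described above: a file can move through several tools before one of them complains.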
A Role for Open Source
Over the last decade, open-source software has become pervasive as users spend considerable time working directly in open-source applications or, far more commonly, using applications which rely on open-source libraries. This is currently a problem from the perspective of JPEG-2000 as the most widespread open-source implementation unfortunately has some compatibility issues and is considerably slower than the modern commercial implementations. The good news is that many, many users — including those of several popular, high-volume websites — are using open-source libraries; targeted improvements there would benefit many people around the world.
The need for improved open-source support has also been touched upon before on this blog in Steve Puglia’s summary of the 2011 JPEG 2000 summit.
Browser Support is Critical
Users are increasingly using browsers to perform tasks that used to be considered solely the domain of traditional desktop applications. Photos which used to be stored and viewed locally are now increasingly being uploaded to social networks and photo sharing sites, a trend which will continue as HTML5 makes increasingly advanced web applications possible. In practice, this means that any image format which cannot be viewed directly in the average web browser will become a support burden for site operators and it becomes correspondingly tempting to adopt a storage format such as PNG or JPEG since most images will eventually need to be transcoded into those formats for display.
This is not in itself a threat to traditional preservation, but it is an additional complication that needs to be dealt with as internal file management applications become increasingly web-based. It also makes development of access services more expensive, again providing an incentive to simplify access at the expense of storage costs.
Of the popular browsers, only one supports JPEG-2000. For wider adoption, requirements for browsers to support JPEG-2000 will need to be detailed (one example is here).
Potential Use of OpenJPEG
OpenJPEG is emerging as a possible solution for many of these problems, particularly as the recently released version 2.0 added support for streaming and tiled decoding, which deliver some of the greatest benefits relative to other formats. For applications such as Chronicling America, which need to serve 60+ tile and thumbnail requests per second, the ability to decode individual tiles is far more of a limiting factor than the time required to decode the entire master image, and it was previously unavailable to open-source developers.
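To see why tiled decoding matters for a tile server, consider how few tiles a typical request touches. The sketch below is plain arithmetic rather than OpenJPEG's API: the function name and the example dimensions are hypothetical, and it assumes the tile grid starts at the image origin (zero image and tile offsets) and ignores resolution levels.

```python
def tiles_for_region(image_w, image_h, tile_w, tile_h, x, y, w, h):
    """Return (total_tiles, needed) for a requested pixel region.

    `needed` lists the (col, row) indices of tiles overlapping the region.
    Assumption: the tile grid is anchored at pixel (0, 0).
    """
    if w <= 0 or h <= 0:
        raise ValueError("region must have positive size")
    # Number of tile columns and rows, rounding up for partial edge tiles.
    cols = (image_w + tile_w - 1) // tile_w
    rows = (image_h + tile_h - 1) // tile_h
    # Clamp the requested region to the image bounds.
    x0, y0 = max(x, 0), max(y, 0)
    x1, y1 = min(x + w, image_w), min(y + h, image_h)
    if x0 >= x1 or y0 >= y1:
        return cols * rows, []  # region lies entirely outside the image
    needed = [
        (col, row)
        for row in range(y0 // tile_h, (y1 - 1) // tile_h + 1)
        for col in range(x0 // tile_w, (x1 - 1) // tile_w + 1)
    ]
    return cols * rows, needed


# Hypothetical master image: 8192 x 6144 pixels stored in 1024 x 1024 tiles.
total, needed = tiles_for_region(8192, 6144, 1024, 1024, 1500, 900, 512, 512)
print(f"{len(needed)} of {total} tiles needed: {needed}")
```

For this made-up request a decoder that can address tiles directly touches 2 of the 48 tiles, while one that must decode the whole master does the work for all 48 on every request.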
In addition to the standard JPEG 2000 conformance suite, the OpenJPEG project has been developing its own test suite. An obvious area where the preservation community could assist would be in contributing not only test images which exercise features that are important to us but also known-bad images for which tools should issue compliance warnings.
The preservation community should review existing suites and contribute additional freely-available tests to help validate implementations, with a focus on features which are less commonly used in other industries as well as strong conformance testing and improved detection and reporting of non-compliant files to help guide implementations towards stronger interoperability.
1/28/2013: Added additional information to introduction.