Is JPEG-2000 a Preservation Risk?

This is a guest post by Chris Adams of the Repository Development Center, technical lead for the World Digital Library at the Library of Congress.

Like many people who work with digital imagery, I’ve been looking forward to the JPEG-2000 image format for a long time due to solid technical advantages: superior compression performance for both lossless masters and lossy access images, progressive decoding, multiple resolutions, and tiling. Having a single format flexible enough to satisfy both preservation and access requirements is appealing, particularly at a time when many organizations are being forced to reconcile rising storage costs with shrinking budgets.

So, given clear technical advantages, why do many of my fellow software developers seem distinctly uneasy about using JPEG-2000? Johan van der Knijff has summarized a range of concerns but I want to focus on the last point from his guest post at the Wellcome Library’s JPEG-2000 blog:

According to [David Rosenthal], the availability of working open-source rendering software is much more important, and he explains how formats with open source renderers are, for all practical purposes, immune from format obsolescence…. Perhaps the best way to ensure sustainability of JPEG 2000 and the JP2 format would be to invest in a truly open JP2 software library, and release this under a free software license.

The most common concern I’ve heard about JPEG-2000 is the lack of high-quality tools and particularly support within the open-source world. I believe this is a critical concern for preservation.

1. Our future ability to read a file is a function of how widely it is used. Some formats achieve this through openness: it is highly unlikely that we will lose the ability to display JPEG files because they are ubiquitous and there are many high-quality implementations which are regularly used to encode and decode files produced by other implementations. (Adoption is one of the Library’s factors for assessing the sustainability of digital collections).

Some proprietary formats have achieved similar levels of confidence through pervasive market share: while it’s always possible for a single vendor to discontinue a product this would have significant repercussions and create strong demand from many organizations around the world for tools to migrate their orphaned content.

I would argue that JPEG-2000 is currently in the unfortunate position of having limited use outside of a few niches, and the majority of its users depend on proprietary software but may not represent a large enough market to support multiple high-quality implementations. How likely is it that JP2 accounts for even 1% of the new images created every day? The lack of browser support ensures that JP2 is almost non-existent on the web and thus is not a factor in most software selection decisions.

2. The format is quite complex. Complexity significantly increases the barrier to entry for new implementations, particularly given the challenge of not only implementing the entire standard but reaching competitive performance as well. (Transparency is another key sustainability factor).

As a rough estimate of the relative complexity, consider the size of two open-source JPEG-2000 implementations versus the entire Python Imaging Library, which supports several dozen formats as well as a general-purpose image-processing toolkit (core lines of code, generated using David A. Wheeler’s SLOCCount):

OpenJPEG: 49,892 lines of C
libjasper: 26,458 lines of C
PIL: 12,493 lines of C plus 9,229 lines of Python

3. Limited resources for compliance-testing implementations. The complexity of the format and the restricted availability of the specification provide many opportunities for developers to produce malformed files or to fail to decode correct but obscure options.

In my primary role as the technical lead for the World Digital Library I’ve been asked to process files in most common formats created by the wide variety of software used by our many partners. JPEG-2000 has been by far the most common format requiring troubleshooting and, disturbingly, in most cases the files had already been processed by at least one other program before a problem attracted closer inspection. While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.

Problems in this JP2 file are only evident at certain zoom levels. Different implementations would produce blank areas, random noise or the shifted fragment seen above. No GUI tool tested produced any warning for the malformed tile structure.

This is particularly disturbing when you consider the possibility that a file which was viewable today could become problematic later after a bug is fixed!
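One practical way to catch this class of problem before it reaches a repository is to decode each file with more than one tool and flag any disagreement for human review. Below is a minimal Python sketch of that idea; it assumes Pillow has been built with OpenJPEG support and that the glymur package is installed, and since both of those commonly wrap OpenJPEG, a genuinely independent second decoder (for example a Kakadu command-line tool) would make for a much stronger check.

# Minimal sketch: decode a JP2 with two tools and flag any disagreement.
# Assumes Pillow was built with OpenJPEG support and glymur is installed;
# both often wrap OpenJPEG, so swapping in a truly independent decoder
# would make this a stronger check.
import sys

import numpy as np
import glymur
from PIL import Image


def cross_check(path):
    """Return True when both decoders agree on shape and pixel values."""
    pil_pixels = np.asarray(Image.open(path))
    glymur_pixels = glymur.Jp2k(path)[:]  # full-resolution decode

    if pil_pixels.shape != glymur_pixels.shape:
        print(f"{path}: shape mismatch {pil_pixels.shape} vs {glymur_pixels.shape}")
        return False
    if not np.array_equal(pil_pixels, glymur_pixels):
        differing = int(np.count_nonzero(pil_pixels != glymur_pixels))
        print(f"{path}: {differing} differing samples between decoders")
        return False
    return True


if __name__ == "__main__":
    bad = [p for p in sys.argv[1:] if not cross_check(p)]
    sys.exit(1 if bad else 0)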

A Role for Open Source

Over the last decade, open-source software has become pervasive as users spend considerable time working directly in open-source applications or, far more commonly, using applications which rely on open-source libraries. This is currently a problem from the perspective of JPEG-2000 as the most widespread open-source implementation unfortunately has some compatibility issues and is considerably slower than the modern commercial implementations. The good news is that many, many users — including those of several popular, high-volume websites — are using open-source libraries; targeted improvements there would benefit many people around the world.

The need for improved open-source support has also been touched upon before on this blog in Steve Puglia’s summary of the 2011 JPEG 2000 summit.

Browser Support is Critical

Users are increasingly using browsers to perform tasks that used to be considered solely the domain of traditional desktop applications. Photos which used to be stored and viewed locally are now increasingly being uploaded to social networks and photo sharing sites, a trend which will continue as HTML5 makes increasingly advanced web applications possible. In practice, this means that any image format which cannot be viewed directly in the average web browser will become a support burden for site operators and it becomes correspondingly tempting to adopt a storage format such as PNG or JPEG since most images will eventually need to be transcoded into those formats for display.
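To make that transcoding point concrete, here is a minimal Python sketch of producing a JPEG access derivative from a JP2 master, assuming Pillow has been built with OpenJPEG support; the filenames and quality setting are purely illustrative.

# Sketch: generate a JPEG access derivative from a JP2 master.
# Requires a Pillow build with OpenJPEG support; names are illustrative.
from PIL import Image

with Image.open("master.jp2") as im:
    # JPEG expects 8-bit RGB and has no alpha channel, so normalize first.
    im.convert("RGB").save("access.jpg", quality=85)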

This is not in itself a threat to traditional preservation, but it is an additional complication that needs to be dealt with as internal file-management applications become increasingly web-based, and it makes development of access services more expensive, again providing an incentive to simplify access at the expense of storage costs.

Of the popular browsers, only one supports JPEG-2000. For wider adoption, requirements for browsers to support JPEG-2000 will need to be detailed (one example is here).

Potential Use of OpenJPEG

OpenJPEG is emerging as a possible solution for many of these problems, particularly as the recently released version 2.0 added support for streaming and tiled decoding, which deliver some of the greatest benefits relative to other formats. For applications such as Chronicling America, which need to serve 60+ tile and thumbnail requests per second, the ability to decode only the requested region is far more of a limiting factor than the time required to decode the entire master image, and it was previously unavailable to open-source developers.
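To illustrate what this looks like from an application’s point of view, here is a hedged Python sketch using glymur, which wraps the OpenJPEG library, to pull a reduced-resolution thumbnail and a single region out of a JP2 without decoding the whole image. The filename, coordinates, and slice step are illustrative, and the actual savings depend on how the file was encoded (resolution levels, tiling, precincts).

# Sketch: partial decoding of a JP2 through glymur (an OpenJPEG wrapper).
# The filename, region coordinates, and slice step are illustrative.
import glymur

jp2 = glymur.Jp2k("newspaper_page.jp2")

# A power-of-two slice step maps to a lower resolution level, so only a
# fraction of the codestream has to be decoded for a thumbnail.
thumbnail = jp2[::8, ::8]

# Decode a single 1024x1024 region at full resolution, e.g. one tile
# requested by a pan/zoom viewer.
tile = jp2[0:1024, 0:1024]

print(thumbnail.shape, tile.shape)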

Conformance Testing

In addition to the standard JPEG 2000 conformance suite the OpenJPEG project has been developing their own test suite. An obvious area where the preservation community could assist would be in contributing not only examples of test images which exercise features which are important to us but also known-bad images for which tools should issue compliance warnings.

The preservation community should review the existing suites and contribute additional freely-available tests to help validate implementations, focusing on features which are less commonly used in other industries, on strong conformance testing, and on improved detection and reporting of non-compliant files, to help guide implementations towards stronger interoperability.
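One simple form such a test could take is a pixel-for-pixel comparison between a decoded JP2 and a canonical reference image kept in an uncontroversial format such as TIFF. The sketch below, in Python with numpy and Pillow (again assuming a build with both JPEG-2000 and TIFF support), shows the idea; the directory layout and the expectation of an exact match (appropriate for lossless masters) are assumptions for illustration, not an established suite.

# Sketch: compare decoded JP2s against canonical reference TIFFs.
# The corpus layout and exact-match expectation (suitable for lossless
# encodings) are assumptions for illustration.
from pathlib import Path

import numpy as np
from PIL import Image


def matches_reference(jp2_path, tiff_path):
    decoded = np.asarray(Image.open(jp2_path))
    reference = np.asarray(Image.open(tiff_path))
    return decoded.shape == reference.shape and np.array_equal(decoded, reference)


if __name__ == "__main__":
    corpus = Path("conformance-corpus")
    for jp2 in sorted(corpus.glob("*.jp2")):
        reference = jp2.with_suffix(".tif")
        status = "OK" if matches_reference(jp2, reference) else "MISMATCH"
        print(status, jp2.name)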

1/28/2013: Added additional information to introduction.

12 Comments

  1. Dave Rice
    January 28, 2013 at 7:36 pm

    Pleased to read this assessment. I wanted to add a few supporting comments on your 3 considerations.

    1. Use. I doubt use of jpeg2000 is anywhere near as high as 1%, but for whatever use there is the experience of use is further subdivided among different communities. For instance many jpeg2000 tools only support RGB-based colorspaces or bit depths that are multiples of 8. Some support 10 bit sampling but not losslessly. Some have limited support for various chroma subsampling patterns. When the objective of using jpeg2000 is to encode visual data losslessly there can often be a lot of work to identify jpeg2000 tools that support the pixel format, colorspace, bit depth, and chroma subsampling of the source visual data (transcoding any incoming visual data to the same pixel format in jpeg2000 to normalize the result would compromise the objective of a lossless representation of the original data). Additionally many encoders are not multi-threaded and thus too slow for practical use with video. So the tools relevant to a photo archivist may not be at all appropriate for the goals of a video archivist and vice versa.

    2. Complexity. This point is very true. I believe building a jpeg2000 encoder/decoder has been the only FFmpeg-managed Google Summer of Code project (http://wiki.multimedia.cx/index.php?title=FFmpeg_Summer_Of_Code) that has not been completed within the assigned time. Actually the jpeg2000 SOC project has probably been assigned within three different years and is still unfinished and in experimental status. [note: FFmpeg does support compilation with libopenjpeg as well as their native j2k codec]

    3. Compliance. Generally I’d advocate that archives should use a file format or codec in the simplest configuration possible in order to achieve preservation objectives. I find this also true with the QuickTime file format, where the entire specification covers extensive complexity in order to facilitate the user experience but can produce an overly complex container architecture if not carefully managed. Also with lossless encodings like jpeg2000 there are testing formats such as framemd5 that can be used to confirm that the losslessly compressed data and the source data decode to identical images, verifying the intended losslessness.

    Thanks again,
    Dave Rice

  2. Gary McGath
    January 28, 2013 at 8:14 pm

    I too have concerns about the format’s long-term viability. Even the commercial options for JPEG2000 are not that great. Luratech’s implementation was very good for a while, but they seem to have lost interest in maintaining it. Kakadu has confusing license terms and a significant price just to evaluate it. OpenJPEG, at least when I last evaluated it, is slow.

  3. Susan
    January 29, 2013 at 10:59 am

    Thank you very much for this column. I work in a field that has been advocating JPEG 2000. I don’t have an IT background, and your column gave me a lot to think about in a succinct and easy-to-understand way. I now have the language to ask the necessary questions of those promoting this format.

  4. Chris Adams
    January 29, 2013 at 12:36 pm

    Dave Rice: one of our resident video experts brought up that line of questions and I feel it might warrant its own post, particularly as it sounds like it’s starting to become standardized in some fields.

    I strongly agree with your comments about the range of JPEG-2000 feature support, particularly given the pattern of narrow-but-deep adoption in fields which might not have much software in common. To act on my last point, I think the first step would be to start surveying the files used within different specialties and collecting examples of valid JP2s using the less common features, perhaps seeing whether there’s a clear need for a new profile beyond e.g. the ones listed at http://www.digitalpreservation.gov/formats/fdd/fdd000138.shtml.

    Ultimately, I would like to have an open source project which would basically contain test images and canonical reference TIFFs with some sort of automated comparison tool as it’s currently somewhat daunting for someone outside of the community to know how to test an implementation. I doubt many people are going to try writing a JP2 codec from scratch but it’s quite easy to envision someone wanting to integrate a library such as OpenJPEG into their own project or perhaps fix bugs or optimize an implementation and wanting an easy way to confirm that they haven’t introduced some subtle regression.

  5. Andrea Goethals
    January 30, 2013 at 9:19 am

    Thanks for this post, Chris. Your first 2 points sum up the concerns I’ve had with this format for a while.

    On popularity: The digital preservation community doesn’t have the resources to maintain its own processing and rendering tools so we should try to stay away from niche formats.

    On complexity: PDF suffers from this same problem. The more complex and feature-filled a format becomes, the more difficult it is to write supporting creation/validation/rendering software. I think that this is a more difficult problem than the first because many times the things that make a format complex are the things that make them appealing for keeping storage costs and bandwidth use down and for creating rich user experiences.

  6. Lars
    January 30, 2013 at 11:41 am

    Excellent summary and nice discussion. One thing does not really make sense, though: why does the number of images play a role? Of course there is an endless number of simple and small images out there where JPEG is fully sufficient, but there are also areas like DCINEMA, medical imaging, geo data, etc. where the images are very big or where quality is crucial. That is why there is no alternative in some areas (like DCINEMA). Does preservation then really mean not preserving those items, or only taking snapshots?
    Also, when it comes to size we are used to copying images no matter what size the targeted device requires, think of smartphones etc. So we move data that is not required and wait for ages; sometimes we do not even show these images because they are too big.
    I have been implementing solutions based on Kakadu for a decade now. I agree the license terms are not easy and they basically prohibit publishing the code, but everybody who owns a license can publish all sorts of applications. On the other hand there are limitations to open source. David Taubman has been working on this for a long time now and has all the expertise; this is impossible without license fees.
    For me the bottom line is: keep it simple when the image is simple, but also invest and preserve when quality is crucial or the importance is high.

  7. Paul Wheatley
    January 31, 2013 at 7:13 am

    Excellent post Chris. I agree that collecting example files would be a really useful community contribution. You and others are welcome to contribute to the Format Corpus we’ve been gradually building on Git, which already has some JPEG2000 files there. It might be a useful staging post for collating contributions before moving them in to some of the test suites you’ve mentioned.
    https://github.com/openplanets/format-corpus

  8. Chris Adams
    January 31, 2013 at 5:52 pm

    Lars: I completely agree about the prospective savings with large numbers of high-resolution images. One way to approach this would be for us as a community to prioritize building better JPEG-2000 support now – particularly in common image-processing tools where weak/slow support actively discourages use – and to treat it as a long-term investment which will allow us to save significant amounts of storage and transfer cost for many years to come.

    Andrea Goethals: I strongly agree with your general conclusion about processing and rendering tools, although I would argue that JPEG-2000 could be an exception because we’re not starting from scratch. Helping with some rough edges on OpenJPEG or integrating it into a popular tool like ImageMagick or GraphicsMagick is significantly less work than producing a new implementation from scratch or convincing millions of users to adopt a new image processing application.

    Paul Wheatley: the format corpus sounds like a great starting point. I’ve been talking internally about surveying the various features which we’re using, which would be a good starting point for seeing where contributions would be useful.

  9. Gary McGath
    February 4, 2013 at 8:08 am

    If what’s needed is a lossless format for large images, what about BigTIFF? It doesn’t seem to be widely used, but it’s supported by TiffLib, so lots of people already have the code to create it, and it’s really just TIFF with some parameters changed to allow 64-bit offsets, so it’s a conservative approach technologically.

  10. David S. H. Rosenthal
    February 4, 2013 at 9:37 am

    Thank you, Chris, for a fascinating post. It reinforces the point I’ve made since 2007 that the key attribute for a format’s survivability is that it have strong open source support. Formats that get wide adoption will have strong open source support and those that don’t, won’t. It’s a Darwinian world out there.

    This suggests that the idea of “preservation formats” as opposed to “access formats” is a trap. Precisely because they aren’t access formats, preservation formats are less likely to have the strong open source support that enables successful preservation.

    But your most interesting observation is “more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.” This is a failure to observe Postel’s Law, which I blogged about in the context of the use of format validation tools in preservation pipelines.

    Postel’s Law is fundamental to the Internet; it says you should be strict in what you emit and liberal in what you consume. What we care about is whether the preserved file can be rendered legibly, not whether it conforms to one tool developer’s interpretation of the standard versus another’s.

  11. Johan
    February 6, 2013 at 8:07 am

    Chris, David,

    Following David’s comment above I just re-read Chris’ statement on validation:

    While there have been some failures caused by programs which do not correctly support the entire format, more frequently the failure has been caused by a program applying stricter validation checks and rejecting a file with minor errors which had not been reported by the other tools used earlier in our processing.

    Reading this makes me wonder if validation is really the problem here. This would be only true if we had the following situation:

    Encoder A is used to create a JP2
    Decoder B fails to decode it because before doing so it applies some checks (‘validation’) which are too strict

    The assumption here is that decoder B would actually be able to decode the file if the checks were left out altogether!

    Based on my own experience, in the majority of cases the real issue is that the JP2 created by encoder A simply contains features that are not (fully) supported by decoder B, leading to interoperability problems. If you look at JPEG 2000’s Codestream Syntax specification it’s also easy to see how such things may be happening, as it contains quite a few features that are optional and which may not be fully supported by all decoders. So often it’s not even about slightly different interpretations of the standard, but rather about incomplete interpretations. And since for JP2 we don’t have many decoding options to begin with, this is somewhat worrying.

    But maybe you have some specific examples to the contrary?

    Also a final note/warning on validation: even my jpylyzer validator is still somewhat limited in this regard, as for the image codestreams I’ve only managed to include support for the required (non-optional) marker segments for the main codestream header (5 out of a total of 13) so far. In addition jpylyzer only provides information on 5 out of the 11 marker segments that can occur at the tile-part level.

    In practical terms this means that 2 JP2s that were created by 2 different encoders may yield similar jpylyzer output (in terms of validation results and reported marker segments), even if one of them contains some of the less frequently-used marker segments.

    At this stage these limitations are mainly due to limited time and resources, and limited availability of sample files for the less-common marker segments, but eventually I would like to add these features.

    Johan

  12. David S. H. Rosenthal
    February 9, 2013 at 4:51 pm

    Johan’s observations are interesting but I stand by my comment that if a program, whether a decoder or a validator, is “rejecting a file with minor errors” then it is not conforming to Postel’s Law. This may be because it is nit-picking, or because it is incomplete, but in the light of Postel’s Law either way it is wrong.

    If this wrong-ness causes the “file with minor errors” to be rejected for preservation that is a serious problem. Given the limited resources available for preservation, an only slightly less serious problem is that it is wasting the valuable time of people like Chris.
