Tool highlight: Jpylyzer

The JP2 format (part of the JPEG 2000 image compression standard) is becoming increasingly popular for storing digitised content, both for access and preservation. However, quality control of JP2 imaging workflows has been poorly addressed so far. By early 2011 the British Library was confronted with JP2 images containing damage that existing tools, such as JHOVE, could not detect. A few months later the KB (National Library of the Netherlands) started initial preparations for migrating 146 TB of TIFF images to JP2. They realised that hardware failures during the migration process (e.g. short network interruptions) posed a major risk of creating malformed and damaged files.

This is why SCAPE started working on a validation and feature extraction tool for the JP2 (JPEG 2000 Part 1) still image format. The tool, called Jpylyzer, was specifically created to answer the following questions about any JP2 file:

  • Is this really a JP2 and does it really conform to the format’s specifications (validation)?
  • What are the technical characteristics of this image (feature extraction)?
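To give a feel for the very first step of answering "is this really a JP2?", here is a minimal sketch in Python. It only checks that the data starts with the 12-byte JP2 signature box, which is a tiny fraction of what a full validator such as Jpylyzer examines. The function name `looks_like_jp2` is made up for this illustration and is not part of Jpylyzer.

```python
# The JP2 signature box: 4-byte length (12), box type 'jP  ',
# followed by the fixed content 0D 0A 87 0A.
JP2_SIGNATURE = bytes.fromhex("0000000C6A5020200D0A870A")


def looks_like_jp2(data: bytes) -> bool:
    """Return True if the data starts with the JP2 signature box.

    This is only a first sanity check (hypothetical helper): a file can
    pass it and still be truncated or damaged further on.
    """
    return data[:12] == JP2_SIGNATURE


if __name__ == "__main__":
    print(looks_like_jp2(JP2_SIGNATURE + b"...rest of file..."))
    print(looks_like_jp2(b"II*\x00not a JP2"))
```

A file that passes this check can still be malformed in its later boxes or in the codestream, which is exactly the kind of damage that calls for full validation.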

Example of a corrupted JP2 file of a newspaper page

The software was developed as part of the SCAPE work on improving quality assurance for automated digitisation and imaging workflows. The initial development was done by Johan van der Knijff (KB); later versions included contributions from various other people, both inside and outside SCAPE.

Jpylyzer was designed to cover the following two aspects of quality assurance in JPEG 2000 imaging workflows:

  1. Validation against the JP2 format specifications, which ensures that images are standards compliant. It is also effective for detecting common forms of byte-level corruption.
  2. Validation of image and encoding properties against an institute-specific profile. Jpylyzer doesn’t perform this second type of validation by itself, but its extracted properties can be validated against an external schema that defines the institute-specific profile.
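The second aspect can be sketched roughly as follows. This is an illustration only, under several assumptions: the XML below is a simplified, hand-written stand-in and not Jpylyzer's actual output structure, the profile values are invented, and a plain Python dictionary takes the place of the external schema an institution would really use.

```python
import xml.etree.ElementTree as ET

# Simplified stand-in for an extracted-properties report.
# Element names are illustrative assumptions, not Jpylyzer's real output.
SAMPLE_REPORT = """<report>
  <isValid>True</isValid>
  <properties>
    <transformation>5-3 reversible</transformation>
    <levels>5</levels>
    <layers>1</layers>
  </properties>
</report>"""

# Hypothetical institute-specific profile: property name -> required value.
PROFILE = {
    "transformation": "5-3 reversible",
    "levels": "5",
    "layers": "1",
}


def check_profile(report_xml, profile):
    """Return a list of problems; an empty list means the profile is met."""
    root = ET.fromstring(report_xml)
    problems = []
    # Format validity comes first: a profile check on a broken file is moot.
    if root.findtext("isValid") != "True":
        problems.append("file is not valid JP2")
    # Compare each required property with the extracted value.
    for name, expected in profile.items():
        actual = root.findtext(f"properties/{name}")
        if actual != expected:
            problems.append(f"{name}: expected {expected!r}, found {actual!r}")
    return problems


if __name__ == "__main__":
    print(check_profile(SAMPLE_REPORT, PROFILE))
```

Keeping the profile separate from the tool means that each institution can tighten or relax its own requirements without touching the validator itself.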

Although other tools exist for these tasks, their ‘validation’ scope is rather limited, which introduces the risk that malformed and badly damaged images will go unnoticed. In addition, quite a few tools exist for extracting properties from this format, but most of these focus either on the JP2 header fields or on the codestream headers. Jpylyzer was designed with the aim of filling these gaps.

Following completion of the initial tool in December 2011, the British Library has now implemented this solution at scale, and other organisations beyond SCAPE are currently trialling it.

To make a tool sustainable, it is important that its maintenance and development do not depend solely on a single institution or person. Because of this, Jpylyzer is now hosted by the Open Planets Foundation, which ensures the involvement of a wider community. Jpylyzer also has its own home page on the OPF site, which contains links to the source code, Windows executables, Debian packages, blog posts and the User Manual.

A paper on Jpylyzer will be presented at the Archiving 2012 conference (12-15 June, Copenhagen, Denmark, http://www.imaging.org/ist/conferences/archiving/): “Improved validation and feature extraction for JP2 (JPEG 2000 Part 1) images: the jpylyzer tool” by Johan van der Knijff (Open Planets Foundation and KB National Library of the Netherlands), René van der Ark (KB) and Carl Wilson (The British Library).
