METS Java Toolkit, Version 1.5

We present a Java toolkit for the procedural construction, validation, marshalling, and unmarshalling of METS files. METS, the Metadata Encoding & Transmission Standard, is intended to provide a standardized XML encoding for the transmission of complex digital library objects between systems. While it provides standard containers and encoding mechanisms for descriptive and administrative metadata, it does not define the content or format of that metadata. The content and format of structural metadata, however, are explicitly mandated by the METS specification.

The METS schema is expressed using the W3C XML Schema definition language. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.

METS incorporates by reference a subset of the XML XLink schema for defining simple relationships between METS files and external entities.

The current toolkit supports METS Version 1.5 (April 12, 2005).

1 Architecture

The toolkit is a Java binding framework in which each element of a METS file is represented in memory by an instantiated object. Thus, the root <mets> element is represented by a Mets object, the <metsHdr> element by a MetsHdr object, and so on.
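
As a rough illustration of the binding idea, and not the toolkit's actual class design, the following sketch models an element as an object that holds a name, its attributes, and an ordered content list. The class SketchElement and all of its methods are hypothetical and exist only for this example:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a binding class; not part of the toolkit API.
class SketchElement {
    private final String name;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<SketchElement> content = new ArrayList<>();

    SketchElement(String name) {
        this.name = name;
    }

    // Sets an attribute and returns this element so calls can be chained.
    SketchElement setAttribute(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    // Child elements are added to the parent's content list.
    List<SketchElement> getContent() {
        return content;
    }

    // Serializes this element and its children. Attribute values are
    // assumed to need no escaping in this simplified sketch.
    String toXml() {
        StringBuilder sb = new StringBuilder("<").append(name);
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            sb.append(' ').append(e.getKey())
              .append("=\"").append(e.getValue()).append('"');
        }
        if (content.isEmpty()) {
            return sb.append("/>").toString();
        }
        sb.append('>');
        for (SketchElement child : content) {
            sb.append(child.toXml());
        }
        return sb.append("</").append(name).append('>').toString();
    }

    public static void main(String[] args) {
        SketchElement mets = new SketchElement("mets").setAttribute("OBJID", "123456");
        mets.getContent().add(new SketchElement("metsHdr"));
        System.out.println(mets.toXml()); // <mets OBJID="123456"><metsHdr/></mets>
    }
}
```

The toolkit itself uses one class per METS element with typed setters; the generic class above only shows how an object tree can mirror the element tree.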

Note that the class design and the invocation signatures of some of the toolkit API class methods have changed in an incompatible manner from an earlier version of the toolkit.

2 Validation

The toolkit supports both local and global validation of METS files:

  1. Local
    1. Required attributes
    2. Required content model elements
  2. Global
    1. ID/IDREF consistency: all IDREFs reference a defined ID

Validation is performed by invoking the validate() method on an instantiated Mets object:

mets.validate (new MetsValidator ());

The MetsValidator object maintains pertinent state information across the local validation of each element of the METS file to allow global validation.
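
The need for shared state can be sketched as follows. Because an IDREF may appear before the ID it refers to, the consistency check can only run after every element has been visited. The class IdRefState and its methods are hypothetical; they are an assumption about how such state might be kept, not a description of the MetsValidator internals:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical validator state; the real MetsValidator may differ.
class IdRefState {
    private final Set<String> definedIds = new HashSet<>();
    private final List<String> referencedIds = new ArrayList<>();

    // Called during local validation of an element that declares an ID.
    // Returns false if the ID was already defined (IDs must be unique).
    boolean defineId(String id) {
        return definedIds.add(id);
    }

    // Called during local validation of an element that carries an IDREF.
    void reference(String idref) {
        referencedIds.add(idref);
    }

    // Global check, run after all elements have been visited: returns
    // every IDREF with no matching ID, in order of appearance.
    List<String> danglingReferences() {
        List<String> dangling = new ArrayList<>();
        for (String ref : referencedIds) {
            if (!definedIds.contains(ref)) {
                dangling.add(ref);
            }
        }
        return dangling;
    }

    public static void main(String[] args) {
        IdRefState state = new IdRefState();
        state.reference("dmd1");   // forward reference, resolved later
        state.defineId("dmd1");
        state.reference("dmd2");   // never defined
        System.out.println(state.danglingReferences()); // prints [dmd2]
    }
}
```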

3 Serialization/De-serialization

Serialization and de-serialization are performed by invoking the write() and read() methods of a Mets object:

mets.write (new MetsWriter (OutputStream));
mets.read (new MetsReader (InputStream));

4 Examples

The following example procedurally creates a new METS file. The general principle is to instantiate an object corresponding to a particular METS schema element, set its various attributes, and then add it to the content model of its parent:

Mets mets = new Mets ();
mets.setOBJID ("123456");
mets.setLABEL ("My title");
mets.setTYPE  ("myType");
mets.setPROFILE ("myProfile");
  MetsHdr metsHdr = new MetsHdr ();
    Agent agent = new Agent ();
    agent.setRole (Role.CREATOR);
      Name name = new Name ();
      name.getContent ().add (new PCData ("C. Reator"));
    agent.getContent ().add (name);
  metsHdr.getContent ().add (agent);
  ...
mets.getContent ().add (metsHdr);
...

mets.validate (new MetsValidator ());

FileOutputStream out = new FileOutputStream ("mets.xml");
mets.write (new MetsWriter (out));
out.close ();

The following example de-serializes an existing METS file into a validated in-memory representation:

FileInputStream in = new FileInputStream ("mets.xml");
Mets mets = Mets.reader (new MetsReader (in));
in.close ();

mets.validate (new MetsValidator ());

5 Distribution Package

The toolkit is organized into three packages:

edu.harvard.hul.ois.mets
edu.harvard.hul.ois.mets.helper
edu.harvard.hul.ois.mets.helper.parser

The top-level package, edu.harvard.hul.ois.mets, contains the METS element binding classes, e.g., Mets for <mets>, MetsHdr for <metsHdr>, etc. The helper package, edu.harvard.hul.ois.mets.helper, contains the utility classes necessary for serializing (MetsWriter), de-serializing (MetsReader), and validating (MetsValidator). The parser package, edu.harvard.hul.ois.mets.helper.parser, contains the low-level XML parser (Parser) and its supporting classes.

This version of the METS toolkit (Version 1.3.4, 2005-03-04) is made available under the terms of the GNU Lesser General Public License (LGPL) on the download page.

The distribution directory is organized as follows:

mets/
     COPYING                          # GNU Lesser General Public License
     LICENSE                          # METS Java Toolkit license information
     Makefile
     README                           # This file
     RELEASE                          # Release notes
     bin/                             # METS toolkit Jar file
              mets.jar
     classes/                         # Source and class file tree
              edu/harvard/hul/ois/mets/
                                       Makefile
                                       *.java
                                       helper/
                                              Makefile
                                              *.java
                                              parser/
                                                     Makefile
                                                     *.java
     doc/                             # Javadoc
              *.html
     examples/                        # Examples
              README
              Makefile
              Copy.java
              Write.java
     lib/ ...                         # This directory tree is empty
     src/ ...                         # This directory tree is empty

All three packages are aggregated into a single JAR file, mets.jar.

Note: The empty lib/ and src/ directory trees are present to conform to standard Harvard University Library development guidelines.

6 Toolkit API Documentation

Documentation for the METS toolkit API is available as Javadoc in the doc/ directory of the distribution.

7 To Do

  1. Sequence order validation
  2. Support arbitrary entity references (the toolkit currently supports decimal and hexadecimal character references and the standard named entities: &amp;, &apos;, &gt;, &lt;, and &quot;)
  3. ID/IDREF references to non-METS schema elements, e.g. DMDID reference to MODS element
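
The reference handling described in item 2 can be sketched as below: decimal and hexadecimal character references and the five standard named entities are resolved, and any other reference is copied through unchanged. The class ReferenceResolver is hypothetical and is not the toolkit's parser; in particular, the pass-through behavior for unsupported references is an assumption made for this example:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; not part of the toolkit's parser package.
class ReferenceResolver {
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("apos", "'");
        NAMED.put("gt", ">");
        NAMED.put("lt", "<");
        NAMED.put("quot", "\"");
    }

    // Replaces supported references in text; leaves everything else as-is.
    static String resolve(String text) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            int semi = (c == '&') ? text.indexOf(';', i) : -1;
            if (semi < 0) {
                // Ordinary character, or '&' with no terminating ';'.
                out.append(c);
                i++;
                continue;
            }
            String replacement = decode(text.substring(i + 1, semi));
            if (replacement == null) {
                out.append(text, i, semi + 1); // unsupported: copy through
            } else {
                out.append(replacement);
            }
            i = semi + 1;
        }
        return out.toString();
    }

    // Returns the replacement text, or null if the reference is unsupported.
    private static String decode(String body) {
        try {
            if (body.startsWith("#x")) {
                int cp = Integer.parseInt(body.substring(2), 16);
                return new String(Character.toChars(cp));
            }
            if (body.startsWith("#")) {
                int cp = Integer.parseInt(body.substring(1));
                return new String(Character.toChars(cp));
            }
        } catch (IllegalArgumentException e) {
            return null; // malformed number or invalid code point
        }
        return NAMED.get(body);
    }

    public static void main(String[] args) {
        System.out.println(resolve("&lt;a&gt; &#65;&#x42; &foo;")); // <a> AB &foo;
    }
}
```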

Copyright 2003-2007 by the President and Fellows of Harvard College
Last updated 2007-07-16