Harvard University Library / Office for Information Systems
Abstract
We present a Java toolkit for the procedural construction, validation, marshalling, and unmarshalling of METS, the Metadata Encoding & Transmission Standard for XML encoding of descriptive, administrative, and structural metadata regarding objects within a digital library. The toolkit API is based on Sun's JAXB binding framework, under which elements are represented by classes with accessor and mutator methods for attributes and the element content model, and additional methods for validation and marshalling to and from instance documents. JAXB was chosen as the basis for the METS API due to its anticipated market acceptance as a key component of Sun's Java XML Pack bundle. The toolkit's parser is James Clark's XP.
The toolkit was developed to allow procedural processing of METS files in the context of an archiving project in which multiple content providers submit materials packaged in METS files to a centralized archive. To achieve necessary operating efficiencies, a maximum level of automation is required for the creation of syntactically valid METS files on the provider side and for the ingest of those METS files on the archive side. The toolkit will be used as the basis for development of these automated systems.
The METS Framework
METS is intended to provide a standardized XML encoding for transmission of complex digital library objects between systems. While it provides standard containers and encoding mechanisms for descriptive and administrative metadata, it does not define the content or format of that metadata. However, the content and format of structural metadata are explicitly mandated within the METS specification.
METS incorporates by reference a subset of the XML XLink schema for defining simple relationships between METS files and external entities.
The METS schema is expressed using the W3C XML Schema definition language. The standard is maintained in the Network Development and MARC Standards Office of the Library of Congress, and is being developed as an initiative of the Digital Library Federation.
The JAXB Framework
In creating an API for the METS toolkit we had two options:
The JAXB product comprises a set of base classes (the javax.xml.bind package); a schema-driven compiler for generating schema-specific binding classes, which are sub-classed from the JAXB base classes; and a set of run-time marshalling classes (the javax.xml.marshal package) invokable by the derived binding classes. JAXB provides mechanisms for local and global type and structural validation, and for marshalling between binding class instantiations and XML instance documents.
The organization of JAXB is presented graphically in the following figure:
The JAXB specification is still in early pre-release development. Although a preliminary implementation of the class generator has been released, it only accepts as input a subset of the XML DTD language; XML Schema is not supported at all. Thus, all of the base classes, as well as the METS schema-specific classes used in the toolkit, were constructed manually, following the JAXB API specification.
Note: Sun has announced that the future production release of the JAXB specification may be incompatible with the current pre-release version. The question of whether to stay with the current format of the METS toolkit or to migrate it into compliance with a subsequent JAXB specification at some future time remains under discussion.
The Toolkit
The METS toolkit consists of an implementation of the javax.xml.bind and javax.xml.marshal packages (org.mets.xml.bind and org.mets.xml.marshal, respectively) and the METS API package org.mets.xml.mets following the JAXB binding framework API for schema-derived classes.
In general, each element defined in the METS schema maps to a public class:
public class Element
{
public Element ();
...
}
with public type-specific accessor and mutator methods for all element attributes:
public type getAttr (); // Scalar-valued attribute
public void setAttr (type value);
public List getAttr (); // List-valued attribute
public void List.add (type value);
...
and content model:
public List getContent ();
public void List.add (Element content);
...
List-valued attributes are represented as ordered lists of objects stored in an ArrayList object, which implements the List interface. Thus, to add a value to a list-valued attribute, first retrieve the list and then use one of the standard List mutator methods to add the value to the list:
Element element = new Element ();
...
List list = element.getAttr ();
list.add (value);
Similarly, child elements are represented in the content model as an ordered list of objects. To add a repeatable child element to its parent's content model, first retrieve the list and then use one of the standard mutator methods to add the child to the list:
Parent parent = new Parent ();
...
Child child = new Child (); // Instantiate child element
List list = parent.getContent (); // Get parent content model list
list.add (child); // Add child to content model
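The list-based accessor pattern described above can be sketched with stand-in classes. This is a minimal illustration rather than toolkit source: SketchElement and ContentModelSketch are hypothetical names, the code uses generics (which the J2SE 1.3 toolkit predates), and the only behaviour assumed is that getContent() hands back the live, ordered list held by the element.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a schema-derived element class; not toolkit code.
class SketchElement {
    private final String name;
    // Content model held as an ordered, ArrayList-backed list.
    private final List<Object> content = new ArrayList<>();

    SketchElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    // Returns the live list: additions made by the caller change this element.
    List<Object> getContent() {
        return content;
    }
}

class ContentModelSketch {
    // Builds a parent with two children and returns the child names in order.
    static List<String> childNames() {
        SketchElement parent = new SketchElement("parent");
        List<Object> list = parent.getContent();   // retrieve the content list
        list.add(new SketchElement("first"));      // add children through the list
        list.add(new SketchElement("second"));

        List<String> names = new ArrayList<>();
        for (Object o : parent.getContent()) {
            names.add(((SketchElement) o).getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(childNames());
    }
}
```

Because the accessor returns the list itself rather than a copy, no separate "add child" mutator is needed on the element class; the standard List methods serve that purpose.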
The Mets class encapsulating the root <mets> element provides public methods for validation:
public void validate ();
and marshalling to and from instance documents:
public void marshal (OutputStream os);
public void unmarshal (InputStream in);
To encapsulate arbitrary metadata elements, possibly namespace-qualified, defined external to the METS specification, the toolkit provides a generic element class:
public class Any
{
public Any (String name);
public Any (String namespace, String name);
public String getNamespace ();
public String getName ();
public String getQName ();
...
}
with public accessor and mutator methods for ID attributes:
public String getID ();
public void setID (String id);
a public list-valued accessor method for all other attributes:
public List getAttributes ();
public void List.add (Object value);
...
and a public list-valued accessor method for its content model:
public List getContent ();
public void List.add (Element content);
...
A generic attribute class is provided to support type-independent manipulation of attributes, possibly namespace-qualified:
public class Attribute
{
public Attribute (String name, Object value);
public Attribute (String namespace, String name, Object value);
public String getNamespace ();
public String getName ();
public String getQName ();
public Object getValue ();
}
Note that the generic element class does not perform any local validation on its attributes or content; global validation is performed for ID uniqueness.
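As a rough illustration of how the generic element and attribute classes fit together, the sketch below mirrors the signatures listed above with hypothetical stand-ins (SketchAttribute, SketchAny, GenericElementSketch). The form of the value returned by getQName() is not stated in this document; the sketch assumes it is the namespace and name joined by a colon, or the bare name when no namespace was given.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the generic attribute class; not toolkit source.
class SketchAttribute {
    private final String namespace; // null when the attribute is unqualified
    private final String name;
    private final Object value;

    SketchAttribute(String name, Object value) {
        this(null, name, value);
    }

    SketchAttribute(String namespace, String name, Object value) {
        this.namespace = namespace;
        this.name = name;
        this.value = value;
    }

    String getNamespace() { return namespace; }
    String getName() { return name; }
    Object getValue() { return value; }

    // ASSUMPTION: qualified name is "namespace:name", or the bare name if unqualified.
    String getQName() {
        return namespace == null ? name : namespace + ":" + name;
    }
}

// Hypothetical stand-in for the generic element class.
class SketchAny {
    private final String namespace;
    private final String name;
    private String id;
    private final List<SketchAttribute> attributes = new ArrayList<>();

    SketchAny(String namespace, String name) {
        this.namespace = namespace;
        this.name = name;
    }

    // Same ASSUMPTION about the qualified-name format as in SketchAttribute.
    String getQName() { return namespace == null ? name : namespace + ":" + name; }
    String getID() { return id; }
    void setID(String id) { this.id = id; }

    // Live list of all non-ID attributes.
    List<SketchAttribute> getAttributes() { return attributes; }
}

class GenericElementSketch {
    // Builds one generic element and lists the qualified names it carries.
    static List<String> qualifiedNames() {
        SketchAny any = new SketchAny("my", "descMD");
        any.getAttributes().add(new SketchAttribute("my", "attr", "value"));
        any.getAttributes().add(new SketchAttribute("plain", "x"));

        List<String> names = new ArrayList<>();
        names.add(any.getQName());
        for (SketchAttribute a : any.getAttributes()) {
            names.add(a.getQName());
        }
        return names;
    }
}
```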
The toolkit explicitly defines the default namespace to be the METS namespace, http://www.loc.gov/METS/.
The XMLScanner class used for unmarshalling is built using the token-level interface of XP.
Procedural Construction
The general recursive rule for procedural construction of a METS file is:
Mets mets = new Mets ();
mets.setID ("1234");
...
MetsHdr metsHdr = new MetsHdr ();
metsHdr.setRECORDSTATUS ("Prod");
...
Agent agent = new Agent ();
agent.setROLE (Role.CREATOR);
...
Name name = new Name ();
name.setContent (new PCData ("S. L. Abrams"));
...
agent.getContent ().add (name);
...
metsHdr.getContent ().add (agent);
...
mets.getContent ().add (metsHdr);
...
DmdSec dmdSec = new DmdSec ();
dmdSec.setID ("abc");
...
MdWrap mdWrap = new MdWrap ();
mdWrap.setMDTYPE (Mdtype.DC);
...
XmlData xmlData = new XmlData ();
Any any = new Any ("my", "descMD");
any.getAttributes ().add (new Attribute ("my", "attr", "value"));
...
...
any.getContent ().add (...);
...
xmlData.getContent ().add (any);
...
mdWrap.getContent ().add (xmlData);
...
dmdSec.getContent ().add (mdWrap);
...
mets.getContent ().add (dmdSec);
...
Once the entire content tree has been constructed, it can be validated and marshalled.
mets.validate ();
mets.marshal (System.out);
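To make the relationship between the content tree and the marshalled output concrete, here is a minimal sketch of recursive serialization. It does not reproduce the toolkit's XMLWriter; TreeNode and MarshalSketch are hypothetical names, attributes are simplified to string pairs, and namespace handling is omitted.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node; the toolkit's own classes are not reproduced here.
class TreeNode {
    final String name;
    final Map<String, String> attrs = new LinkedHashMap<>(); // keeps insertion order
    final List<Object> content = new ArrayList<>();          // TreeNode or String (character data)

    TreeNode(String name) {
        this.name = name;
    }
}

class MarshalSketch {
    // Recursively serializes a node, its attributes, and its content.
    static String marshal(TreeNode node) {
        StringBuilder sb = new StringBuilder("<").append(node.name);
        for (Map.Entry<String, String> a : node.attrs.entrySet()) {
            sb.append(' ').append(a.getKey())
              .append("=\"").append(escape(a.getValue())).append('"');
        }
        if (node.content.isEmpty()) {
            return sb.append("/>").toString();
        }
        sb.append('>');
        for (Object child : node.content) {
            if (child instanceof TreeNode) {
                sb.append(marshal((TreeNode) child)); // recurse into child elements
            } else {
                sb.append(escape(child.toString()));  // character data
            }
        }
        return sb.append("</").append(node.name).append('>').toString();
    }

    // Escapes the characters that are unsafe in XML text and attribute values.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    // Follows the bottom-up order of the agent/name fragment in the example above.
    static String example() {
        TreeNode agent = new TreeNode("agent");
        agent.attrs.put("ROLE", "CREATOR");
        TreeNode name = new TreeNode("name");
        name.content.add("S. L. Abrams");
        agent.content.add(name);
        return marshal(agent);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```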
Validation
Validation occurs at both the global (document scope) and local (element scope) levels.
Note: In the current implementation, the IDREF-to-ID consistency validation is performed using a mechanism that is an extension to JAXB.
Note: In the current implementation, the ordering of sequential content model elements is not validated.
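The two global checks mentioned in this document, ID uniqueness and IDREF-to-ID consistency, can be sketched as two passes over the tree. The toolkit's actual mechanism is not described here, so the following is only one plausible approach; IdNode and GlobalValidationSketch are hypothetical names, and problems are collected in a list rather than thrown as exceptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical node type used only for this sketch.
class IdNode {
    final String id;                                  // value of the ID attribute, or null
    final List<String> idrefs = new ArrayList<>();    // IDREF values carried by this node
    final List<IdNode> children = new ArrayList<>();

    IdNode(String id) {
        this.id = id;
    }
}

class GlobalValidationSketch {
    // Returns a list of problems; an empty list means both checks passed.
    static List<String> validate(IdNode root) {
        List<String> problems = new ArrayList<>();
        Set<String> ids = new HashSet<>();
        collectIds(root, ids, problems);   // pass 1: gather IDs, flag duplicates
        checkRefs(root, ids, problems);    // pass 2: every IDREF must name a known ID
        return problems;
    }

    private static void collectIds(IdNode node, Set<String> ids, List<String> problems) {
        if (node.id != null && !ids.add(node.id)) {
            problems.add("duplicate ID: " + node.id);
        }
        for (IdNode child : node.children) {
            collectIds(child, ids, problems);
        }
    }

    private static void checkRefs(IdNode node, Set<String> ids, List<String> problems) {
        for (String ref : node.idrefs) {
            if (!ids.contains(ref)) {
                problems.add("unresolved IDREF: " + ref);
            }
        }
        for (IdNode child : node.children) {
            checkRefs(child, ids, problems);
        }
    }
}
```

Two passes are used because an IDREF may legitimately point forward to an ID that appears later in the document, so references can only be resolved once every ID has been seen.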
Marshalling/Unmarshalling
The procedure to marshal a METS file is to create an in-memory representation composed of instantiated objects, and invoke the root <mets> class validate() and marshal() methods:
Mets mets = new Mets ();
...
mets.validate ();
mets.marshal (outputStream);
This generates a well-formed and valid METS file on the output stream.
The procedure to unmarshal a METS file to an in-memory representation is to instantiate the root <mets> class, and invoke its unmarshal() and validate() methods:
Mets mets = Mets.unmarshal (inputStream);
mets.validate ();
At this point, the entire document is represented by a tree structure of instantiated objects, against which the standard accessor and mutator functions can be invoked.
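The marshal/unmarshal round trip can be illustrated with a deliberately tiny stand-in. SketchRoot and RoundTripSketch are hypothetical; the root carries only an ID attribute; unmarshal() is written as a static factory, following the Mets.unmarshal (inputStream) usage shown above; and parsing is done with the JDK's built-in DOM parser as a substitute for XP, which the toolkit itself uses.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical root class holding only an ID; not the toolkit's Mets class.
class SketchRoot {
    static final String NS = "http://www.loc.gov/METS/";
    private String id;

    String getID() { return id; }
    void setID(String id) { this.id = id; }

    // Writes a one-element document with the METS namespace as the default namespace.
    // Expects the ID to have been set beforehand.
    void marshal(OutputStream os) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                   + "<mets xmlns=\"" + NS + "\" ID=\"" + escape(id) + "\"/>\n";
        os.write(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Reads the document back with the JDK DOM parser (a substitute for XP).
    static SketchRoot unmarshal(InputStream in) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder().parse(in);
        SketchRoot root = new SketchRoot();
        root.setID(doc.getDocumentElement().getAttribute("ID"));
        return root;
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }
}

class RoundTripSketch {
    // Marshals a root with the given ID to memory, unmarshals it, and returns the ID read back.
    static String roundTrip(String id) {
        try {
            SketchRoot out = new SketchRoot();
            out.setID(id);
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            out.marshal(buffer);
            SketchRoot in = SketchRoot.unmarshal(new ByteArrayInputStream(buffer.toByteArray()));
            return in.getID();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("abc"));
    }
}
```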
Implementation
The current version of the METS toolkit is based on the METS schema Version 1.0 (zeta) of February 8, 2002; the JAXB working draft specification Version 0.21 of May 30, 2001; and version 0.5 of XP. The toolkit has been constructed and tested with the Sun J2SE 1.3.1_02-b02 JDK under Solaris 2.7.
Javadoc for the METS toolkit packages is available.
A distribution version of the METS toolkit can be downloaded as a gzipped tar file, mtk-20020315.tar.gz (378 KB).
gunzip mtk-20020315.tar.gz
tar xvf mtk-20020315.tar
The distribution directory structure is as follows:
mtk-1.0/
    README
    Makefile
    Marshal.java      # Test marshalling application
    Unmarshal.java    # Test unmarshalling application
    marshal.xml       # Marshalling output
    unmarshal.xml     # Unmarshalling output
    bin/
        bind.jar
        marshal.jar
        mets.jar
        xp.jar
    com/
        jclark/
            util/
                ...
            xml/
                tok/
                    ContentToken.java
                    Encoding.java
                    ...
    doc/
        index.html
        ...
    org/
        mets/
            xml/
                bind/
                    Makefile
                    MarshallableObject.java
                    MarshallableRootElement.java
                    PCData.java
                    ValidatableObject.java
                    ...
                marshal/
                    Makefile
                    XMLScanner.java
                    XMLWriter.java
                mets/
                    Makefile
                    Mets.java
                    MetsHdr.java
                    ...
The marshalling application, Marshal, procedurally constructs a METS file and marshals it to standard output. The unmarshalling application, Unmarshal, unmarshals the file specified by the file argument, validates it, and then re-marshals it to standard output.
To test the marshalling and unmarshalling applications:
cd mtk-1.0
java Marshal > marshal.xml
java Unmarshal marshal.xml > unmarshal.xml
The two files, marshal.xml and unmarshal.xml, should be identical.
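One way to confirm that the two outputs match is a byte-for-byte comparison. The helper below is not part of the distribution; CompareOutputs is a hypothetical class name, and the default file names are simply those produced by the test commands above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper; not shipped with the toolkit.
class CompareOutputs {
    // True when both files contain exactly the same bytes.
    static boolean identical(Path first, Path second) {
        try {
            return Arrays.equals(Files.readAllBytes(first), Files.readAllBytes(second));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Fall back to the file names used in the test commands when no arguments are given.
        Path first = Paths.get(args.length > 0 ? args[0] : "marshal.xml");
        Path second = Paths.get(args.length > 1 ? args[1] : "unmarshal.xml");
        System.out.println(identical(first, second) ? "identical" : "different");
    }
}
```

On a Unix system such as the Solaris platform named above, the standard cmp or diff utilities serve the same purpose.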
The following list documents currently unsupported features of the METS and JAXB specifications:
The following limitations are in force in the current implementation:
Things to do:
References
Java API for XML Binding (JAXB), Sun Microsystems, <http://java.sun.com/xml/jaxb/>.
Metadata Encoding & Transmission Standard (METS), Library of Congress, Network Development and MARC Standards Office, <http://www.loc.gov/standards/mets/>.
XP - An XML Parser in Java, James Clark <http://www.jclark.com/xml/xp/>.
The METS specification was developed as an initiative of the Digital Library Federation and is maintained by the Network Development and MARC Standards Office of the Library of Congress.
The toolkit is compliant with the JAXB API specification, copyright © 2001 Sun Microsystems, Inc.
The toolkit uses the XP parser, copyright © 1997, 1998 James Clark.