Metadata Standards

Descriptive Metadata
Technical Metadata
Structural Metadata
Preservation Metadata
Rights Metadata

Libraries use metadata, that is, structured information about resources, to support almost everything they do. For convenience, metadata formats and standards are often discussed in broad categories, such as descriptive, technical, structural, preservation, provenance and rights. The categories overlap, however, and shift over time, and individual standards often address more than one category.

Standards also can address different aspects of metadata, such as the meaning and structure of data elements, guidelines for formulating the content of the elements, controlled lists of terms or values, and encoding. Many metadata standards address more than one of these aspects.

Key metadata standards used in Harvard libraries or related communities include those for descriptive, administrative, technical, structural, preservation and rights metadata.

Descriptive Metadata

Descriptive metadata identifies a resource and describes its intellectual content. Catalog records and finding aids are two examples of descriptive metadata.

  • Anglo-American Cataloguing Rules, 2nd Ed. Rev. (AACR2)
    • The primary set of rules used by libraries to create catalogs and other lists. The rules cover the description of, and the provision of access points for, commonly collected library materials.
    • AACR2 is the primary cataloging code used in Aleph/HOLLIS.
  • Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO)
    • "…a manual for describing, documenting, and cataloging cultural works and their visual surrogates. The primary focus of CCO is art and architecture, including but not limited to paintings, sculpture, prints, manuscripts, photographs, built works, installations, and other visual media. CCO also covers many other types of cultural works, including archaeological sites, artifacts, and functional objects from the realm of material culture."—CCO Commons
    • Some Harvard units take the CCO guidelines into account when describing images of cultural materials in OLIVIA and VIA.
  • Categories for the Description of Works of Art (CDWA)
    • CDWA is a conceptual framework for describing and accessing information about works of art, architecture, other material culture, groups and collections of works, and related images.
    • Some CDWA concepts, as expressed in the XML schema CDWA Lite, are used as extensions in MODS records made available for harvesting from VIA.
  • Content Standard for Digital Geospatial Metadata (aka FGDC, for Federal Geographic Data Committee)
    • "…a common set of terminology and definitions for the documentation of digital geospatial data."
    • FGDC metadata is created for resources in the Harvard Geospatial Library.
    • Other geospatial standards: ISO 19115: Geographic Information – Metadata, and ISO/TS 19139: Geographic Information – Metadata – XML Schema Implementation
  • Describing Archives: A Content Standard (DACS; only available in print)
    • "…an output-neutral set of rules for describing archives, personal papers, and manuscript collections [that] can be applied to all material types. It is the US implementation of international standards (i.e., ISAD(G) and ISAAR(CPF)) for the description of archival materials and their creators." – Society of American Archivists
    • DACS is the current archival descriptive standard used by archives at Harvard.
  • Dublin Core
    • A set of metadata terms defined to support resource description and discovery across domains. A subset of the terms, the Dublin Core Metadata Element Set (NISO Z39.85), consists of the original fifteen elements used without any qualification. This set of terms is also referred to as Simple Dublin Core and is a required format in the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).
    • Dublin Core is typically used in one of three ways:
      • Groups that lack specific standards developed for their needs use DC as is or as a foundation, adding local elements as needed
      • Groups that have well developed standards use DC to create simpler, less expensive descriptions for a subset of materials
      • Services that bring together metadata created according to different standards map those different standards to Dublin Core to create a common set of elements for searching and display.
    • At Harvard, Dublin Core is machine-generated as required to expose Harvard metadata through OAI-PMH.
  • Data Documentation Initiative (DDI)
    • "An international XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences…"
    • DDI is used at Harvard in the Harvard-MIT Data Center's Henry A. Murray Research Archive, connected to the Virtual Data Center/Dataverse Network.
  • Encoded Archival Description (EAD)
    • EAD is a mark-up language for archival finding aids, that is, detailed descriptions of collections that contain a wide variety of materials, including letters, diaries, photographs, drawings, printed material, and objects.
    • EAD is used at Harvard in OASIS, the system that provides centralized access to a growing percentage of finding aids for archival and manuscript collections at Harvard.
  • MARC21
    • MARC21 is the primary library standard for the representation and communication of bibliographic and related information in machine-readable form. MARC21 is an implementation of ANSI/NISO Z39.2, Information Interchange Format. It is used at Harvard to communicate resource descriptions between the HOLLIS system and external systems and services.
    • MARCXML is one standard way of representing MARC21 metadata in XML. The schema is very compact. Rather than defining elements for each MARC field, the schema defines field and subfield types, and the specific tags, indicators and subfields are supplied as uncontrolled attribute values. This design decision makes the MARCXML schema suitable for communication but not for validation of MARC metadata.
    • Harvard converts MARC21 records from Aleph/HOLLIS into MARCXML for a variety of purposes.
  • MODS
    • An XML schema of a simple set of elements for bibliographic description, MODS was designed both to carry selected information transferred from MARC21 records and to support the creation of original resource description records. Metadata from several other domains also fit nicely into MODS, so it can be used as a common mapping format across diverse sets of metadata created according to other standards.
    • Harvard uses MODS in the Virtual Collections system and for communicating VIA metadata to ARTstor.
  • Resource Description and Access (RDA)
    • Intended to succeed AACR2 as "a set of guidelines and instructions on formulating descriptive data and access point control data to support resource discovery," RDA is still under development and is scheduled to be published early in 2009. Drafts of portions of the document are available on the web site.
  • VRA Core 4.0
    • A metadata element set providing a categorical organization for the description of works of visual culture as well as the images that document them.
    • Metadata in Harvard's VIA catalog is conceptually similar to VRA Core.
  • Related descriptive metadata standards used outside the Harvard libraries include
    • Darwin Core
      "…a standard designed to facilitate the exchange of information about the geographic occurrence of species and the existence of specimens in collections." – Darwin Core wiki.
    • Ecological Metadata Language
      A specification and set of XML schemas designed to document ecological datasets.
    • ONIX
      "A standard format that publishers can use to distribute electronic information about their books to wholesale, e-tail and retail booksellers, other publishers, and anyone else involved in the sale of books." – Book Industry Study Group, ONIX page

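To make the crosswalk idea concrete, here is a minimal Python sketch that maps a MARCXML record to a few Simple Dublin Core elements. It relies on the MARCXML design described above, in which tags and subfield codes are attribute values on generic datafield and subfield elements. The sample record, the three-entry CROSSWALK table, the function name marcxml_to_simple_dc and the punctuation trimming are all assumptions made for illustration. This is not Harvard's production mapping, and a real crosswalk covers many more fields.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"  # MARCXML namespace

# Illustrative sample record (an assumption, not taken from HOLLIS).
SAMPLE_MARCXML = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <datafield tag="100" ind1="1" ind2=" ">
    <subfield code="a">Melville, Herman,</subfield>
    <subfield code="d">1819-1891.</subfield>
  </datafield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick, or, The whale /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
  <datafield tag="650" ind1=" " ind2="0">
    <subfield code="a">Whaling</subfield>
    <subfield code="v">Fiction.</subfield>
  </datafield>
</record>"""

# Deliberately tiny, hypothetical crosswalk:
# (MARC tag, subfield code) -> Dublin Core element name
CROSSWALK = {
    ("100", "a"): "creator",
    ("245", "a"): "title",
    ("650", "a"): "subject",
}


def marcxml_to_simple_dc(xml_text):
    """Return a dict of Dublin Core element name -> list of values."""
    root = ET.fromstring(xml_text)
    dc = {}
    for field in root.iter(f"{{{MARC_NS}}}datafield"):
        tag = field.get("tag")
        for sub in field.iter(f"{{{MARC_NS}}}subfield"):
            element = CROSSWALK.get((tag, sub.get("code")))
            if element and sub.text:
                # Crude removal of trailing ISBD punctuation; a real mapping
                # would need to be more careful (e.g. with abbreviations).
                value = sub.text.strip().rstrip(" /,.:;").strip()
                dc.setdefault(element, []).append(value)
    return dc


print(marcxml_to_simple_dc(SAMPLE_MARCXML))
```

Subfields with no crosswalk entry (here 100 $d, 245 $c and 650 $v) are simply dropped. That loss of detail is the usual trade-off when richer records are reduced to a common set of elements for searching and display.
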
Technical Metadata

Technical metadata focuses on how a digital object was created, its format, format-specific technical characteristics, storage and location, etc. Accurate technical metadata helps a repository manage digital objects over time and keep them usable.

Note that Harvard's Digital Repository Service defines baseline technical metadata for all objects and additional technical metadata for specific formats. Those schemas draw on technical metadata standards developed by communities of format experts. In addition to the DRS metadata, some detailed technical metadata for specific formats can be created and packaged with objects in the DRS.

  • AES31-3-1999: AES standard for network and file transfer of audio -- Audio-file transfer and exchange -- Part 3: Simple project interchange
    AES31-3 defines a standard way to document the edit decision list for an audio project. This allows other engineers to take the same audio files and reproduce the project in another environment.
  • AES Core Audio
    Core Audio is a draft standard of the Audio Engineering Society that defines an XML schema for the technical characteristics of an audio object (analog or digital).
  • AES Process History
    AES Process History is a draft standard developed by the Audio Engineering Society to document the equipment, processes and settings used to create or convert audio.
  • MIX
    MIX is an XML schema for recording and exchanging image technical metadata defined by NISO Z39.87, Data Dictionary – Technical Metadata for Digital Still Images (see below).
  • NISO Z39.87, Data Dictionary – Technical Metadata for Digital Still Images
    NISO Z39.87 defines a data dictionary, that is, a set of formal properties with specific semantics, applicable for the detailed technical description of digital raster still images. These properties were selected with particular attention to their significance for preservation assessment and manipulation. LDI staff were involved in writing the original draft version of the standard and in the subsequent development of the standard in its final form, as well as in the follow-on activity to create MIX, an XML schema for expressing Z39.87 metadata in a standard form. The metadata stored in the Digital Repository Service (DRS) for images is consistent with the Z39.87 standard.
  • TextMD
    TextMD is an XML schema for describing the technical characteristics of text, such as encoding, character set, language, script and markup language.
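
As a rough illustration of what baseline technical metadata for any object might contain, the Python sketch below records a file's size, a SHA-256 digest and a format guess based on its leading bytes. The field names, the short signature table and the function name baseline_technical_metadata are assumptions made for this example. They do not reproduce the DRS schemas or any of the standards listed above.

```python
import hashlib

# A few well-known file signatures ("magic numbers").
# Real format-identification tools know far more than this.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"%PDF-", "application/pdf"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
]


def baseline_technical_metadata(data: bytes) -> dict:
    """Return a small, hypothetical technical metadata record for raw bytes."""
    mime = "application/octet-stream"  # fallback when no signature matches
    for magic, candidate in SIGNATURES:
        if data.startswith(magic):
            mime = candidate
            break
    return {
        "size_bytes": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "format": mime,
    }


print(baseline_technical_metadata(b"%PDF-1.4\n"))
```

Format-specific characteristics, such as image dimensions or audio sample rate, would come from format-aware tools and are out of scope for this sketch.
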

Structural Metadata

  • METS
    A standard for encoding descriptive, administrative, and structural metadata about an object within a digital library. METS is variously used as a digital archiving package for preservation, as a representation of the structure of an object (enabling rendering and navigation of complex digital objects), and as a transmission package for moving a digital object between repository systems.
    Standards that serve one or more of these purposes for related communities include IMS Content Packaging, MPEG21-DIDL, and XFDU.
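
To show how METS can represent the structure of an object, the Python sketch below builds a skeletal METS document for a paged object: a file section listing the page images and a structural map that puts them in reading order. The element and attribute names (fileSec, fileGrp, file, FLocat, structMap, div, fptr) come from the METS schema. The TYPE and USE values, the file names, the ID pattern and the function name build_mets are assumptions for illustration, and the output is not validated against the schema.

```python
import xml.etree.ElementTree as ET

METS = "http://www.loc.gov/METS/"
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS)
ET.register_namespace("xlink", XLINK)


def build_mets(pages):
    """pages: list of (label, file_href) tuples in reading order."""
    mets = ET.Element(f"{{{METS}}}mets")
    file_sec = ET.SubElement(mets, f"{{{METS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS}}}fileGrp", USE="image")
    struct_map = ET.SubElement(mets, f"{{{METS}}}structMap", TYPE="physical")
    book = ET.SubElement(struct_map, f"{{{METS}}}div", TYPE="book")
    for order, (label, href) in enumerate(pages, start=1):
        file_id = f"FILE{order:04d}"  # hypothetical ID pattern
        # Inventory entry: where the file lives.
        file_el = ET.SubElement(file_grp, f"{{{METS}}}file", ID=file_id)
        ET.SubElement(file_el, f"{{{METS}}}FLocat",
                      {"LOCTYPE": "URL", f"{{{XLINK}}}href": href})
        # Structural entry: where the file sits in the object.
        page = ET.SubElement(book, f"{{{METS}}}div",
                             TYPE="page", ORDER=str(order), LABEL=label)
        ET.SubElement(page, f"{{{METS}}}fptr", FILEID=file_id)
    return mets


doc = build_mets([("Title page", "page1.tif"), ("Page 1", "page2.tif")])
print(ET.tostring(doc, encoding="unicode"))
```

Each div in the structural map points to a file only by ID. That separation of the file inventory from the structure is what lets a viewer render and navigate a complex object.
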

Preservation Metadata

  • Open Archival Information System (OAIS)
    OAIS provides a reference model for an archival system designed to maintain access to digital resources and preserve them over time.
    Harvard's Digital Repository Service was developed with OAIS in mind.
  • PREMIS: Preservation Metadata Implementation Strategies
    Metadata in a variety of forms (technical, structural, rights, and provenance) lies at the heart of most preservation activities. While a number of detailed metadata standards exist in many specific areas, the purpose of the joint OCLC/RLG PREMIS activity was to develop an overall framework and core element set for preservation metadata. The PREMIS data model defines a number of properties of preservation significance for digital objects, events, agents, rights and permissions, and the relationships between these entities.
    LDI staff participated throughout the development of the PREMIS standard. The next generation of the Digital Repository Service is being developed to be PREMIS-compliant.
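
The entity-and-relationship shape of the PREMIS data model can be sketched in a few lines of Python. The class and field names below are hypothetical simplifications chosen for readability. They are not the PREMIS semantic units, the rights entity is omitted, and nothing here reflects how the DRS stores its data.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Agent:
    agent_id: str
    name: str


@dataclass
class Event:
    event_id: str
    event_type: str   # e.g. "ingestion" or "fixity check"
    date_time: str    # ISO 8601 timestamp
    object_id: str    # the object the event acted on
    agent_ids: List[str] = field(default_factory=list)


@dataclass
class DigitalObject:
    object_id: str
    format_name: str
    event_ids: List[str] = field(default_factory=list)


def record_event(obj: DigitalObject, event: Event) -> None:
    """Link an event to the object it documents."""
    if event.object_id != obj.object_id:
        raise ValueError("event refers to a different object")
    obj.event_ids.append(event.event_id)


agent = Agent("agent-1", "Repository ingest service")
obj = DigitalObject("obj-1", "image/tiff")
ingest = Event("ev-1", "ingestion", "2008-06-01T12:00:00Z",
               obj.object_id, [agent.agent_id])
record_event(obj, ingest)
print(obj)
```

Entities refer to one another by identifier rather than by nesting, so the history of an object can grow as new events are recorded without rewriting the object's own description.
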

Rights Metadata

At this time, Harvard libraries express rights only in simple, single metadata elements. However, there are several rights expression languages (RELs) and other rights standards that will be valuable resources for Harvard in the future.

  • copyrightMD
    copyrightMD is an XML schema for recording characteristics that, taken together, help determine the copyright status of a resource.
  • Creative Commons
    Creative Commons provides a range of standardized digital licenses that can be associated with or embedded in open access web resources.
  • METSRights
    METSRights is an XML schema for documenting minimal administrative metadata about the intellectual rights associated with a digital object or its parts. METSRights is most often used to record statements to be viewed by professionals managing the content or to be displayed to end users viewing the content. It is not designed to be machine-actionable.
  • ONIX for Publications Licenses (ONIX-PL)
    "ONIX-PL is an XML format for the communication of license terms in a structured and substantially encoded form."
  • Open Digital Rights Language (ODRL)
    ODRL is an open standard defining a model and vocabulary for the expression of terms and conditions over assets.
  • XrML
    XrML is a proprietary method for securely specifying and managing rights and conditions associated with all kinds of resources, including digital content and services. It underlies commercial Digital Rights Management applications. Agreements with MPEG and other initiatives enable them to use XrML as a basis for more specific rights language specifications, such as MPEG21-Part 5: Rights Expression Language.