Harvard University Library - Library Digital Initiative
   

Metadata

  Robin Wendler

 

 

What is Metadata?
Metadata is broadly defined as "data about data."  In particular communities and contexts, however, the word is used with much narrower definitions.  For example, it may be used to mean only cataloging, or only data about digital resources, or only information structured to be understood by computers.  However, none of these limits is intrinsic to metadata.  It is important to set the definition of the term for any particular discussion to avoid misunderstandings. 


Your Meta is My Data*
Note that all metadata is, in fact, data.  Whether or not particular information functions as data or as metadata is a matter of context or perspective, and what is metadata to one person or application can be data to another.


Metadata in the Library Digital Initiative
In the context of the Library Digital Initiative, metadata is information that makes it possible to find, access, use, and manage information resources.  Note that this definition is not restricted to electronic resources.  Harvard units need to consider the metadata requirements for all kinds of materials in a coordinated way in order to manage and access them as effectively as possible.

Some metadata is public information, such as cataloging, used for searching and identification of resources, while other metadata is used behind the scenes to maintain and administer a resource over time, to control access to it, and to build the interface which will make it usable.  Libraries have created metadata for many years by cataloging according to MARC and AACR2, by creating finding aids to archival resources, and by compiling bibliographies.  Many of the concepts familiar to us from these activities carry over into other kinds of metadata needed in LDI projects.


Metadata Standards in General
There are many different standards governing metadata:

  • standards specific to topics or disciplines (such as biology or art)
  • standards specific to kinds of materials (such as moving pictures or encoded texts)
  • standards to support particular functions (such as discovery or rights management or presentation)

In any of these areas, metadata standards may govern

  • what pieces of information are created (semantics)
  • how the information is formed (content standards)
  • how the information is encoded for computer processing (syntax)
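
The three layers can be sketched in a few lines of Python. This is only an illustration: the element names, the use of ISO 8601 as the content rule for dates, the two output syntaxes (JSON and XML), and the sample values are all assumptions made for the example, not requirements of any standard discussed here.

```python
import json
import xml.etree.ElementTree as ET
from datetime import date

# Semantics: which pieces of information are recorded (hypothetical element set).
ELEMENTS = ("title", "creator", "date")

def make_record(title: str, creator: str, created: date) -> dict:
    """Build a record, applying a simple content rule to the date value."""
    return {
        "title": title.strip(),
        "creator": creator.strip(),
        # Content standard (assumed): dates are written as ISO 8601, YYYY-MM-DD.
        "date": created.isoformat(),
    }

# Syntax: the same record encoded two different ways for computer processing.
def to_json(record: dict) -> str:
    return json.dumps(record, sort_keys=True)

def to_xml(record: dict) -> str:
    root = ET.Element("record")
    for name in ELEMENTS:
        ET.SubElement(root, name).text = record[name]
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample values.
record = make_record("An Example Title", "Doe, Jane", date(2000, 1, 31))
print(to_json(record))
print(to_xml(record))
```

The record's meaning is identical in both outputs; only the encoding differs, which is why a standard can fix one layer while leaving the others open.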

Metadata design is a critical part of the planning for any digital project.  Without the right kind of metadata, it will not be possible to find or use digital materials effectively.  The LDI Metadata Advisor can help you determine what kinds of metadata you need to get the results you want.


Specific Metadata Standards
The International Federation of Library Associations and Institutions (IFLA) maintains an excellent, extensive page of links to metadata documentation.

However, to get you started, here is an introduction to some of the most prominent metadata standards initiatives:

  Categories for the Description of Works of Art (CDWA)
CDWA is a data dictionary which "articulates an intellectual structure for the content of object and image descriptions. ... The Categories are intended to enhance compatibility between diverse systems and enable the sharing of art information. By providing guidelines for content, independent from software and hardware, the Categories can serve as a model to which existing art information systems can be mapped and as a basis on which new systems can be developed. Such guidelines can contribute to the integrity and longevity of information transmitted across networks and eventually moved to new systems."
 
  Dublin Core (DC)
The Dublin Core is a simple set of metadata elements intended to facilitate discovery of electronic resources.  It can also be used for non-electronic documents and physical objects. DC contains 15 elements that a diverse international and interdisciplinary community (including librarians) agreed were useful for information retrieval. All the elements are optional and all are repeatable.  The DC data elements overlap with MARC but are generally less specific than their MARC counterparts, and there are few rules governing how the content of the elements is formed.  DC is meant to be a commonly understood set of elements, but it is not intended to cover all the metadata needed by any given community; each community is expected to use other elements outside of Dublin Core to meet its local needs. DC is syntax-independent, which means that the elements can be expressed in different forms.  For example, DC elements can be created in a MARC record or in HTML META tags. However, consensus is growing that the Resource Description Framework (see below) will be the most common syntax for Dublin Core metadata.
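
As an illustration of the HTML META tag form, here is a minimal Python sketch that writes Dublin Core elements as META tags using the common "DC." name prefix. The function name dc_meta_tags and the sample record are hypothetical; the sketch relies only on the properties described above, namely that every element is optional and repeatable.

```python
from html import escape

def dc_meta_tags(record: dict[str, list[str]]) -> str:
    """Render Dublin Core elements as HTML META tags, one tag per value."""
    lines = []
    for element, values in record.items():
        # Elements are repeatable, so each value gets its own tag.
        for value in values:
            content = escape(value, quote=True)
            lines.append(f'<meta name="DC.{element}" content="{content}">')
    return "\n".join(lines)

# Hypothetical record: elements are optional, so only three are supplied.
sample = {
    "title": ["A Sample Finding Aid"],
    "creator": ["Doe, Jane", "Roe, Richard"],
    "subject": ['Archives & "manuscripts"'],
}
tags = dc_meta_tags(sample)
print(tags)
```

The values are HTML-escaped so that characters such as ampersands and quotation marks cannot break the attribute.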
 
  Content Standard for Digital Geospatial Metadata (CSDGM, aka FGDC)
FGDC provides "a common set of terminology and definitions for the documentation of digital geospatial data. The standard establishes the names of data elements and compound elements (groups of data elements) to be used for these purposes, the definitions of these compound elements and data elements, and information about the values that are to be provided for the data elements."
 
  Encoded Archival Description (EAD)
The EAD is an encoding standard for archival finding aids using the Standard Generalized Markup Language (SGML). The EAD Document Type Definition (EAD DTD) provides a flexible way for archives and libraries to convert finding aids that exist in paper form into electronic documents or to create new finding aids in electronic form.  EAD-encoded finding aids form the basis of Harvard's OASIS system, which provides remote access and the ability to search across collections in different archival repositories at Harvard and Radcliffe.
 
  Instructional Management Systems Metadata (IMS)
An initiative of EDUCAUSE, IMS has defined a metadata structure for managing online learning resources, including content, tools, people, educational service companies, and activities.  While many universities and companies are members of IMS, the metadata standard is complex and has not yet been generally implemented.
 
  Resource Description Framework (RDF)
RDF is a common syntax for expressing many different kinds of metadata.  By providing a standard way of referring to metadata element sets, specific metadata element names, and actual metadata content, RDF should facilitate data and system interoperability. 
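
To show how an element set, an element name, and a value fit together, here is a small Python sketch that writes Dublin Core statements about one resource as RDF triples in the line-based N-Triples serialization. N-Triples is chosen here only for brevity; RDF can be serialized in other ways, such as RDF/XML. The resource URI, the sample values, and the helper names are hypothetical, and the literal escaping is deliberately minimal.

```python
# Namespace URI of the Dublin Core element set.
DC = "http://purl.org/dc/elements/1.1/"

def literal(value: str) -> str:
    """Quote a plain string literal, escaping backslashes, quotes, and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def to_ntriples(subject: str, properties: dict[str, str]) -> str:
    """Emit one 'subject predicate object .' statement per property."""
    return "\n".join(
        f"<{subject}> <{DC}{name}> {literal(value)} ."
        for name, value in properties.items()
    )

# Hypothetical resource and values.
statements = to_ntriples(
    "http://example.org/item/1",
    {"title": "An Example Resource", "creator": "Doe, Jane"},
)
print(statements)
```

Each predicate is a full URI that combines the element set with the element name, which is what lets different systems recognize that they mean the same element.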
 
 

  Text Encoding Initiative (TEI)
The Text Encoding Initiative is an international project to develop guidelines for the preparation and interchange of electronic texts for scholarly research, with three overall objectives:

  • to specify a common interchange format for machine-readable texts
  • to provide a set of recommendations for encoding new textual materials. The recommendations would specify both what features are to be encoded and how those features are to be represented.
  • to document the major existing encoding schemes, and develop a metalanguage in which to describe them.
 
  Visual Resources Association Core Categories
The VRA Core Categories is intended as a guideline for developing local databases and cataloging records for visual resources.  The categories contain two groups of elements: one group which describes the work (e.g. The Mona Lisa) and another group which describes a surrogate or visual document (e.g. a close-up slide of Mona Lisa's smile).  At this point, the Core Categories are a set of metadata elements, but they are not paired with any standards to determine the form of the metadata content, nor are they tied to any particular syntax.  A testbed project using the Core Categories was recently completed, and the VRA Data Standards Committee will determine what next steps to take based on the evaluations of that project.
 

*courtesy of Judy Ahronheim, University of Michigan
