Systems & Planning: Office for Information Systems

Highlights from FY 2006: Enhancements to Library and User Services


E-Research @ Harvard Libraries

E-Research @ Harvard Libraries was launched at the beginning of FY 2006, expanding the e-resource functionality of the “Harvard Libraries” web site, which serves as the University’s primary research “portal.” The interface evolved during FY 2006 based on feedback from staff, students, faculty, and researchers. The most significant change was the development of an “accessible” version of the E-Research interface that works with screen-reader software for users with visual impairments. This screen-reader version closely replicates the functionality of E-Research and is easy to find from the E-Research homepage. Users can also set the screen-reader interface as a display preference, so that after an initial setup they are always sent to it when they log in.

Beyond interface improvements, the libraries are working to offer new tools and services that help students, faculty, and researchers use the often confusing array of online resources more effectively. Alongside many locally developed research guides, the libraries licensed a new tool called RefWorks. RefWorks allows users to import citations from many e-resource databases and to maintain bibliographies and folders of citations in individual personal accounts. RefWorks links to content when it is available and can be used with Microsoft Word to create bibliographies in a number of citation styles.
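To illustrate what "importing citations" involves, the sketch below serializes a journal-article citation to RIS, a tagged plain-text format that RefWorks and most other reference managers can import. It is an assumption-laden illustration only: the source does not say which formats the licensed databases export, and the dictionary keys and the `to_ris` name are hypothetical.

```python
def to_ris(citation: dict) -> str:
    """Serialize one journal-article citation as an RIS record.

    The dictionary keys used here (authors, title, journal, year, volume,
    start_page, end_page) are hypothetical, chosen for illustration.
    """
    lines = ["TY  - JOUR"]  # TY (reference type) must be the first tag
    for author in citation.get("authors", []):
        lines.append(f"AU  - {author}")  # one AU line per author
    tags = [("TI", "title"), ("JO", "journal"), ("PY", "year"),
            ("VL", "volume"), ("SP", "start_page"), ("EP", "end_page")]
    for tag, key in tags:
        value = citation.get(key)
        if value not in (None, ""):
            lines.append(f"{tag}  - {value}")
    lines.append("ER  - ")  # ER closes the record
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_ris({"authors": ["Doe, Jane"], "title": "An Example Article",
                  "journal": "Journal of Examples", "year": 2006}))
```

Each tag is followed by two spaces, a hyphen, and a space; a file may contain many such records back to back.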

HOLLIS (Harvard Online Library Information System)

Harvard’s HOLLIS Catalog, which reached the 10 million record mark in FY 2006, is the primary discovery tool for the books, journals, electronic resources, manuscripts, government documents, maps, microforms, music scores, sound recordings, visual materials, and data files owned by the University and its libraries. The union catalog is updated continually as material is ordered, received, and cataloged.

Improvements were made to HOLLIS to help library patrons search for and discover materials. When users log into their accounts, they can now import saved title, author, or subject searches directly into the EndNote application, which makes it easier to produce and manage bibliographies from them. When viewing lists of search results, users can now see online availability without opening full records, and they have several new options for sorting result sets. A pilot project in the Harry Elkins Widener Memorial Library allowed users to request quick processing for materials that had been ordered but were not yet on the shelf. Because of the enthusiastic response from library patrons, this feature was extended to all interested libraries at Harvard.

In another major improvement, OIS implemented a new cataloging service to provide expanded access to over 16,000 electronic journals through the HOLLIS catalog. Known as MARCit!, the service pulls records from the Find It! database and adds, maintains, and deletes electronic journal records through regular automated batch loads into both HOLLIS and E-Research. The project closed a long-standing gap: for years, HOLLIS offered no title-level access to the journals and holdings in aggregated collections such as Academic Premier and Lexis/Nexis, and these aggregated databases include many titles that Harvard does not own in print.

Maintaining electronic journals presents a challenge because the titles in collections are volatile—holdings and coverage change frequently. Regular, automated processes add new titles and maintain changes to electronic journal subscriptions, coverage, and access.
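One way to picture these automated loads is as a diff between the latest knowledge-base snapshot and the records already in the catalog. The sketch below assumes each e-journal record is keyed by ISSN and compared field by field; the record shape and the `plan_batch_load` name are hypothetical, not a description of how MARCit! is built, and a real load would also need a fallback key for titles without an ISSN.

```python
from typing import Dict, List, Tuple

# Hypothetical record shape: fields such as title and coverage, keyed by ISSN.
Record = Dict[str, str]


def plan_batch_load(current: Dict[str, Record],
                    incoming: Dict[str, Record]) -> Tuple[List[str], List[str], List[str]]:
    """Diff the records already loaded against a fresh knowledge-base snapshot.

    Returns (adds, updates, deletes) as sorted lists of keys:
      adds    - titles newly present in the snapshot
      updates - titles whose coverage or other fields changed
      deletes - titles no longer in the snapshot (e.g. a dropped subscription)
    """
    adds = sorted(k for k in incoming if k not in current)
    updates = sorted(k for k in incoming if k in current and incoming[k] != current[k])
    deletes = sorted(k for k in current if k not in incoming)
    return adds, updates, deletes
```

Computing the three lists up front makes each batch load repeatable: running it twice against the same snapshot yields no further changes.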


Aleph was upgraded in August 2006 to the latest version of the software (17.01). The annual upgrade process has become more streamlined as the vendor continues to refine it. In FY 2006, several improvements were made to speed the flow of records into the catalog, including the introduction of the Z39.50 protocol, which allows staff to search remote computer databases. Z39.50 makes it easier for library staff to find records for new materials in our collections, greatly improving the turnaround time for entering new records into HOLLIS. Using Z39.50, librarians can search large bibliographic databases, such as those of several national libraries (including the Library of Congress and the national libraries of Estonia, New Zealand, and Norway), as well as large union catalogs in Germany and Israel.
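Z39.50 searches are commonly written in the Prefix Query Format (PQF) used by Index Data's YAZ toolkit, in which Bib-1 "use" attributes select the index to search (for example, 1=4 for title, 1=1003 for author, 1=7 for ISBN). The sketch below only builds such a query string; the `pqf_query` name is hypothetical, and actually sending the query to the Library of Congress or another target would require a Z39.50 client library, which is not shown.

```python
# Bib-1 "use" attribute values for a few common access points.
BIB1_USE = {"title": 4, "author": 1003, "subject": 21, "isbn": 7, "issn": 8}


def pqf_query(**terms: str) -> str:
    """Build a PQF query that ANDs together one term per access point.

    Hypothetical helper for illustration; keyword order is preserved.
    """
    if not terms:
        raise ValueError("at least one search term is required")
    clauses = []
    for field, value in terms.items():
        # Drop embedded double quotes rather than rely on escaping rules.
        clean = value.replace('"', "")
        clauses.append(f'@attr 1={BIB1_USE[field]} "{clean}"')
    # @and is a binary prefix operator, so n clauses need n - 1 leading @and tokens.
    return "@and " * (len(clauses) - 1) + " ".join(clauses)


if __name__ == "__main__":
    print(pqf_query(title="moby dick", author="melville"))
```

Because PQF is prefix notation, `@and @and A B C` reads as "(A and B) and C", which is why the operators are simply stacked at the front.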

Virtual Collections Service

Spring 2006 saw the debut of Virtual Collections—a new OIS service that can harvest descriptions and links from Harvard union catalogs and provide a customized web-based catalog of these materials for the user. Virtual Collections (VC) allows a curator to highlight important collections drawn from the millions of resources available in the HOLLIS Catalog, the VIA image catalog, and the Harvard Geospatial Library.

To harvest catalog records, VC uses a subset of the rules established in the OAI–PMH (Open Archives Initiative–Protocol for Metadata Harvesting). Records are transformed from their native formats (MARC, VIA, or FGDC) into MODS (Metadata Object Description Schema), a common metadata format shared by all collections, and then loaded into a VC database. Once a virtual collection is defined, the curator can use an administrative interface to maintain it, including an option to add collection-specific subject vocabularies to records.
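The two steps described above, harvesting and conversion to MODS, can be sketched with the Python standard library. This is an illustration under stated assumptions, not the VC code: the `fetch` callable, the `marcxml` metadata prefix, and all function names are hypothetical, and the crosswalk covers only two fields from the Library of Congress MARC-to-MODS mapping (245 to titleInfo/title, 100 to a personal name). The harvester follows OAI-PMH `resumptionToken`s, which a repository returns when a ListRecords response is split across several pages.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
MARC = "{http://www.loc.gov/MARC21/slim}"
MODS = "{http://www.loc.gov/mods/v3}"


def harvest(base_url, fetch, prefix="marcxml"):
    """Yield OAI-PMH <record> elements from ListRecords, following resumptionTokens.

    `fetch` (hypothetical) is any callable taking a URL and returning XML text or bytes.
    """
    params = {"verb": "ListRecords", "metadataPrefix": prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.find(f".//{OAI}resumptionToken")
        if token is None or not (token.text or "").strip():
            return  # no token, or an empty one: the list is complete
        # resumptionToken is an exclusive argument: send only the verb and token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}


def subfields(record, tag, code):
    """Values of subfield $code in every MARC datafield with the given tag."""
    return [sf.text or ""
            for df in record.iter(f"{MARC}datafield") if df.get("tag") == tag
            for sf in df.iter(f"{MARC}subfield") if sf.get("code") == code]


def marc_to_mods(record):
    """Map a couple of MARCXML fields to a MODS <mods> element (tiny subset)."""
    mods = ET.Element(f"{MODS}mods")
    for title in subfields(record, "245", "a"):
        title_info = ET.SubElement(mods, f"{MODS}titleInfo")
        # Strip trailing ISBD punctuation such as " /" or " :".
        ET.SubElement(title_info, f"{MODS}title").text = title.rstrip(" /:;,.")
    for author in subfields(record, "100", "a"):
        name = ET.SubElement(mods, f"{MODS}name", type="personal")
        ET.SubElement(name, f"{MODS}namePart").text = author.rstrip(" ,.")
    return mods


def add_local_subject(mods, term, authority):
    """Attach a curator-supplied term from a collection-specific vocabulary."""
    subject = ET.SubElement(mods, f"{MODS}subject", authority=authority)
    ET.SubElement(subject, f"{MODS}topic").text = term
```

Passing `fetch` in as a parameter keeps the sketch testable without a network connection; in practice it could wrap `urllib.request.urlopen`.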

Virtual collections can be implemented as “stand-alone” catalogs that are hosted by OIS, or as “integrated” catalogs that are hosted on web sites under the curator’s control. Stand-alone collections offer a standard user interface with a modest set of options for customizing look and feel. Integrated collections are highly customizable, with a look and feel that is completely under the curator’s control.

The first use of VC was a collection of Latin American pamphlets from the Harvard College Library. The Open Collections Program will be using VC to support its collections, including Women Working, 1800–1930; Immigration to the United States, 1789–1930; and future collections as they are implemented.

Future plans for Virtual Collections include support for record harvesting from the OASIS catalog of finding aids and TED custom catalogs. There are also plans to allow harvesting of virtual collections data by outside institutions.

Page Delivery Service Enhancements

The Page Delivery Service (PDS)—a delivery service for digitized documents—underwent a few enhancement cycles this fiscal year that added new features for end users and for document maintainers. 

Since FY 2004, Harvard libraries have increasingly turned to the JPEG 2000 standard when digitizing their materials. In spring 2006, the PDS user interface was enhanced with new options to zoom, pan, and change the display size of page images derived from JPEG 2000 master images. Additional changes to PDS improved navigation and simplified printing of PDS documents.
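Zooming suits JPEG 2000 because its wavelet decomposition stores a series of resolution levels, each about half the width and height of the one above it, so a viewer can decode only the level and region it needs. The sketch below is an assumption about how a viewer might choose what to request, not a description of PDS internals: it picks the coarsest level that still fills the display and scales a panned viewport back to full-resolution coordinates. The function names and example dimensions are hypothetical.

```python
import math


def level_size(width: int, height: int, r: int) -> tuple:
    """Image size after discarding r resolution levels (image origin at 0,0)."""
    return math.ceil(width / 2 ** r), math.ceil(height / 2 ** r)


def choose_reduction(width, height, display_w, display_h, max_levels):
    """Largest reduction r whose image still covers the display in both dimensions.

    max_levels is the number of decomposition levels in the codestream.
    """
    r = 0
    while r < max_levels:
        w, h = level_size(width, height, r + 1)
        if w < display_w or h < display_h:
            break  # the next level down would be too small to fill the display
        r += 1
    return r


def region_at_full_res(x, y, w, h, r):
    """Scale a viewport rectangle given in level-r pixels up to full resolution."""
    f = 2 ** r
    return x * f, y * f, w * f, h * f


if __name__ == "__main__":
    r = choose_reduction(6000, 4000, 800, 600, max_levels=5)
    print(r, level_size(6000, 4000, r))
```

For a 6000 x 4000 master shown in an 800 x 600 window, level 2 (1500 x 1000) is the smallest that still fills the window, so the viewer decodes roughly one sixteenth of the full-resolution pixels.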

Late in 2005, OIS released a completely redesigned PDS maintenance system, used by curators to edit PDS documents. The primary goals of the redesign were to make it easier for curators to edit a document’s pages and navigation structure and to see the effect of their editing in real time. A new structure editor was added, with many new editing options and a graphical tree display that exposes all parts of a document’s structure. With this release, curators are now able to merge individual documents together and assign persistent identifiers to any part of a document.
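A document's navigation structure is naturally a tree (document, sections, pages), and the editing features described above correspond to simple tree operations. The sketch below is hypothetical throughout: the `Node` class, the `urn:example:pds:` identifier prefix, and `merge` are illustrations, not the PDS data model. It assigns a persistent identifier to any node on request and merges documents under a new root while each part keeps its identifier.

```python
import itertools
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

_next_id = itertools.count(1)  # hypothetical ID source; a real service would mint durable IDs


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    pid: Optional[str] = None  # persistent identifier, assigned on request

    def assign_pid(self, prefix: str = "urn:example:pds:") -> str:
        """Give this node a persistent ID once; later calls return the same ID."""
        if self.pid is None:
            self.pid = f"{prefix}{next(_next_id)}"
        return self.pid

    def walk(self, depth: int = 0) -> Iterator[Tuple[int, "Node"]]:
        """Yield (depth, node) pairs in document order, as a tree display would."""
        yield depth, self
        for child in self.children:
            yield from child.walk(depth + 1)


def merge(label: str, *docs: Node) -> Node:
    """Combine documents under a new root; existing PIDs travel with their nodes."""
    return Node(label, children=list(docs))


if __name__ == "__main__":
    combined = merge("Set", Node("Vol. 1", [Node("p. 1")]), Node("Vol. 2", [Node("p. 1")]))
    for depth, node in combined.walk():
        print("  " * depth + node.label)
```

Minting an identifier only when it is first requested, and never changing it afterward, is what lets a merged or restructured document keep citations to its parts valid.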

OASIS Enhancements

OASIS (Online Archival Search Information System) is our online union catalog of finding aids to Harvard archival and manuscript collections. In June 2006, OIS released an updated version of OASIS, with some new functionality and changes to the look and feel of finding aids implemented in response to user feedback over the previous year.

  • OASIS is now open to discovery by search engines, such as Google, providing much broader access to information about collections at Harvard.
  • The persistent ID (URN) for each finding aid now appears in the full finding aid display. This link provides a durable way to reference a finding aid from another catalog or web page.
  • “Easy Print” (PDF) versions of finding aids now load faster.
  • User requests for help are now collected by an improved “Questions or Comments” form and processed through the OIS problem-tracking database, FootPrints.

There were additional behind-the-scenes modifications, which are invisible to users but essential for those responsible for creating finding aids. These included migration of the OASIS database to the most recent version of the standard finding aid document type definition, EAD 2002.

Digital Preservation

Because digital assets depend on technological mediation for their use, they are inherently fragile and at risk of irretrievable loss in the face of constant technological change. OIS is engaged in a number of activities, and operates a number of systems, designed to mitigate this risk and to ensure that the University’s digital assets managed by OIS remain usable over time.

Through institutional membership in the Digital Library Federation (DLF) and individual participation in its semiannual forums, OIS staff routinely engage with leading preservation practitioners at peer institutions. OIS also consults frequently with leading preservation programs both nationally, such as the Library of Congress and its NDIIPP (National Digital Information Infrastructure and Preservation Program) initiative and the National Archives and its ERA (Electronic Records Archive) project, and internationally, such as the British Library, UK National Archives, and National Library of Australia. Additional consultation occurs with representatives of leading international initiatives in the preservation area, such as the European PLANETS (Preservation and Long-Term Access Through Networked Services), CASPAR (Cultural, Artistic, and Scientific Knowledge Preservation, Access, and Retrieval), and DPE (Digital Preservation Europe) projects.

Digital Repository Service (DRS)

The Digital Repository Service (DRS) is a preservation and access repository that forms the core of the OIS digital infrastructure. During FY 2006, the number of assets under management in the DRS grew by roughly 64 percent, to 4.6 million (16.6 TB), up from 2.8 million (10.3 TB) the previous year. OIS has continued two internal review processes related to the DRS. A review of existing DRS policies is leading to recommendations for change with regard to specific preservation policies and practices. This policy review will bring DRS operational policies into line with evolving best practices in the digital repository and preservation communities regarding formats; metadata; service-level agreements; preservation monitoring, risk assessment, planning, and intervention; and other aspects of prudent digital stewardship of valuable University assets.

A second review is providing a comprehensive evaluation of DRS architecture and functionality leading to recommendations for the future evolution of the DRS. Based on functional requirements developed during FY 2005, a number of externally developed repository systems, both commercial and open source, were evaluated for potential use as the basis for DRS in the future. Only the Fedora open source repository had sufficiently rich functionality to be considered as a replacement, and even it was felt to be lacking in certain key areas. Additional investigation of the costs of integrating Fedora into the OIS infrastructure and adding the missing functionality versus continued incremental enhancement of the existing DRS will be carried out in FY 2007.

Global Digital Format Registry (GDFR)

An understanding of format is critical to preserving the usability of digital assets over time. Without knowledge of its format, an asset is merely a set of undifferentiated bits; format is the key that permits the proper interpretation and rendering of those bits. Because format plays this central role in preservation, OIS has started a two-year project to create a Global Digital Format Registry (GDFR) that will provide sustainable services for the management, discovery, and delivery of significant information about digital formats. The GDFR is designed as a distributed network of independent but cooperating registries that will communicate over a common network protocol to synchronize their holdings. The information managed in this network will be used by local, national, and international preservation practitioners now and in the future. The design of the GDFR system is being overseen by a group of distinguished preservation experts from a number of national and international institutions.
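The idea of independent registries that synchronize their holdings can be illustrated with a simple replication rule: each registry keeps versioned format records and adopts any newer version it receives from a peer. The sketch below assumes exactly that rule (a per-record version number, ties broken by the name of the registry that made the edit); the record fields and merge rule are hypothetical and are not the GDFR protocol, whose design is still being developed.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class FormatRecord:
    format_id: str  # hypothetical identifier, e.g. "example-tiff"
    name: str
    version: int    # incremented by whichever registry edits the record
    source: str     # registry that produced this version


def newer(a: FormatRecord, b: FormatRecord) -> FormatRecord:
    """Pick the winning version; ties break deterministically on source name.

    Assumes (version, source) uniquely identifies a record's content.
    """
    return max(a, b, key=lambda rec: (rec.version, rec.source))


def synchronize(local: Dict[str, FormatRecord],
                remote: Dict[str, FormatRecord]) -> Dict[str, FormatRecord]:
    """Merge a peer's holdings into ours; the result is the same in either direction."""
    merged = dict(local)
    for fid, rec in remote.items():
        merged[fid] = newer(merged[fid], rec) if fid in merged else rec
    return merged
```

Because the merge keeps the maximum under a total order, two registries that exchange holdings end up with identical contents regardless of which one initiates the exchange.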

While OIS maintains administrative control of the project, technical development of the GDFR system has been subcontracted to the Online Computer Library Center (OCLC). This collaboration brings together the expertise of OIS in digital preservation with that of OCLC in providing products and services that help libraries adapt to a rapidly changing technology environment.


JHOVE, the JSTOR–Harvard Object Validation Environment (pronounced “jove”), is an OIS-designed and -developed software tool for format-specific identification, validation, and characterization of digital objects. The ability to identify, validate, and characterize digital objects properly is a fundamental requirement for effective long-term preservation. Having gained widespread international acceptance, JHOVE is in use by most major library and archival institutions with significant digital library and preservation programs. In FY 2006, OIS integrated JHOVE into the DRS workflows for deposit, so that invalid objects can be rejected and corrected prior to being accepted for long-term preservation. OIS is consulting with the JHOVE user community about a range of functional enhancements to include in a future version.
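The identify, validate, and characterize sequence, and its use as a gate on deposit, can be sketched for a single format. The example below handles only the PNG header and is written in Python for illustration; it is not JHOVE's Java API or module logic, and the `check_png` and `accept_for_deposit` names are hypothetical. Identification matches the 8-byte PNG signature, validation checks that the first chunk is a 13-byte IHDR with a correct CRC, and characterization reports width, height, bit depth, and color type.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_png(data: bytes) -> dict:
    """Identify, validate, and characterize a PNG byte string (header only)."""
    report = {"format": None, "valid": False, "properties": {}, "errors": []}
    # 1. Identification: does the byte stream start with the PNG signature?
    if not data.startswith(PNG_SIGNATURE):
        report["errors"].append("not identified as PNG")
        return report
    report["format"] = "PNG"
    # 2. Validation: first chunk = 4-byte length, 4-byte type, 13-byte body, 4-byte CRC.
    header = data[8:8 + 25]
    if len(header) < 25:
        report["errors"].append("truncated IHDR chunk")
        return report
    length, ctype = struct.unpack(">I4s", header[:8])
    body, crc = header[8:21], struct.unpack(">I", header[21:25])[0]
    if length != 13 or ctype != b"IHDR":
        report["errors"].append("first chunk is not IHDR")
    elif zlib.crc32(ctype + body) != crc:  # PNG's CRC covers chunk type + data
        report["errors"].append("IHDR CRC mismatch")
    else:
        # 3. Characterization: pull technical properties out of IHDR.
        width, height, bit_depth, color_type = struct.unpack(">IIBB", body[:10])
        report["properties"] = {"width": width, "height": height,
                                "bit_depth": bit_depth, "color_type": color_type}
        report["valid"] = True
    return report


def accept_for_deposit(data: bytes) -> bool:
    """Deposit gate: only objects that identify and validate are accepted."""
    return check_png(data)["valid"]
```

Returning a structured report rather than a bare yes or no lets a depositor see why an object was rejected and correct it before resubmitting.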


Adobe’s Portable Document Format (PDF) has become the de facto standard for web-based delivery of electronic documents. To address concerns about the long-term preservability of PDF documents, the International Organization for Standardization (ISO) convened a working group to develop an archival profile of PDF, known as PDF/A. Following the work of this group, led by OIS digital library program manager Stephen Abrams, PDF/A was approved as International Standard ISO 19005-1:2005. The standard defines specific features of PDF that are required, recommended, restricted, or prohibited, with the aim of making the resulting files more amenable to long-term preservation. Based on experience gained in working with the standard, a small number of changes are being put to ballot for a technical corrigendum. The initial form of PDF/A is based on PDF version 1.4. Work is also proceeding on a newer version of the standard, ISO 19005-2, which will address concerns raised by features added in the subsequent PDF versions 1.5 and 1.6.
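Among the features ISO 19005-1 prohibits are encryption, JavaScript, LZW compression, and embedded files. The naive sketch below scans raw bytes for the PDF names associated with a few of these. It is only a rough pre-screen that assumes the names appear uncompressed in the file; it is no substitute for a conforming PDF/A validator, which must parse the object structure and check requirements such as embedded fonts and XMP metadata. The `prescreen_pdfa1` name and the chosen list of names are illustrative.

```python
import re

# PDF names tied to a few features that ISO 19005-1 (PDF/A-1) prohibits.
PROHIBITED = {
    b"/Encrypt": "encryption",
    b"/JavaScript": "JavaScript",
    b"/LZWDecode": "LZW compression",
    b"/EmbeddedFile": "embedded files",
    b"/Launch": "launch actions",
}

# A PDF name ends at whitespace or a delimiter, so "/Encrypt" must not match
# "/EncryptMetadata": require that the next byte is not a regular character.
_NAME_END = rb"(?![^\s()<>\[\]{}/%])"


def prescreen_pdfa1(data: bytes) -> list:
    """Return descriptions of prohibited features whose names appear in the raw bytes."""
    return [feature for name, feature in PROHIBITED.items()
            if re.search(re.escape(name) + _NAME_END, data)]
```

A clean result means only that none of these names appear in plain text; names inside compressed object streams (introduced in PDF 1.5) would escape this check.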