A Proposal for the Harvard University Library Digital Initiative

November, 1997

Digital Libraries Initiative Working Group
John Deeley (Medical School)
Dale Flecker (University Library)
Barbara Graham (University Library)
Anne Margulies (Office of the Provost)
Harry Martin (Law School)
Nancy Maull (FAS)
Thomas Michalak (Business School)
Nancy Cline (College Library), Chair



Digital information is transforming the conduct of teaching and research. There are debates about the speed of change and the nature of the change, but there is no doubt that the creation, transmission, accessing, and archiving of information will increasingly be digital. The Harvard Library is already moving into the world of digital information. Whether we move effectively and efficiently is an open question; that we must so move is not.

Harvard must and will develop a digital library capacity. There are a number of reasons why such a development is required:


Libraries, of necessity, reflect the world of scholarship and information. They acquire, disseminate, and preserve information in whatever form it is created by scholars and publishers; they provide teachers and researchers with materials in the forms those users desire. More and more of that material is being produced in digital form; more and more of our users want access to such materials. Harvard has no choice but to develop the capacity to collect digital resources as it always has paper-based materials.


Digital information can be used in ways that paper cannot: it need not be tied to a particular location; it can be transmitted more easily to users; it can be consulted by many users at once; it encompasses many media from text to quantitative data to visual and audio materials, allowing these to be combined; it can be searched, copied, and manipulated in ways unknown to print. A full explication of ways in which digital information is changing and enhancing scholarship and learning would require a much longer paper, but the evidence is overwhelming. Numerous classes at Harvard (undergraduate, graduate, and professional) use digital information in the teaching process. In many cases, this is a more efficient way for teachers to provide information and for students to obtain it, through electronic reserves, on-line texts, and on-line visual collections. In many cases digital information represents more: the availability of digital information is transforming the teaching and learning process by allowing students to explore widely across information sources, to combine materials in new ways, and to be more active and creative participants in the learning process.


Effective movement into digital information is needed to maintain Harvard's special position of excellence. The Harvard University Library is the greatest academic library in the world. If any metric sets Harvard apart from other universities it is the Library. Furthermore, it is not only that the library per se is a major resource for teaching and research; the library is directly related to the quality of the other major university resource: its faculty. The library is a major factor in attracting and keeping faculty. If the library declines in quality so does the University.

Every year, the Association of Research Libraries ranks its member libraries on a standard indicator of library scale. Every year, the data array looks the same: there is a smooth progression upward across the 100-plus academic libraries with a major discontinuity: Harvard, alone at the top level. The ranking is largely based on the size of collections and the scale of operations. No one can catch Harvard in that ranking. However, if one were to add criteria for performance in the digital world, we would likely rank one third of the way down, below a number of the major universities with which we compete. As time goes on, our faculty and students will more and more want performance in the digital library world commensurate with what they have come to expect from our paper-based collections. If Harvard is to maintain its quality (not merely for competitive purposes but because we value the highest levels of teaching and research), we will have to increase our efforts and change our way of doing things.

This proposal outlines the changes in library operation and structure caused by the digital revolution, and recommends a strategy for responding to these changes. It describes a project to develop the institutional capacity to acquire, deliver, and archive digital information and it argues for a more coordinated approach to these tasks than is the Harvard norm. We believe that Harvard has no choice but to move decisively into the world of digital information. Our research and teaching obligations and the changing nature of the information needed to support those obligations require action and investment. Will Harvard invest wisely? A commitment to the program described in this paper will ensure that it does.

A Coordinated Approach

Given that our many libraries must collect, deliver, and archive digital information, why do we need a centralized University-wide project? Why not rely on distributed local activity, the norm for library operations at Harvard? A central, coordinated project is needed for at least three reasons:

Digital information is different.

When a book is acquired, it resides in a single library. When a digital resource is put on the network, it can be equally available to all. Further, traditional materials are sold in small units with a relatively low unit price, and purchasing decisions are virtually always local. Digital materials are being marketed in larger and much more costly units, and both decisions to purchase and the related cost-sharing negotiations frequently involve many separate Harvard units.

Shared services are more efficient.

The development of digital libraries involves a wide range of highly specialized technical skills, and duplicating those skills in each library would be both difficult and wasteful. Likewise, digital libraries require a set of common infrastructure services that need not be replicated in each individual library. Some hardware components exhibit significant economies of scale, including storage systems, one of the chief cost components of digital library systems. Software to support commonly used services can be licensed once for the entire University.

Integration improves service.

For the library user, having a unified and coherent way to find what is available, a single means of accessing secured resources, the fewest interfaces to digital materials, and the smallest set of required desktop applications represents significantly better service than if each library were to implement varied ways of doing similar things.

Harvard's libraries and library users have benefited for many years from the integrated, shared services of HOLLIS. The digital library will likewise benefit from coordinated development and common services. The absence of a coordinated infrastructure will inevitably generate duplicative and incompatible efforts in individual libraries, duplication that will (in the absence of a University-wide effort) happen soon in our current environment of exploding digital resources.

Proposal Overview

This is not a proposal to create Harvard's digital library. That will be built over many decades as the world of digital information evolves. Rather we propose a project to launch Harvard's digital library effort, to give it the initial momentum, momentum that we expect to be carried through regular library operations once digital libraries mature and digital formats become a norm. We are in a period of discontinuity, one where none of the old forms have been replaced, but new ones surround us, demanding attention. We therefore propose a one-time effort to provide special resources to help cope with these extraordinary times.

The project we propose will:

  • create the first-generation technical infrastructure to support storage of and access to digital library materials;
  • make available a staff of specialists to advise librarians (and other University personnel working in related areas) about a number of key issues related to digital materials, including digital reformatting, metadata, licensing, and archiving;
  • give librarians and technologists experience with a wide range of technologies and digital materials on which to base future decisions and development;
  • provide library users with at least one significant set of digital resources to support current teaching and research activities.

At the end of the five-year period we expect that Harvard will have the capacity to collect, organize, serve, and archive digital materials with the sophistication and ease with which it now manages traditional media, and that use of the digital collections will be part of the daily activity of students and scholars across the University. The acquisition, organization, and management of digital information should be on a firm footing as new information sources appear. The libraries will have a solid base for adaptation to the inevitable market and technical changes that will continue to take place in the digital world.

Libraries Elsewhere

Harvard is, of course, not alone in facing the need for an extraordinary initiative related to digital libraries. Most of our peer institutions have concerted programs underway, in some cases programs of considerable magnitude. Among the most notable are those at:

  • The Library of Congress, which has begun developing the National Digital Library, with a target 5-year budget of $60 million (including a $3 million annual appropriation from Congress and significant private contributions). The program has already generated a considerable collection of primary source materials related to the development of the United States, a collection receiving heavy use from around the world.
  • The University of California system, which has recently created The California Digital Library program to acquire and archive digital materials for all 9 University campuses. The project budget for the initial year is $3 million, which does not include much of the infrastructure costs supported elsewhere in the University's budget.
  • The University of Michigan, which in 1993 initiated its Digital Library Program with the intention of creating a "comprehensive (i.e., campuswide in perspective), coherent (i.e., logical in its organization and scope), and coordinated (i.e., supported by systems and standards to promote effective access)" digital library collection. The digital materials now available at Michigan are impressive in their scope and variety. Michigan's digital infrastructure has enabled it to win a continuing series of grants for content and program development.
  • The University of Virginia, which has created technical centers specializing in a wide range of digital information, including: Digital Image Center, Digital Media and Music Center, Electronic Text Center, Geographic Information Center, Social Sciences Data Center, and the Special Collections Digital Center. Together these represent a wide array of resources (a fact that faculty coming to Harvard from Virginia have pointed out to Harvard librarians).
  • The UK Higher Education Funding Council, which has mounted a 3-year, 15 million-pound, Electronic Libraries Programme to fund university-based digital library projects in a number of areas, including experimentation with electronic journals, digital preservation, and training.

The program we propose would not put Harvard at the "cutting edge" of digital library developments. Rather we propose to profit from the pioneering efforts of others who have invested in research and exploration of the many difficult issues in digital libraries. Our program is intended to build on current knowledge to create a practical, large-scale infrastructure that will be of daily use in building our electronic collections as the world of publishing moves into the digital age.

Project Activities

Developing Infrastructure

Infrastructure is the collection of common systems and services that make it possible to store, organize, and access digital materials. Among the key elements to be addressed in the project are:

  1. Catalogs and indexes. The hallmark of a library is organization, a structure that helps users identify and locate what is available. The HOLLIS catalog will be a key component of our infrastructure, but it is not adequate for many types of materials likely to be in our digital collections, including:
    • photographs, prints, and other visual materials
    • contents of archival collections
    • journal articles.
    Building these separate catalogs and linking them in a coherent structure is critical to making the digital collections useful. It is a technical and intellectual task of considerable magnitude, and one made all the more pressing by the scale and breadth of Harvard's collections.
  2. Controlling access. Much of our digital collection will likely be owned by others, and Harvard will only have a license permitting use. Other materials may be the property of Harvard, but we will want to control access for legal, contractual, or economic reasons. Securing access to digital materials can be thought of as a two-dimensional matrix composed of:
    • the user (how do we know for certain who this user is? what category is this user in?)
    • the digital item (which user classes can access this item? what use rights do they have to use this item: view a description? view a low-grade version? view a high-definition version? print a copy?).
  3. Storage. Large digital collections involve many issues:
    • cost-effective large-scale storage hardware and software, including the capability to store digital items on a variety of devices with varying cost and performance characteristics
    • systematic back-up
    • disaster recovery plans and facilities
    • media migration to protect against data loss
    • coherence between stored objects and the data about those objects ("metadata") that makes them useable.
    Scale changes storage requirements. Storing hundreds of thousands of digital objects and information about them requires tools, discipline, and systematic operations. A digital repository resembles in many ways the Harvard Depository: its nature is determined more by the scale of operation than by the specific nature of the materials stored. To work well, both require careful professional management.
  4. Interface. The user interface for printed materials is inherent in the carrier (a book, a journal, a newspaper: each embodies its own interface). Digital materials are frequently separate from the interface, and the same object can be delivered through many different interfaces. One of the difficulties in the current environment is that there is a dramatic proliferation of interfaces, a proliferation that is causing user confusion and frustration. While one cannot hope to do away with the need to use many different system interfaces, a shared set of frequently used interfaces for such common actions as the following will noticeably simplify the use of the digital library:
    • searching bibliographic databases
    • viewing images of digitized journal articles or other page-based materials
    • navigating through descriptions of archival collections.
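
As an illustration of the two-dimensional access matrix described under "Controlling access" above, the following is a minimal sketch in Python, not a proposed design. The user classes, item identifiers, and function name are hypothetical and chosen only for illustration; the four use rights mirror the examples given in the text. The sketch assumes the user has already been identified and assigned to a class by some separate authentication service.

```python
from enum import Enum, auto


class Right(Enum):
    """Use rights, mirroring the examples listed in the text."""
    VIEW_DESCRIPTION = auto()
    VIEW_LOW_GRADE = auto()
    VIEW_HIGH_DEFINITION = auto()
    PRINT_COPY = auto()


# Hypothetical matrix: (item identifier, user class) -> set of granted rights.
# Both the identifiers and the class names are invented for this sketch.
ACCESS_MATRIX = {
    ("item-001", "harvard_faculty"): {
        Right.VIEW_DESCRIPTION,
        Right.VIEW_LOW_GRADE,
        Right.VIEW_HIGH_DEFINITION,
        Right.PRINT_COPY,
    },
    ("item-001", "public"): {Right.VIEW_DESCRIPTION, Right.VIEW_LOW_GRADE},
    ("item-002", "harvard_faculty"): {Right.VIEW_DESCRIPTION},
}


def is_permitted(item_id, user_class, right):
    """Return True if the user class holds the right for the item.

    Any (item, class) pair missing from the matrix is denied by default.
    """
    return right in ACCESS_MATRIX.get((item_id, user_class), set())
```

A call such as is_permitted("item-001", "public", Right.PRINT_COPY) consults one cell of the matrix. Denying by default is one possible policy choice for licensed materials; a production system would also need to record where each rule comes from (license, contract, or local policy).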

Acquiring Digital Resources

The content of Harvard's digital collection will come from many sources, and will be distributed across many systems, within Harvard and without. Sources will include authors, commercial and non-profit publishers, government agencies, value-added aggregators, other universities and libraries, and scholarly societies. We anticipate that the landscape of players and products will evolve in complex and unexpected ways over the next decade. In this project we will both assist with the general acquisition and creation of content by Harvard libraries and begin creating a significant collection of content in specific areas.

The first two project activities relate to materials created outside of Harvard:
1. Assistance with digital acquisitions. The libraries are buying an increasing number of databases, digital reference works, and electronic journals. The acquisition process is increasingly complex, involving such questions as:

  • are there (or will there soon be) alternate sources for the same or equivalent materials?
  • will the delivery system fit Harvard's security and access systems?
  • is this the right interface for this material?
  • are there appropriate provisions for long-term, archival access?
  • is there adequate provision for "fair use"?
  • what parts of Harvard should share in the cost of providing this resource?
  • can this resource be effectively used over the Internet, or is a local copy needed?
  • are there adequate service guarantees in the contract?
  • does the contract protect Harvard against liability claims?
  • is the authorized user base defined appropriately for Harvard?

These issues are new to most librarians and addressing them requires specialized knowledge and understanding of both the technical and the legal environment.

2. Integration of outside resources. Much of the content of Harvard's digital collection will be held on systems outside the University. Because there are few standards or conventions for how such systems should operate, these systems tend to be designed in an astonishing variety of ways. Thus a significant effort is needed to integrate them into Harvard's local environment, in such areas as:

  • securing access
  • integrating with local interfaces
  • linking content and indexes
  • relating electronic content with Harvard's holdings of paper equivalents or of related materials
  • integrating searches across multiple related databases.

One of the key advantages of digital resources is that they can be functionally brought together for the user regardless of their physical location or technical format. Such integration however requires effort, both technical and intellectual, and the resulting smooth operation and navigation is one of the key contributions of libraries to the digital arena.

In addition to acquiring digital materials from elsewhere, there is an ever-increasing amount of interest at Harvard (not just within the libraries, but also in other areas including the museums, research projects, and classroom support) in converting materials from traditional to digital format. Motivation for digital conversion includes the preservation of content when the original format is endangered, providing enhanced functionality, and improving access. This program will provide assistance in addressing many of the issues raised by these conversion projects. Four additional project activities relate to materials converted locally:

3. Conversion formats and methodologies. Even something as seemingly straightforward as digital imaging, which is essentially the simple recording of a picture digitally, involves many issues. There are numerous recording formats, varying in the amount of detail captured, likely functional longevity (conventions and standards in this area evolve rapidly), the software that is required to capture, display, or print the image, the size of the resulting digital object, and the ability to render color. For the conversion of text, there are issues of markup (tagging added to text to identify and label its parts), for which varying standards and conventions are available, standards that affect usability, archivability, and the software needed for manipulation and display. For any digital format there are generally many conversion technologies available, again involving tradeoffs among functionality, cost, effort required, likely longevity, and compatibility with various technical environments. All of these issues involve technical knowledge and experience. The availability of appropriate technical assistance as the project is designed will significantly affect the utility of the product, the cost of the conversion, and the longevity of the result.

4. Metadata. To be useable by people or software, digital materials need to be described systematically:

  • For the user, this description is usually a catalog entry, providing index terms as well as a description of the item and location information. There are many differing practices in use today, both local ones and those adopted by various communities (current examples include standards adopted by research libraries, arts and cultural heritage museums, visual arts libraries, botanical or biological systematics collections, archives, and manuscripts libraries).
  • For the curator responsible for a digital object, there are many pieces of administrative information needed: information about the date and technique used for object conversion (of significance for media refreshing or format conversion), whether a copy is considered archival (which will affect how it is stored and formatted), and what access restrictions apply. Standards or conventions for which administrative information should be recorded, and how it should be encoded, are not yet common.
  • For software, the technical format of a digital object must be recorded (affecting what software is needed to render or use the object), how a given object relates to other digital objects (affecting how the interface provides navigation through a related set of materials such as images of the pages of a journal article), and the mark-up standard followed (describing the meaning of the tagging to be found in a text). There are an enormous number of examples and conventions for such "structural" metadata in use today, some standardized (e.g., the various "document type definitions" for text marked up in Standard Generalized Markup Language), many local or proprietary (such as the page-relationship data used in commercial electronic document management systems).

Again, this is an area where technical expertise and experience are required for intelligent decisions to be made in designing projects.
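
The three kinds of metadata described above (for the user, for the curator, and for software) can be pictured as three parts of a single record. The following Python sketch is illustrative only: every class name, field name, and sample value is an assumption made for this example and does not correspond to any existing standard or Harvard system.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DescriptiveMetadata:
    """For the user: a catalog-style description."""
    title: str
    index_terms: List[str]
    location: str


@dataclass
class AdministrativeMetadata:
    """For the curator: how and when the object was made, and its status."""
    conversion_date: str
    conversion_technique: str
    is_archival_copy: bool
    access_restrictions: List[str] = field(default_factory=list)


@dataclass
class StructuralMetadata:
    """For software: format, relationships, and mark-up standard."""
    technical_format: str
    related_object_ids: List[str] = field(default_factory=list)
    markup_standard: Optional[str] = None


@dataclass
class DigitalObjectRecord:
    object_id: str
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


# Hypothetical record for one page image of a digitized journal article.
example = DigitalObjectRecord(
    object_id="page-0001",
    descriptive=DescriptiveMetadata(
        title="Sample article, page 1",
        index_terms=["sample", "journal article"],
        location="repository/sample/page-0001",
    ),
    administrative=AdministrativeMetadata(
        conversion_date="1997-11-01",
        conversion_technique="flatbed scan",
        is_archival_copy=True,
        access_restrictions=["harvard_only"],
    ),
    structural=StructuralMetadata(
        technical_format="TIFF",
        related_object_ids=["page-0002"],  # the next page of the same article
    ),
)
```

Keeping the three parts separate reflects the fact that they serve different audiences and are likely to follow different conventions; whether they are stored together or in separate systems is a design decision this sketch does not settle.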

5. Conversion process design. Most conversion projects involve the expenditure of significant resources. A well designed workflow, the choice of the right technologies and service providers, and appropriate quality control standards and procedures are critical to controlling costs and conversion time, and improving the resulting product. This is an area where technical expertise and experience will make a significant difference in project design, affecting cost, usability, and longevity.

6. Ownership marking. When Harvard owns the rights to a digital object, there may be concerns about controlling legitimate reuse of that object. The University may want to allow general scholarly access to its resources, but to control (and perhaps profit from) their reuse in commercial publications. "Electronic watermarking" and similar technologies can provide ways of tracking the origin of an object and provide assistance in controlling the use and flow of owned resources. Such techniques are becoming available in the marketplace, but choosing the right strategy will require careful investigation and analysis.

Project Structure

We propose a 5-year project, to be based in the University Library. The project would be managed under the direction of a governance committee of representatives from faculties and libraries. The committee would be responsible for setting priorities for project activities, for monitoring progress and costs, and for allocating the incentive fund (discussed below). In particular, we recommend that in addition to the normal month-to-month monitoring of the project, the governance committee conduct a formal review in the third year, to ensure that the project continues to be useful and its directions appropriate. The rapid evolution of the digital environment requires vigilance. Much will change over the course of the project.

The University Library would create a project team of systems developers and technical consultants, who would be responsible for the creation of the technical infrastructure and for providing assistance and consulting advice to library staff and others at Harvard developing or delivering digital resources. The University Library would also procure adequate hardware and software for operating the infrastructure and housing digital materials when needed.

Digital library developments must be driven by collection content and by relevance to the University's academic mission, not by technology. The work on infrastructure and integration discussed above cannot be done in the abstract, divorced from actual digital library materials. Therefore a critical part of this project is the development of a digital collection that provides both ongoing utility to Harvard users, and experience on which to base sound design and development decisions. At Harvard, building library collections is a decentralized responsibility, with decisions about what to collect made at a level close to the researchers and students. A program to develop content therefore cannot be executed centrally. Instead, we propose the creation of a funding pool to be used to encourage the rapid development of a critical mass of digital materials, in order to:

  • provide real-world experience with the use of digital library resources
  • provide a set of materials on which to base infrastructure development
  • test and demonstrate the utility of Harvard's digital library developments.

This incentive fund would be used to help support digital item creation or acquisition by the individual libraries, based on the following criteria:

  • the materials would have to contribute meaningfully to building a critical mass of resources in a limited number of topical areas (we suggest using social science materials related to contemporary social issues or policy development, or materials related to one of the interfaculty initiatives, as these would be useful in all of the Harvard Faculties);
  • projects would need to utilize or, even better, contribute to the development of the common infrastructure;
  • materials should represent a variety of types of digital resources, so as to build experience at Harvard with a wide range of digital materials and to provide information on the relative utility of various resource types.

Most projects receiving support from the incentive fund are expected to involve significant matching effort or resources from the individual library (or libraries) involved.


There are three categories of costs to be considered for the digital library developments proposed here.

  • First, there are the direct project costs for staff, hardware, software, HOLLIS II, and space and overhead. These are estimated at $7 million. In addition we propose an incentive fund of $5 million, for a total project cost of $12 million.
  • Second, for digital libraries to succeed, the University will need to provide a number of other types of infrastructure applicable not only to digital libraries but also to other network and computing applications at Harvard. Examples of such infrastructure include network capacity, user authentication facilities, distributed printing facilities, support for local networks and workstations, and electronic classroom facilities and support. These are not directly attributable to the digital library but are a necessary condition for it to succeed.
  • Third, the discontinuity of the digital revolution is causing severe strain on library collection budgets. The explosion of electronic resources on the market represents an increase in the pool of materials appropriate for our libraries to collect. Further, today little of the new material displaces traditional resources. For the next several years at least, the libraries will therefore be faced with demands both to continue collecting as they have in the past and to acquire the new electronic formats. Since the proposed incentive fund is intended to be limited in both time and subject scope, libraries will be faced with the need to find significant new revenue to continue collecting as they have in the past, or they must reduce the scope of their collecting.


The world of information, and the lives of those whose work depends on it, is being transformed. Key decisions affecting the ways in which online resources can be used, the standards, and the new solutions, are being shaped by the active participants. Work relating to digital libraries has been underway for over a decade. To date, Harvard has moved cautiously in digital libraries. The dramatic increase in the rate of development of electronic information in the past two years makes it inappropriate that we continue in this cautious manner. The explosion in the publishing and use of digital information requires that we act now.

The development of a digital library for Harvard is not a threat to the remarkable history of its existing libraries. It is instead a powerful complement to those collections. If Harvard intends to be among the leading players in defining new directions for its outstanding academic programs, in shaping information policies for the digital world, and in creating the educational environment for the new millennium, it will need the best possible access to a full array of information resources, both print and digital. The development of a digital library, with the concomitant commitments to building a sound technological infrastructure and developing a community of highly skilled library and computing professionals, is a necessary step for the University.

For the libraries at Harvard, the years ahead will pose significant challenges to provide the best possible balance between print-based collections and digital resources. At present, the lack of a defined infrastructure for a digital library and the fragmented nature of digital content make it very difficult for the libraries to respond to the needs of the teaching and research programs of the University. The program we outline will allow the Harvard libraries to be effective providers of information to our users in the most cost-efficient way possible. Commitment to the coordinated development of a digital library program for Harvard will provide each of the libraries the opportunity to focus its attention on acquiring access to needed content, rather than crafting idiosyncratic solutions to maintain digital library developments. The libraries at Harvard are poised to carry out a vital role in assuring integrated access to both traditional and digital information resources. To do so effectively, there needs to be an institutional commitment to and a corollary investment in development of an operational infrastructure for a digital library.