Harvard University Library

   

Harvard University Library Notes, For Harvard Library Staff, Number 1336, March 2007
Interview: Robin Wendler

Robin Wendler, metadata analyst for the Office for Information Systems (OIS) in the Harvard University Library, is a member of the Task Group on Discovery and Metadata. A member of the OIS staff since 1988, she designs metadata used throughout Harvard library systems, collaborating with digital library developers, librarians, and faculty to support digital library systems and services. Wendler earned her BA in classical studies from the College of William and Mary and her MLS from Syracuse University. She was interviewed for Library Notes on February 28.

LN

The ULC's task group on discovery and metadata came into being in the fall of 2006. What prompted its formation?

RW

It's a time of such upheaval in the way people create and use information, and the services provided by libraries have not been keeping up. Independently of each other, ideas for two library task groups were floated: one on the future of discovery and another on the future of metadata. It became clear that we should be talking about these things together. Discovery and metadata are really two sides of the same coin, and there are enormous changes in both areas that the Harvard Libraries need to address in a coordinated way. We do not want initiatives in one to undercut initiatives in the other—they need to work in tandem.

LN

Your task group has a limited life span—just until June 30. How is the group approaching these big topics?

RW

First, we're reviewing the landscape in these two areas. The task group has treated this almost like a seminar. It's a small group, and pairs of committee members took on more than 20 specific topics, such as tagging or visualization of search results, did background investigation, and presented to the group. We'll present an overview of that broad survey to the ULC later in March. Based on this groundwork, we'll try to identify near- and mid-term priority areas for new initiatives, as well as suggest a framework to enable Harvard Libraries to respond more quickly to an environment that is changing with increasing speed.

LN

It sounds as if the discussion is more philosophical than tactical.

RW

It's practical. The goals are both to provide the ULC with information that can help set the direction for discovery and metadata services and to raise awareness throughout the Harvard Libraries about issues that are changing libraries in major ways.

In addition, the ULC has proposed a new coordinating committee on discovery and metadata. It's important to say that our task group is not the same thing as the planned committee, but it is in many respects a prerequisite for it. The ULC couldn't just put a new permanent committee in place without setting the stage for its work, both in terms of establishing an understanding of the overall environment and providing a sense of what concerns are most pressing.

LN

Can you talk about some of your major topics?

RW

One of the most urgent topics is functionality of the catalog. Dissatisfaction with library catalogs permeates the library world right now. A number of initiatives, such as the one at North Carolina State University, seek to bring library catalogs more in line with the expectations users bring from Internet search engines and e-commerce sites. [See http://www.lib.ncsu.edu/endeca/presentations.html—Ed.] The goal is to provide a better discovery environment that includes relevance ranking, characterization of search results ("facets"), tagging, and other social-recommendation features.

LN

Let's start with relevance ranking.

RW

Users are accustomed to going to Google and similar search engines and getting what they perceive to be relevant results, where something they were looking for comes up on that first screen—often on the top half of that first screen.

In contrast, library catalogs historically avoided relevance because most relevance engines work better with full text; that is, with a rich set of data. Catalog metadata is designed to reduce redundant word occurrence, which limits the effectiveness of traditional relevance. In a heavy research environment like Harvard's, there has also been a bias toward assuming a high level of sophistication in the researcher.

One thing we've learned is that we need to balance the information-discovery requirements of novice users with those of expert scholars. Since everyone is a novice outside his or her area of expertise, we need to support that first step into a topic better than we have, and that's where relevance ranking can help.

LN

Social features?

RW

You know, tagging capabilities, user ratings and reviews, and "users who used this also used these other materials" recommendations. The last is another area that libraries have historically shied away from, partly for reasons of privacy: libraries have a philosophical commitment to an individual's right to keep their reading habits to themselves. Even when the information is anonymous, in a research environment, you might be revealing some researcher's path to an idea they haven't yet published.

The scope of the catalog is another huge issue right now. The scope of the main library catalog is no longer clear to people. Now, in addition to HOLLIS, you've got VIA, OASIS, the Harvard Geospatial Library, and the Harvard–MIT Data Center, where materials can be discovered online that in most cases were never in the HOLLIS catalog. These services arose as separate systems, outside HOLLIS, for good reasons: to provide specialized searching, displays, or tools. But with the multiplicity of services out there, how can users know where to search? When they've done a search, how can they know the scope of what they might have found, and what might be missing?

Increasingly, outside of the libraries, you're seeing the aggregation of discovery, and developments such as Google Scholar are huge for that.

So we need to ask a number of questions. When should we have so-called "stovepipes"—that is, separate discovery environments? What should be the scope of aggregated and metasearch environments that we provide? What kinds of discovery beyond metadata-based discovery should we offer? What's the relationship between full-text retrieval and metadata-based retrieval?

LN

What is the relationship?

RW

That's the question! Because of the effort and expense that we put into metadata creation and maintenance here at Harvard, we want to be sure that we are putting that effort in the right place and that we are getting the most benefit from the human work that goes into metadata creation.

Where there are alternatives, such as full-text retrieval, social retrieval, and automated metadata creation, we want to be able to take advantage of those alternatives. We need to make sure that we're leveraging the human effort that goes into metadata creation.

LN

We've talked about some of the major issues for discovery. What are the big themes for metadata?

RW

The metadata community has been under tremendous and conflicting pressures for at least a decade. They are being pushed to do more with less, to streamline, to become more efficient. The number of resources that they need to control has exploded, because paper publishing hasn't gone away but electronic publishing has gone through the roof. And they are faced with the proliferation and formalization of new kinds of metadata in a variety of specialized domains. We were talking a minute ago about OASIS, VIA, and HGL, which are underpinned by metadata standards that are different from the traditional information in a library catalog. Not only must catalogers streamline what they produce and produce more of it, they're expected to have expertise and produce metadata in a variety of new formats as well.

Finally, we see the development of the semantic web and new mandates to interoperate with metadata from communities that are completely outside of libraries.

At the same time, as more older items are digitized, people are increasingly wondering: when do you need metadata as opposed to full text to fully research a topic?

Until these big digitization projects, like Google Book Search, came along, search engines, by and large, were operating on material that had been produced within a very narrow timeframe. As you throw in these pre-1923 texts, you start to see the conditions where metadata can really serve you. For example, terminology shifts over time, and if someone searches for a certain word today, that word may not have been in use 50 or 100 years ago. Straight keyword searching on full text, without a tremendous amount of sophisticated language processing, is not necessarily going to retrieve everything you'd like to get. You'd have to know to search under today's terminology and terminology from 50 years ago, 100 years ago, in every language that you're interested in, in every grammatical variant, to gather the set of information resources that you want. That's when metadata helps, by bringing resources together under common terminology.
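
[Ed. note: The point about common terminology can be illustrated with a minimal sketch in Python. It is not drawn from any Harvard system. The three records, their field names, and both search functions are invented for illustration, on the assumption that each record carries a title in the language of its era plus a single assigned subject heading.]

```python
# Hypothetical records (invented for illustration): each title uses the
# terminology of its own era, while the "subject" field carries one
# shared heading assigned by a cataloger.
RECORDS = [
    {"title": "A Treatise on Consumption", "year": 1850, "subject": "Tuberculosis"},
    {"title": "Phthisis and Its Treatment", "year": 1880, "subject": "Tuberculosis"},
    {"title": "Tuberculosis in the Modern City", "year": 1995, "subject": "Tuberculosis"},
]


def keyword_search(term, records):
    """Naive keyword match against the title text only."""
    term = term.lower()
    return [r["title"] for r in records if term in r["title"].lower()]


def subject_search(heading, records):
    """Match against the assigned subject heading instead of the text."""
    heading = heading.lower()
    return [r["title"] for r in records if r["subject"].lower() == heading]


if __name__ == "__main__":
    print(keyword_search("tuberculosis", RECORDS))
    print(subject_search("Tuberculosis", RECORDS))
```

[In this toy data, the keyword search for today's term finds only the 1995 title, while the subject search returns all three, because the older titles use "consumption" and "phthisis" for the same disease.]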

LN

Turning back to the task group, how will your work unfold for Harvard librarians in general?

RW

What the community will see first are visits from key players in the library domain who are thinking about resource discovery and metadata. Terry Ryan, UCLA's associate librarian for IT, will be speaking March 19 from 3:30 to 5 in the Radcliffe Gym. Ms. Ryan was a member of the UC-wide task force that produced the report "Rethinking How We Provide Bibliographic Services for the University of California" [See http://libraries.universityofcalifornia.edu/sopag/BSTF/Final.pdf—Ed.] in December 2005. Other speakers will be announced throughout the spring.

LN

How will the task group conclude its work?

RW

By the end of the academic year, the task group will make its final report to ULC. There will be a set of recommendations with priorities for projects that might be fruitful for ULC to undertake, as well as some kind of a framework for addressing change.

 
