Next HOLLIS Liaisons Meeting (#102)
September 13, 1995
Notes from the July Meeting
New OIS contact list in August
Recon quarterly report out
HOLLIS Plus and RLG line wrap problems
The default configuration for many telnet applications, including PC/TCP 2.X and TelW, specifies Line Wrap On. However, NCSA PC Telnet and LAN WorkPlace for DOS Telnet do not turn line wrap on by default. To correct this problem using NCSA PC Telnet, edit the config.tel file (in the NCSA directory) and change 'vtwrap=no' to 'vtwrap=yes'. To turn line wrap on after you have already initiated a telnet session, type Alt-P to modify parameters, move the cursor to Line Wrapping, and press the space bar to toggle between on and off.
To turn Line Wrap on using LAN WorkPlace for DOS, initiate a telnet session, type Alt-S to bring up the set-up menu, select Display, and then choose Autowrap ON from the Display sub-menu.
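For staff who look after several PCs, the config.tel change for NCSA PC Telnet could be scripted rather than made by hand. The following is a minimal sketch in Python, assuming the setting appears on a line of its own in the form vtwrap=no; the function name enable_vtwrap is hypothetical and the sample text is illustrative, not a real config.tel file.

```python
import re

def enable_vtwrap(config_text: str) -> str:
    """Return the config text with 'vtwrap=no' changed to 'vtwrap=yes'.

    Assumption: the setting starts a line, optionally with spaces
    or tabs around the '=' sign. Other lines are left untouched.
    """
    pattern = r"(?im)^([ \t]*vtwrap[ \t]*=[ \t]*)no\b"
    return re.sub(pattern, r"\1yes", config_text)

# Illustrative input only: a one-line stand-in for config.tel.
sample = "vtwrap=no\n"
print(enable_vtwrap(sample), end="")
```

A setting that already reads vtwrap=yes passes through unchanged, so the function can safely be run more than once on the same file contents.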
If you are unsure whether you are experiencing this problem, connect to HOLLIS Plus, select the RLG Bibliographic File [Eureka] and type 'fi au customs cooperation council' at the Command line. There should be AUTHOR and TITLE labels above the search results -- if you do not see these labels, you are having a line wrap problem.
Contact John Maher in OIS if you have questions.
Another fiscal year wrapped up
There has been a billing change related to HOLLIS Distributed Reporting (DR). OIS and the Administrative Data Project (ADP) are negotiating the fate of the server machine on which Distributed Reporting runs. ADP owns this machine; negotiations concern whether Distributed Reporting processes continue to share this machine with ADP or move to a new machine purchased by OIS. Until OIS knows what the costs will be, it will not pass on charges for Distributed Reporting activities. Contact Tracey Robinson in OIS if you have questions.
SEICing a new search engine
Kathy started by defining some technology buzz-words that are important search engine concepts. Client/server or client/server architecture is a model of computing in which a monolithic program is split into software "pieces" that talk to each other. The client is the user interface that makes requests. A server provides the database; it accepts requests from a client and then sends its reply back to the client, which formats and displays the reply to the user. In a particular computing environment, there can be multiple clients and multiple servers. Clients and servers often reside on different machines (e.g., client on your PC, server at the computing center) but they can also occupy the same machine. The connection between client and server is normally by means of message passing, often over a network, and uses some protocol or language to encode the client's requests and the server's responses.
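The division of labor described above can be illustrated with a toy example. This is a minimal sketch in Python, not a model of HOLLIS or of any product: the function names, the request format, and the sample records are all hypothetical, and the "network" is just a function call standing in for message passing.

```python
def server_handle(request: dict, database: list[dict]) -> list[dict]:
    """Server side: accept a request and return the matching raw records."""
    term = request["term"].lower()
    field = request["field"]
    return [rec for rec in database if term in rec[field].lower()]

def client_search(field: str, term: str, send) -> str:
    """Client side: encode the request, send it, then format the reply."""
    request = {"field": field, "term": term}   # the encoded request
    records = send(request)                    # message passing (a plain call here)
    return "\n".join(f"{r['author']}: {r['title']}" for r in records)

# Hypothetical sample records held by the server.
DATABASE = [
    {"author": "Example, A.", "title": "Sample Title One"},
    {"author": "Sample, B.", "title": "Another Title"},
]

# Client and server can share one machine, as they do in this sketch.
result = client_search("title", "sample", lambda req: server_handle(req, DATABASE))
print(result)
```

The server never formats anything for display and the client never touches the database directly; each side only needs to agree on the shape of the request and the reply.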
A protocol is a set of formal rules describing how to transmit data, especially across a network. In the early days of client/server development, everyone created proprietary systems and communication protocols. Proprietary protocols meant that each system had to learn the "language" of every other system in order to communicate with them (see Figure 1 below).
It soon became evident that a standard protocol was needed for efficient communications between clients and servers (see Figure 2). If clients and servers are speaking the same language (protocol), many fewer forms of communication need to be learned (in Figure 2, many fewer lines need to be drawn). Kathy used the analogy of French and German speakers using English as their standard protocol. The standard communications protocol in the library world is called Z39.50.
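The saving from a standard protocol can be put in numbers. The sketch below is a rough Python illustration that assumes every system must be able to talk to every other system; the figures themselves are not reproduced here, so the exact layout they show is an assumption, and the function names are hypothetical.

```python
def proprietary_links(n: int) -> int:
    """Lines to draw when each pair of systems needs its own 'language'."""
    return n * (n - 1) // 2

def standard_links(n: int) -> int:
    """Lines to draw when every system learns one shared protocol."""
    return n

for n in (3, 5, 10):
    print(n, "systems:", proprietary_links(n), "pairwise links vs.",
          standard_links(n), "links to the standard")
```

With 10 systems the pairwise count is 45 while the shared-protocol count is 10, and the gap widens as more systems are added.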
Z39.50 is an American National Standard that was approved in 1988 by the National Information Standards Organization (NISO), a standards-writing body accredited by the American National Standards Institute (ANSI) that serves the library, information, and publishing communities. Z39.50 is a search and retrieval protocol that allows clients to provide a uniform way for a patron to search a variety of databases without having to know the searching methods used by each individual database.
The World Wide Web (also called WWW, W3, or the Web) is a very popular client/server application designed to deliver hypertext documents. Web clients are called browsers. Popular browsers include Netscape, Mosaic, and Lynx. The Web communications protocol is called http (short for HyperText Transfer Protocol), although Web browsers can also communicate using gopher, ftp, telnet, and other Internet protocols.
Lastly, a search engine is the computer program that runs a database. It receives encoded search requests and returns raw data that is formatted and displayed to the user by the client.
Why develop a new search engine?
Among the "final changes to HOLLIS" recommendations from the summer of 1994 was a project to investigate the implementation of a new search engine outside of HOLLIS. The University Library has been adding databases directly to HOLLIS since 1989. Moving the loading of new databases off HOLLIS now and onto a new search engine would facilitate the transition to HOLLIS2.
A new search engine would be optimized for search performance. Kathy noted that in general, search engines are either fast at updating or fast at searching, but not both. Current HOLLIS is optimized for fast updating (record changes are processed in real time).
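The update-versus-search trade-off can be seen in a small example. The following Python sketch is a general illustration only and says nothing about how HOLLIS or any vendor's product is built; both class names are hypothetical. One store does almost no work on update and scans every record on search, while the other maintains a word index on every update so that a search is a single lookup.

```python
from collections import defaultdict

class ScanStore:
    """Cheap updates: a record is simply appended. Search reads every record."""
    def __init__(self):
        self.records = []

    def add(self, text: str) -> None:
        self.records.append(text)

    def search(self, word: str) -> list[int]:
        word = word.lower()
        return [i for i, text in enumerate(self.records)
                if word in text.lower().split()]

class IndexedStore:
    """Cheap searches: one dictionary lookup. Every update also maintains the index."""
    def __init__(self):
        self.records = []
        self.index = defaultdict(set)

    def add(self, text: str) -> None:
        record_id = len(self.records)
        self.records.append(text)
        for word in text.lower().split():   # extra work at update time
            self.index[word].add(record_id)

    def search(self, word: str) -> list[int]:
        return sorted(self.index.get(word.lower(), set()))

scan, indexed = ScanStore(), IndexedStore()
for text in ["Customs cooperation council", "Council minutes", "Annual report"]:
    scan.add(text)
    indexed.add(text)
print(scan.search("council"), indexed.search("council"))
```

Both stores return the same record numbers; they differ only in when the work is done.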
A new search engine that can handle many types of data would be an advantage. HOLLIS is good for delivering bibliographic data in MARC format, and everything loaded so far has fit into this format. However, other useful data cannot be so readily transformed into MARC (for example, full text, images, sound, etc.). The new HOLLIS Plus search engine will be able to handle both MARC and non-MARC data easily.
Selecting a search engine
In September 1994 the Automation Planning Committee charged a group called the Search Engine Evaluation Committee (SEEC) with investigating alternate search engines. SEEC members (Kathy Klemperer, Charles Husbands, Hinda Sklar, Cathy Conroy, Rod Goins, and Dorothy Solbrig) made lists of desired features, investigated the marketplace and eventually narrowed their evaluations to four vendors: BRS Search, Basis Plus, TextSearch (from Open Text), and OCLC SiteSearch. In the Spring of 1995, SEEC chose OCLC's SiteSearch package, which includes a search engine (Newton), a Z39.50 server (Zserv), a World Wide Web gateway (WebZ), and support for a character-based client.
With a search engine product selected, SEEC became SEIC (Search Engine Implementation Committee). OIS is about to acquire the SiteSearch software and start development.
Digital finding aids pilot
During the early discussions related to HOLLIS2 development, a library special collections task force identified their special collection repository requirements for HOLLIS2. One of their recommendations was the need to be able to deliver finding aids electronically. (Finding aids are descriptions of original materials. They can range in size from a single piece of paper to thick volumes.) HOLLIS' requirement of MARC formatting makes this system an unlikely host for these materials -- such potentially large full-text files would be hard to squeeze into MARC and would be hard to read online. So, the electronic finding aids concept is in need of a separate computer host, making it an excellent pilot database for the HOLLIS Plus search engine project.
The official name of Harvard's initiative is the Digital Finding Aids Project. A working group responsible for generating digital finding aids texts has been formed (Leslie Morris - Chair, MacKenzie Smith, David de Lorenzo, Susan Gonzalez, Jean Cargill, Mary Daniels, and Michael Fitzgerald).
Project participants have decided to mark finding aids texts using SGML encoding. SGML (Standard Generalized Markup Language) is an international standard for the definition of device-independent, system-independent methods of representing texts in electronic form. SGML encoding is conceptually similar to MARC coding for bibliographic data. SGML-tagged data, like MARC data, can be exchanged between library systems when institutions adhere to the SGML standards. SGML tagging identifies the type of data (similar to MARC tagging) but each library system controls how its data will display. SGML tagging is more flexible than MARC, in that it can represent a variety of data types (full text, images, sound, etc.) in a graphical computer environment.
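The idea that tagging identifies the type of data while each system controls its own display can be sketched briefly. In the Python example below, the tag names (title, scope), the sample text, and the two display styles are all hypothetical; they are not taken from any finding aids tag set, and the simple pattern match is a stand-in for a real SGML parser.

```python
import re

# Hypothetical tagged text; the tags say what each piece of data is.
TAGGED = ("<title>Papers of an Example Family</title>"
          "<scope>Letters and diaries.</scope>")

def parse(tagged: str) -> list[tuple[str, str]]:
    """Return (tag, text) pairs for simple, non-nested elements."""
    return re.findall(r"<(\w+)>(.*?)</\1>", tagged)

def display(tagged: str, labels: dict[str, str]) -> str:
    """Render the same tagged data using one system's own labels."""
    return "\n".join(f"{labels[tag]}{text}" for tag, text in parse(tagged))

# Two systems receive identical data but choose different displays.
SYSTEM_A = {"title": "TITLE: ", "scope": "SCOPE: "}
SYSTEM_B = {"title": "== ", "scope": "   "}

print(display(TAGGED, SYSTEM_A))
print(display(TAGGED, SYSTEM_B))
```

The tagged text is exchanged unchanged; only the label table differs from one system to the next.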
The group is now evaluating software authoring tools to assist with SGML mark-up. OIS tentatively plans to make the first electronic finding aids available in January 1996. OIS expects to host a presentation about this project in the Fall.
Putting the technologies together
The SEIC project is concerned primarily with providing access to locally-mounted data. SEIC activities will be integrated with other existing or developing technologies: the World Wide Web, Z39.50, HOLLIS2, existing desktop equipment, and new types of data.
World Wide Web. The main delivery method for data on this new search engine will be the World Wide Web. OCLC's SiteSearch package includes a Web server and gateway for this purpose. Using the Web for access simplifies client distribution because a variety of Web clients already exist and are widely available. The Web is a relatively easy development platform (much of the programming exists; no need to start from scratch).
Z39.50 protocol. Harvard is committed to including the Z39.50 search and retrieval protocol in its next generation online system. OCLC's SiteSearch package provides this Z39.50 connectivity for databases on the SEIC local server and for FirstSearch databases available remotely from OCLC. The WebZ component provides similar connectivity for other Z39.50-compatible services at Harvard and outside Harvard.
HOLLIS2. The next generation HOLLIS system will certainly be Z39.50 compliant. The public catalog component of HOLLIS2 can therefore be "plugged in" to the HOLLIS Plus web service when HOLLIS2 becomes available. Of course, this is a rather simplistic view -- it is still too early to tell what the HOLLIS2 public catalog will look like or how it will integrate with HOLLIS Plus.
Existing desktop equipment. We need to remember that not all desktop machines can support a graphical web browser. There will be a character-based Web common client to allow dumb terminals access to the webbed SEIC databases.
Many kinds of data. Finding aids data, marked up using SGML, will be the pilot database for the SEIC search engine, but the expectation is that the University Library will want to serve up other types of data, such as full-text electronic journals and other documents, images, multimedia documents, etc. At some point, it will even be possible to construct live links between bibliographic descriptions (such as Union Catalog records) and the electronic resources they describe.
HOLLIS Plus provides the public interface
HOLLIS Plus is the glue that holds our public-access information services together. As shown in Figure 3 below, the goal is to provide access to all information resources using a single navigator and a single web-based interface, using the Z39.50 protocol to connect to the databases whenever possible.
At the September HOLLIS Liaisons meeting there will be a presentation covering the new web-based interface for HOLLIS Plus.
After the SEIC pilot
After the January 1996 debut of the finding aids data in HOLLIS Plus, OIS will continue development of the SEIC search engine. One important part of delivering information via the Web will be user validation. Validation, also called access control, has been a simple matter in HOLLIS because of the circulation patron database. User validation is more difficult in the Web environment. OCLC is working on incorporating an access control module into SiteSearch.
A second project will be to provide access via the World Wide Web to remote Z39.50 databases, such as RLG's Eureka databases and OCLC's FirstSearch databases.
Another important feature will be a hook to Harvard holdings. Patrons using citation databases in HOLLIS today can request Harvard holdings using the LOCATION command. Such a feature is not currently available in the gopher-version of HOLLIS Plus but it is an important part of the SiteSearch configuration.
Following the presentation, liaisons had many questions about what the user interface would be like for data loaded on the new search engine. There is (understandably) confusion over the effect SEIC developments will have on user interfaces associated with current HOLLIS and HOLLIS2.
None of the activities described in this article will have any effect on the appearance or performance of the current HOLLIS system. HOLLIS will continue to provide the familiar character-based display no matter how a patron connects to it. The SEIC project also is not directly related to HOLLIS2 development. While a new search engine is important, its development is not on the critical path to HOLLIS2. If the project is delayed or fails, it will not significantly impact HOLLIS2 planning.
Suzanne Kemple asked for clarification on the functions of the Lynx client. Lynx is a character-based World Wide Web client that provides a terminal-like interface. "Character-based" means Lynx is limited to text displays; hypertext links are used to navigate through the menu structure. Lynx was developed by the University of Kansas to provide less powerful devices a non-graphical way of using the World Wide Web. Another way of understanding the role Lynx plays is to compare it with the HOLLIS Plus gopher common client in use today. A few Harvard libraries offer the gopher HOLLIS Plus from in-library IBM terminals. These devices connect to a common gopher client that provides a plain, terminal-like display that these dumb terminals can handle. When HOLLIS Plus moves to the web, Lynx will be the common client enabling dumb terminals to make the connection.
Liaisons were interested in plans to provide images via HOLLIS Plus once the SEIC search engine goes online. At this early stage OIS and the University Library have no explicit plans to start an image project. OIS, in concert with other units in the library system, has submitted grant proposals for image digitization projects but none has been funded yet. Once the finding aids pilot debuts and OIS completes further developments on the new search engine, OIS will revisit delivery of images. Anyone in the library community with ideas for potential image projects should contact Kathy Klemperer in OIS.
During the fall, more information about SEIC activities and related HOLLIS Plus developments will be made available in this newsletter, via HOLLIS Liaisons meetings, and over HULINFO. If you have further questions about SEIC and HOLLIS Plus, contact Kathy Klemperer in OIS.
Notes and Reminders
It is NOT possible for OCLC to batchload (tapeload, ftp) AMC and computer files records (except from national libraries). All other formats can be sent to them for batchloading. This means that if you are doing original cataloging of an AMC or computer files format record and are concerned about getting it into OCLC, you will have to do it online in OCLC. You cannot currently do it in HOLLIS and issue the tape command.
And, here is Robin Wendler's reply, also originally from HULINFO: The reason for this is that OCLC has not developed duplicate detection routines for these formats. However, they do accept special loads of AMC records (which are almost by definition unique), provided the supplier understands that each record may be sent only ONCE. These records must be supplied in a separate file from other records being reported.
If you need to send a batch of AMC records to OCLC, contact Robin Wendler in OIS to make arrangements.
Regular Distributed Reporting training now available
After completion of this class, students will understand the basics of GQL, the query tool used for HOLLIS Distributed Reporting. The introductory class begins with an overview of the reporting environment, including the client-server architecture, GQL's split data model, and a description of files maintained in the database and on the individual workstation. Students will learn to specify selection criteria and to apply sorting and/or special functions to create basic list and statistical queries. Students will also learn to save queries and results on disk, and will work with the report formatter to produce printed, presentation-quality reports.
The August class date is: August 15, 9:30 am - 12 noon, in the OIS Training Room. Consult the HOLLIS training schedule section in each issue for future class dates. Contact Patti Fucci in OIS to register for this class. Contact Martha Creedon in OIS if you have questions.
Student I.D. expiration dates and HOLLIS access control
Contact the FAS Registrar if you have questions.
Fiscal 1996 HULPR holiday schedule
During all other holidays, assume HULPR will be on its regular schedule. Note that the HOLLIS system is available 24 hours a day / 7 days a week and does not have scheduled holiday downtimes. Contact Linda Marean in OIS if you have questions.
A REALLY GOOD reason to use the EINV command
HULPR automatically saves information about the last invoice processed -- allowing an operator to make payments on any order record and have these linked back to the invoice record. This is the concept of the "current" invoice. This current invoice information is saved as part of the operator's session. Apparently, when the operator disconnects from HULPR, his or her HULPR network session (and the current invoice setting) is still "hanging around" and given the right circumstances, another operator logging on later will get this session and make a payment which will be linked to the first operator's invoice. Got that?
Although this appears to be a bug, good operator work habits will help avoid it. Use the EINV command before logging off of HULPR to end the current invoice. This prevents another operator signing on in some other location from mistakenly getting someone else's session and current invoice. Also, an operator making payments should always be sure to display the invoice, file it away, and then proceed with making payments. Contact Julie Wetherill in OIS if you have questions.
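For those who found the explanation above hard to follow, here is a toy model of the problem. This Python sketch is not HULPR's actual implementation; the Session class, its methods, and the invoice and order identifiers are all hypothetical. It only shows how a leftover "current" invoice can attach itself to the next operator's payment, and how ending the invoice first prevents that.

```python
class Session:
    """A network session that remembers the operator's current invoice."""
    def __init__(self):
        self.current_invoice = None

    def display_invoice(self, invoice_id: str) -> None:
        self.current_invoice = invoice_id   # this invoice becomes "current"

    def einv(self) -> None:
        self.current_invoice = None         # end the current invoice

    def pay(self, order_id: str) -> tuple:
        # A payment is linked to whatever invoice is current in the session.
        return (order_id, self.current_invoice)

def first_operator(session: Session, use_einv: bool) -> None:
    session.display_invoice("INV-A")
    session.pay("ORDER-1")
    if use_einv:
        session.einv()
    # The operator disconnects here, but the session lingers.

# Without EINV: the next operator inherits the stale current invoice.
stale = Session()
first_operator(stale, use_einv=False)
print(stale.pay("ORDER-2"))

# With EINV: the reused session carries no invoice forward.
clean = Session()
first_operator(clean, use_einv=True)
print(clean.pay("ORDER-2"))
```

In the first case the second payment is tied to INV-A; in the second case no invoice is current, which is the situation the EINV habit is meant to guarantee.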
HOLLIS Enhancements Update
In addition to the changes listed above, eight small changes that affect the HOLLIS infrastructure were completed recently. "Infrastructure changes" are enhancements and bug fixes that are needed to ensure reliable operation of HOLLIS but which do not have directly visible or readily describable functional effects.
OIS Current Projects