WAX Public Interface Help


The Web Archive Collection Service (WAX) is used by Harvard curators to harvest and archive selected web sites for purposes of teaching and research. This Help Guide describes the WAX public interface, which lets users browse and search the contents of these archived web sites. For more information about this service, see the About WAX page on the WAX site.

Need help? To report a problem or ask a question about WAX, use this feedback form.

Contents:

Browser Setup
Searching
   Basic searching
   Advanced searching
   Search results
Viewing an Archived Web Site
Citing an Archived Web Site
Known Issues
About WAX harvesting
Information for Copyright Owners

Browser Setup

Browser support. The WAX public interface works best in modern browsers (version 6+ of Internet Explorer or current versions of other popular browsers such as Firefox, Safari or Opera). If you experience problems viewing archived content in Internet Explorer, you may have better luck using Firefox.

JavaScript must be enabled in your browser to use WAX successfully.

Language of presentation. English is the default language of presentation in the WAX user interface. To assist users of the Constitutional Revision in Japan archive, Japanese translations of introductory text and menu choices are available. To change the presentation to Japanese, click the Japanese flag icon. To change back to an English presentation, click the US flag icon.

Character set support. Some archived web sites may contain content in non-Western languages (for example, Japanese). WAX uses Unicode UTF-8 encoding to express characters in these languages. To display and search WAX collections in non-Western languages, your browser's character encoding must be set to UTF-8.

To enter non-Western characters into the WAX search form, use the character input methods that are available on your computer.
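The short sketch below illustrates why the UTF-8 setting matters. It is only an illustration, not part of WAX: the sample Japanese term is an assumption chosen for the example. It shows that the same bytes read correctly when decoded as UTF-8 and turn into garbled characters when decoded with a Western encoding.

```python
# Illustrative only: the search term is a made-up sample, not taken from WAX.
term = "憲法"  # a two-character Japanese word

# UTF-8 stores each of these characters as three bytes.
encoded = term.encode("utf-8")
print(len(term), "characters ->", len(encoded), "bytes")

# Decoding with the matching encoding restores the original text.
decoded = encoded.decode("utf-8")
print(decoded == term)

# Decoding the same bytes as Latin-1 (a Western encoding) yields
# six unrelated characters instead of the original two.
garbled = encoded.decode("latin-1")
print(garbled == term)
```

If archived pages show strings of unexpected symbols in place of Japanese text, a mismatched encoding setting of this kind is a likely cause.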

Searching

WAX offers full text keyword searching of its archived web content, including text and links within web pages and Acrobat PDF files.

From the WAX home page, you can search one archived collection or search across several collections. If you drill down into a single web site collection, you can search the entire collection or select and search an individual archived web site.

Basic searching

No wildcard search options are available at this time.

Advanced searching

WAX offers a few advanced options that can help you search for specific file types or URLs contained within an archive. These are called fielded searches.

The format of a fielded search is [field name]:[term], e.g., type:application/pdf. Available fields are:

It is possible to combine fielded searches with each other as well as with full text searches in one search query, for example:

ministry type:application/pdf
     will find any PDF file containing the keyword ministry.

"conservative voice" site:www.adamsweb.us
     will find content within the specified site that contains the phrase "conservative voice".

Be sure to put a space between each component in your search.
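As a rough sketch of the query format described above, the hypothetical helper below assembles keywords, quoted phrases and [field name]:[term] pairs into one search string, separated by single spaces. The function name and its parameters are assumptions made for illustration; only the type and site fields shown in the examples above are used.

```python
def build_query(keywords=(), phrases=(), fields=None):
    """Join search components with single spaces (hypothetical helper)."""
    parts = list(keywords)
    # Phrases are wrapped in double quotes so they are searched as a unit.
    parts += [f'"{phrase}"' for phrase in phrases]
    # Fielded searches take the form name:term, with no space around the colon.
    parts += [f"{name}:{term}" for name, term in (fields or {}).items()]
    return " ".join(parts)


pdf_query = build_query(keywords=["ministry"],
                        fields={"type": "application/pdf"})
site_query = build_query(phrases=["conservative voice"],
                         fields={"site": "www.adamsweb.us"})
print(pdf_query)
print(site_query)
```

The two printed strings reproduce the example queries above and can be typed into the WAX search form as they are.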

Search results

WAX search results display in order of relevance (with most relevant hits at the top).

Search results will be limited to a single relevant hit from each of the individual web site domains within the scope of your search. WAX imposes this limitation to prevent hits in one domain from overwhelming your search results.

You have several options for viewing an archived page from search results:

The screen shot below illustrates the results of a search within an archive of Harvard departmental web sites. The results include a single hit from each Harvard domain (fas.harvard.edu, mcb.harvard.edu, etc.).
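The one-hit-per-domain rule can be pictured with the sketch below. This is not WAX's actual implementation, which this guide does not describe; it is a minimal illustration that assumes results arrive already sorted by relevance and that a "domain" is the host name in each URL. The function name and the sample paths are made up.

```python
from urllib.parse import urlsplit


def one_hit_per_domain(ranked_urls):
    """Keep only the first (most relevant) URL seen for each host name."""
    seen_hosts = set()
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


# Sample input, most relevant first (paths are invented for illustration).
ranked = [
    "http://fas.harvard.edu/page1",
    "http://fas.harvard.edu/page2",
    "http://mcb.harvard.edu/page1",
    "http://fas.harvard.edu/page3",
]
collapsed = one_hit_per_domain(ranked)
print(collapsed)
```

In this sample, the three fas.harvard.edu hits collapse to the highest-ranked one, so the mcb.harvard.edu hit is not pushed down the list.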

Viewing an Archived Web Site

WAX collections usually offer multiple archived versions of a web site, distinguished by the date the site was crawled. When you select an archived version, you will be viewing the web site as it appeared on that crawl date. The number of archived versions available depends on how frequently each web site is scheduled for harvest. Note that there is a delay of at least three months between when a web site is harvested and when it will display in WAX.

Important things to watch out for when you are viewing an archived web site:

For additional hints about viewing archived sites, see the Known Issues part of this guide.

Citing an Archived Web Site

Please remember to cite the use of WAX archived content in your work. To assist you, WAX offers a "Cite This Resource" option that produces a ready-made citation in three styles (APA, Chicago and MLA).

Click on the "Cite This Resource" option to view and copy the WAX citation of your choice. This option appears on the collection's home page, the individual archived web site description page, and in the upper frame when you are viewing archived pages.

Consult these links to learn more about citation styles:

American Psychological Association (APA)

Chicago Manual of Style (Chicago)

MLA Style Manual and Guide to Scholarly Publishing

Known Issues

Web archiving technology is still in development, and improvements are made continually. Currently, some of the original functionality found in web sites may not be preserved or may not display properly in the archived version of the sites. Issues that we have identified are listed below (with workarounds when we know them).

About WAX harvesting

To archive web content, WAX uses Heritrix, a web crawler designed by the Internet Archive. The WAX crawler is a program that browses web sites and copies their content for the WAX archive.

The WAX crawler is called: hul-wax

The WAX crawler will obey all common instructions in robots.txt files. To explicitly allow or prohibit harvesting of your web site, add an entry for our crawler to your robots.txt file. The robots.txt file must be placed at the root of your server. More information about robots.txt files can be found at: http://www.robotstxt.org/robotstxt.html.

Allowing the WAX crawler. The following text added to the robots.txt file will allow our harvester to crawl your web site:

   User-agent: hul-wax
   Disallow:

Prohibiting the WAX crawler. The following text added to the robots.txt file will prevent our harvester from crawling your web site:

   User-agent: hul-wax
   Disallow: /
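To check how a robots.txt entry will be read before you publish it, one option is the robots.txt parser in Python's standard library. The sketch below is only a suggestion: the helper name and the www.example.org address are assumptions made for illustration, and the parser follows common robots.txt conventions rather than the WAX crawler's own code.

```python
from urllib.robotparser import RobotFileParser


def wax_crawler_allowed(robots_txt, url="http://www.example.org/"):
    """Report whether the given robots.txt text lets hul-wax fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("hul-wax", url)


# The two entries shown above, written out as plain text.
ALLOW_ENTRY = "User-agent: hul-wax\nDisallow:\n"
BLOCK_ENTRY = "User-agent: hul-wax\nDisallow: /\n"

print(wax_crawler_allowed(ALLOW_ENTRY))
print(wax_crawler_allowed(BLOCK_ENTRY))
```

The first call reports that the crawler is allowed and the second that it is blocked. To test your own file, pass its full text to the helper in place of the sample entries.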

Information for Copyright Owners

If you own or control copyrighted content available in WAX and wish it to be taken down, please let us know. To make a takedown request or inquire about inclusion of your content in WAX, use the WAX feedback form. In your submission, please identify the URL(s) of the web page(s) carrying your content, the date(s) and time(s) of archiving, the specific content on the page(s) to which you claim rights, and the nature of your rights, e.g.:

http://www.school.edu/faculty archived January 1, 2009 at 12:00 AM, photograph of teachers, creator Jane Doe, photograph registered for copyright.


Last modified: Monday, 26-Jan-2009 10:31:20 EST
© 2008 President and Fellows of Harvard College
http://hul.harvard.edu/ois/systems/wax/wax-public-help/
Office for Information Systems
Harvard University Library

For assistance contact WAX Support.