Overview: Web Archive Collection Service (WAX)

The Web Archive Collection Service (WAX) supports the collection of selected web content to ensure its long-term preservation and accessibility for teaching and research. WAX began as a pilot service in 2006 and became a regular production service in February 2009. See WAX History for more information about the origins of this service.

What is WAX?
Who can use WAX?
What materials are eligible for WAX?
How to participate

What is WAX?

The WAX system lets a Harvard curator harvest one or more thematically related web sites into an archived collection. The curator uses a web-based administrative interface (called WAXI) to select, capture (harvest), organize, and describe the collection. The archived web collection is stored in the HUL Digital Repository Service (DRS) and can be searched or browsed from the WAX Public Interface.

The WAX system was built by OIS using several open source tools developed by the Internet Archive and other International Internet Preservation Consortium (IIPC) members. These tools include the Heritrix web crawler (used to capture web sites for archiving), the Wayback index and rendering tool, and the NutchWAX index and search tool. WAX also uses Quartz, open source job scheduling software from OpenSymphony.

WAX currently harvests content from the surface web: content that is discoverable by search engines or web crawlers, as opposed to content hidden from web crawlers in a database or restricted by password or login protection.

Version 1 of WAX supports the delivery of publicly accessible web sites. In a future version, WAX will also support restricting delivery of archived web sites to the Harvard community.

Who can use WAX?

Harvard libraries, museums and archives are eligible to use the WAX Service. Other Harvard organizational units and individual members of the Harvard community are eligible when sponsored by a Harvard library, museum or archive.

What materials are eligible for WAX?

Web sites being considered for WAX archiving should consist of materials that have library-like qualities (materials with persistent value, intended to support research or teaching). WAX is not designed for short-term use. WAX collections can belong to any academic discipline or subject domain.

How to participate

The planning needed to create a WAX collection usually takes about 3 months. Both new participants and returning participants will need to prepare as noted below. Questions about WAX participation should be directed to the Digital Library Projects Group in OIS.

Note that the scheduling of WAX projects is based on the availability of OIS resources. 

  1. Assign a “curator” for the project. The curator should be a staff member from the library, archive, or museum that is sponsoring the WAX collection. The curator will be the main contact for OIS regarding the project and will take the lead on planning and setup.
  2. Submit a WAX project inquiry form and then meet with OIS to evaluate the collection’s eligibility for WAX and assess the size of the project.

    For HCL projects: A complete proposal includes compliance with the HCL review process. Please contact Maggie Hale in HCL Imaging Services for assistance.

    Once your project is evaluated, OIS will respond with a WAX project proposal that includes an initial outline of tasks, project timing and associated fees. OIS will also assign a digital project liaison to assist in the effort.


For current setup and maintenance fee rates, see Library Systems Fees and Assessments.

WAX participants are responsible for the following fees:

  • New participant startup fee:
    • Covers analysis, training and support, as well as operational costs in the first (startup) year.
    • Charged in the first (startup) fiscal year.
  • Annual maintenance fee:
    • Covers direct incremental costs of operations and database maintenance (including hardware, UIS server fees, facilities and monitoring, database processing and storage).
    • Charged each fiscal year after the first (startup) year.
  • Regular DRS storage fees for archived content.

An annual review of costs and activity will be conducted and adjustments made as we all gain experience using and operating the WAX service.