Office for Information Systems - Harvard University Library

OASIS Indexing Decisions

Non-public indexes

  1. ID Number
    <eadid> (used for updating finding aids by OIS)
  2. Donor
    <admininfo><acqinfo> (to provide donor/source index)
  3. Processor
    <processinfo> (to provide access to collections cataloged by a particular cataloger)
    NOTE: the term "processor" was added to the Harvard-supplemented USMARC Code List for Relators as a standard term to be used as an attribute.

Public indexes

  1. Anywhere
    entire finding aid except <eadheader> and <frontmatter>
  2. Names (people and organizations):
    <name> anywhere
    <persname> anywhere
    <corpname> anywhere
    <famname> anywhere
    <archdesc><did><origination>
    NOTE: Dates and relation information found in name elements are indexed as keywords together with the names. Names should be marked up to allow for optimal retrieval (that is, normalized as much as possible, especially in their punctuation).
  3. Places
    <geogname> anywhere
  4. Titles (books, poems, songs, etc.)
    <title> anywhere
    NOTE: These are titles of works (as opposed to cataloger-created descriptions)
  5. Subjects and Genres
    <scopecontent> anywhere
    <bioghist> anywhere
    <occupation> anywhere
    <genreform> anywhere
    <subject> anywhere
    <geogname> anywhere
    <c>...<unittitle>
    <c>...<note>
  6. Repository Name
    <repository>
  7. Call number/accession number
    <archdesc><did><unitid> (but not in a <c>)
    <admininfo><acqinfo><p><num>
  8. Dates
    <c>...<unitdate>
    <archdesc><did><unitdate> (gives inclusive dates of collection materials)
    <date> anywhere except in <admininfo>
    <admininfo><acqinfo><date>
    NOTE ON DATE INDEXING: To support date range searching, every year must be given as a four-digit number. For example, if a date element contains the phrase "1920-1930", a search for 1925 will retrieve that finding aid, but if the phrase is "1920-30" it will not.
  9. Container Listing:
    everything in the <dsc>
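
The note on date indexing under item 8 can be illustrated with a minimal Python sketch. It is not the OASIS indexing code; the function names and the rule that a range is two four-digit years joined by a hyphen are assumptions made for illustration.

```python
import re

# Assumption: a date range is two four-digit years joined by a hyphen.
# Any other four-digit number is treated as a single year.
RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")
YEAR = re.compile(r"\b\d{4}\b")

def year_spans(text):
    """Return (start, end) year pairs found in the text of a date element."""
    spans = [(int(a), int(b)) for a, b in RANGE.findall(text)]
    # Remove the ranges already handled, then pick up isolated years.
    remainder = RANGE.sub(" ", text)
    spans += [(int(y), int(y)) for y in YEAR.findall(remainder)]
    return spans

def matches_year(text, year):
    """True if the searched year falls inside any span found in the text."""
    return any(start <= year <= end for start, end in year_spans(text))

print(matches_year("1920-1930", 1925))  # range is recognized
print(matches_year("1920-30", 1925))    # "30" is not a four-digit year
```

Under these assumptions "1920-30" is indexed only as the single year 1920, which is why a search for 1925 misses it.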

Notes

The following punctuation characters are ignored in indexing: ( ) . , [ ] -

Diacritics and special characters are normalized for indexing (ISO Latin-1).
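
The two normalization rules above can be sketched in Python as follows. This is a hedged illustration only: the function name is hypothetical, and three behaviors are assumptions not stated in this document (ignored punctuation is replaced by a space, accented letters are folded to their base letters, and terms are lowercased).

```python
import unicodedata

# The punctuation characters listed above as ignored in indexing.
IGNORED = "().,[]-"
# Assumption: ignored punctuation acts as a word separator.
PUNCT_TO_SPACE = str.maketrans(IGNORED, " " * len(IGNORED))

def index_terms(text):
    """Return the normalized key words for a piece of finding aid text."""
    # Assumption: diacritics are folded by decomposing each character
    # and dropping the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Assumption: matching is case-insensitive, so terms are lowercased.
    return folded.translate(PUNCT_TO_SPACE).lower().split()

print(index_terms("Brontë, Charlotte (1816-1855)"))
```

With these assumptions a search for "Bronte" and a search for "Brontë" reduce to the same index term.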

The user is presented with a list of finding aids that met her/his search criteria, with each finding aid appearing only once on the list. The user selects a finding aid and is given a "key word in context" view of the document, with each occurrence of the search term highlighted. On selecting the "full text view" of the finding aid, the user enters the document at its top and is directed to search within that finding aid using the browser's "find" feature.
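
A "key word in context" view of this kind can be sketched in Python as below. The function name, the context width, and the use of square brackets to stand in for highlighting are all assumptions for illustration; the actual OASIS display is not described in more detail here.

```python
import re

def kwic(text, term, width=30):
    """Return one context snippet per occurrence of term in text."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        # Take up to `width` characters on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Square brackets stand in for highlighting (assumption).
        snippets.append(f"{left}[{match.group(0)}]{right}")
    return snippets

sample = "Letters from Emily Dickinson to Susan; Dickinson family papers"
for line in kwic(sample, "dickinson", width=12):
    print(line)
```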

At this point, all of the indexes are envisioned as key word indexes (as opposed to exact string indexes), because the forms of these elements are not standardized and the elements lack subelements (such as surname and forename) that would make exact matching practical.
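
The difference between the two kinds of index can be shown with a short Python sketch. The sample entries, identifiers, and function names are hypothetical, and the tokenizer is a simplification chosen for the example.

```python
import re
from collections import defaultdict

def tokens(text):
    # Simplified tokenizer (assumption): lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_keyword_index(entries):
    """Map each key word to the set of finding aids that contain it."""
    index = defaultdict(set)
    for doc_id, text in entries:
        for tok in tokens(text):
            index[tok].add(doc_id)
    return index

def keyword_search(index, query):
    """Finding aids containing every word of the query, in any order."""
    sets = [index.get(tok, set()) for tok in tokens(query)]
    return set.intersection(*sets) if sets else set()

def exact_search(entries, query):
    """Finding aids whose entry equals the query string exactly."""
    return {doc_id for doc_id, text in entries if text == query}

# Hypothetical name entries in unstandardized forms.
entries = [
    ("fa1", "Dickinson, Emily, 1830-1886"),
    ("fa2", "Emily Dickinson"),
    ("fa3", "Dickinson family"),
]
index = build_keyword_index(entries)
print(sorted(keyword_search(index, "Emily Dickinson")))
print(sorted(exact_search(entries, "Emily Dickinson")))
```

In this sketch the key word search finds both forms of the name, while the exact string search finds only the entry that happens to match the query character for character.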