DRS Batch Builder User Guide
Any recent updates to this document will be available online:
(html) http://hul.harvard.edu/ois/systems/drs/bb-userguide/
(pdf) http://hul.harvard.edu/ois/systems/drs/bb-userguide/bb-userguide.pdf
This User Guide describes how to use DRS Batch Builder, a desktop application that simplifies the process of DRS batch deposit. Batch Builder currently works with these batch genres: still image, page-turned object and container page-turned object batches.
Need help? To report a problem or ask a question about Batch Builder send a message to Batch Builder Support (bb-support@hulmail.harvard.edu). If you are reporting a problem, please provide a detailed description of the problem, and a copy of any Batch Builder error or warning messages.
Batch Builder is developed and supported by the Harvard University Library Office for Information Systems (OIS).
Table of contents
1. Getting Started
2. Creating a General Image Batch
3. Creating a Page-Turned Object Batch
4. Creating a Container Batch
5. Uploading Batches to DRS
6. Expressing Relationships
7. Using External Mapping Files
8. Working with URNs
9. Reports and Messages
10. Metadata Reference
11. Command Line Reference
1. Getting Started
1.1 Installing Batch Builder
1.2 Setting options
1.3 User interface basics
1.4 Batch creation overview
1.5 About file names
1.6 About directory names and structure
1.7 Batch preparation checklist
1.8 Supporting documentation
This section provides an overview of Batch Builder features and describes requirements for file names and batch directories.
1.1 Installing Batch Builder
Batch Builder is a Java-based application compatible with Windows, Mac and Linux operating systems. Batch Builder does not make live connections to the DRS and can operate on computers not connected to a network.
The Batch Builder application package includes a graphical user interface and a separate command line tool (for automated deposit workflows). The application package is bundled as a zip file, which must be unzipped and installed on the desktop. The Batch Builder zip archive can be downloaded from the OIS web site.
Note: Batch Builder requires that version 1.5.0 or higher of the Java Runtime Environment (JRE) be installed on the desktop.
- Copy the Batch Builder zip archive to your file system.
- Unzip the archive and extract the files to a directory of your choice. The installer will copy the files to a "batchbuilder-1.x" directory.
- To start the graphical user interface, double click the launch file (look for the Batch Builder icon). See the Command Line Reference section for instructions on using the command line tool.
Consult the release.txt file in the archive for information on recent updates to the application. A PDF version of this User Guide (bb-userguide.pdf) is also included in the archive.
1.2 Setting options
There are a few Batch Builder features that are controlled by using View > Options on the main menu. These options will affect all projects.
- Auto-increment new batch directory names. (Unselected by default.) When activated, if you create a batch directory name that ends with a number, the next time you select Batch > New, Batch Builder will use the same directory name but will increment the number to the next value.
- Enable METS file creation. (Selected by default.) When activated, Batch Builder will create a simple PDS METS file during generation of a page-turned batch. De-select this option if you plan to supply an externally-created METS file.
- Ignore file validation errors. (Unselected by default.) When activated, this option forces Batch Builder to create a batch even though JHOVE validation has detected errors in one or more objects in the batch.
- Open last project on application startup. (Unselected by default.) When activated, on Batch Builder startup the most recently used project will open automatically.
- Save reports as XML. (Selected by default.) When activated, Batch Builder will generate batch summary reports in text and xml formats. Reports are stored in the [project]\sync\[batch directory] folder on your file system.
- Validate contents of zip container objects. (Selected by default.) Deselect this option to prevent Batch Builder from validating the contents of ZIP container objects.
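The auto-increment option above can be pictured with a short Python sketch. This is not Batch Builder's own code: the function name `next_batch_name` is hypothetical, and preserving zero padding in the number is an assumption that the guide does not confirm.

```python
import re


def next_batch_name(previous: str) -> str:
    """Return the previous directory name with its trailing number incremented.

    Names that do not end in a digit are returned unchanged.
    """
    match = re.search(r"([0-9]+)$", previous)
    if match is None:
        return previous
    digits = match.group(1)
    # Assumption: keep the original width, so batch-009 becomes batch-010.
    incremented = str(int(digits) + 1).zfill(len(digits))
    return previous[: match.start()] + incremented
```

For instance, a previous batch directory called image-batch-1 would lead to a suggested name of image-batch-2.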
1.3 User interface basics
The Batch Builder interface includes a menu bar, a tool bar, and four display panes. In the following screen-shot of the interface, these components are labeled.
![]()
Below are more details about the project directory panel.
![]()
As you make changes to a project, generate batch directories and batch.xml files, you may need to refresh this panel in order to update the display of files and folders. To do this, from the main menu select View > Refresh file system panel.
1.4 Batch creation overview
The goal of using Batch Builder is to prepare a group of digital objects for deposit to DRS. The output of Batch Builder is called a "batch". A batch consists of:
- A set of "batch" directories on your local file system, populated with digital object files.
- A batch control file (batch.xml), which describes the batch of objects and instructs the DRS loader to perform these actions: add objects to DRS, create URNs, and create relationships between deposited objects.
- A PDS METS file (for page-turned object batches only). The METS file is a structural metadata file that identifies all the components of a document, describes its structure, and allows for page-turning navigation.
Once the batch is prepared, the deposit agent uses a secure FTP client to upload the batch directories and xml file(s) to a DRS drop box. The upload step is performed outside of Batch Builder.
Here is an overview of batch preparation using Batch Builder.
- Create a project. A project is a collection of metadata and directory settings that can be used over and over to create batches with the same characteristics.
For example, if you deposit both image and page-turned batches, you would need to create separate projects for each. Or, in a high volume deposit operation that handles digital materials for many libraries, you might create a separate project for each DRS owner code you work with.
- Create a batch template for the project. In this batch template you will define the batch directory structure and assign any global or directory-level metadata values that will be applied to digital objects in the batch.
- Create batch directories in the project. At this point, you provide a name for the batch and Batch Builder will generate the directory structure on your file system (based on the batch template). You then populate these directories with the digital object files that will be part of this batch.
- Generate a batch control file. A batch control file (batch.xml) is added to the top-level batch directory. To this file, Batch Builder adds batch-level administrative metadata as recorded in the project's batch template. Batch Builder then traverses the batch directory structure and:
- Extracts technical metadata from each digital object in the batch and adds this metadata to the batch.xml file. (For this activity, Batch Builder uses JHOVE - a format validation module developed by Harvard and JSTOR.)
Note: when JHOVE finds a file to be invalid, it cannot extract the file's technical metadata, which means the metadata properties required by Batch Builder cannot be calculated. If JHOVE is unable to extract technical metadata for some digital objects in the batch, you can supply the missing values in the batch template and these will substitute for the metadata values that JHOVE cannot generate. See the "Set by JHOVE" portions of the Metadata Reference for more information.
- Determines administrative metadata for each digital object based on the names and directory locations of the files and adds this metadata to the batch.xml file. See the Metadata Reference for more information.
- Determines an owner supplied name for each file, either by using original file names or an external mapping file, and adds these to the batch.xml file. See About file names for more information.
- Creates an MD5 signature for each file and adds this to the batch.xml file. The DRS loader uses the MD5 signature to verify that each transferred file arrives intact.
- Generates URN requests for deliverable objects in the batch and adds these to the batch.xml file. See Working with URNs for more information.
- Generates object relationship metadata and adds this to the batch.xml file. Note: creation of relationship metadata is triggered by a nested batch directory structure and files with matching ownerSuppliedNames. See Expressing Relationships for more information.
- Generates a PDS METS file (for page-turned batches only). The METS file (usually called mets.xml) will be added to the top level batch directory. Note: generation of METS xml is optional and can be controlled by setting a Batch Builder option. See Creating a Page-Turned Object Batch for more information.
As the batch is processed, messages will display in the Batch Builder message window. For a successful batch, the final message will be:
INFO - Creation of batch.xml complete for batch [batch name]
Batch Builder will also create a report that summarizes the results of the batch.xml generation process.
Any "Error" messages that appear usually indicate that generation of the batch.xml will fail. You will need to fix the errors and re-generate the batch.
For more information on the summary report and processing messages, see Reports and Messages.
- Upload the batch directories to a DRS dropbox. This step must be performed outside of Batch Builder. The depositor must use a secure FTP client to transfer the batch directory with its batch control file (and METS file if included) to the appropriate DRS drop box. See Uploading batches to DRS for more information.
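The MD5 step in the overview above can be reproduced outside Batch Builder, for example to spot-check a file before upload. The following is a minimal Python sketch, not the application's implementation; the function name `md5_signature` is hypothetical, and the hexadecimal form of the result is an assumption about how the signature is written to batch.xml.

```python
import hashlib
from pathlib import Path


def md5_signature(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 digest of a file as a hexadecimal string."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read in chunks so that large image files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing this value before and after a transfer is the same kind of integrity check that the DRS loader performs with the signature recorded in batch.xml.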
1.5 About file names
This topic describes file naming requirements for digital objects in batches handled by Batch Builder. Also included are descriptions of object file names in DRS storage and the purpose of owner supplied name.
File names of digital objects on disk
For all files in a batch:
- Maximum number of characters per file name is 100.
- Valid characters in file names are letters, digits, '.', underscores ('_'), and hyphens ('-').
Related digital objects (e.g., a production master .tif file and its related deliverable .jpg file) can share the same file name (with a different format extension of course).
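The two file name rules above are easy to check before files are copied into a batch. Here is a minimal Python sketch; the function name `is_valid_file_name` is hypothetical, and reading "letters" as unaccented ASCII letters is an assumption.

```python
import re

MAX_NAME_LENGTH = 100
# Letters, digits, '.', '_' and '-' only (assumption: ASCII letters).
VALID_FILE_NAME = re.compile(r"[A-Za-z0-9._-]+")


def is_valid_file_name(name: str) -> bool:
    """Check a file name against the length and character rules."""
    return (
        len(name) <= MAX_NAME_LENGTH
        and VALID_FILE_NAME.fullmatch(name) is not None
    )
```

A name such as image998.tif passes, while a name containing a space does not.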
File names of digital objects in DRS
The file name of a digital object will be changed once the object is in DRS storage. At the point of deposit, DRS assigns each object a numeric DRS identifier. The file name of the object in DRS storage will be the DRS ID followed by the format extension (e.g., 5844020.tif, 5844022.jpg).
The original file name of a deposited object is not preserved in DRS metadata, but the DRS load report that is sent after a deposit is processed will associate the object's DRS ID with the original file name. Also, if you use Batch Builder's default method for assigning owner supplied name, the original file name (minus format extension) will be preserved in DRS metadata as the ownerSuppliedName.
Owner supplied name
Each object stored in the DRS must be assigned an owner supplied name (ownerSuppliedName is the name of the corresponding element in DRS batch metadata). This name serves as a unique identifier that links deposited objects with local information about those objects. This name can be some type of record ID (e.g., an OLIVIA record ID), a local accession number or other curatorially-significant name.
By default Batch Builder will use the names of the files on disk to determine the DRS ownerSuppliedName values. For example, an image file called image998.tif will be assigned an ownerSuppliedName of image998.
Within a DRS owner code, the owner supplied name must be unique, with one exception: more than one digital object within an owner code can have the same owner supplied name if the role/purpose/quality values for the objects are different. For example, an archival TIFF and deliverable JPEG with the same base file name will be assigned the same owner supplied name by Batch Builder.
Note: Batch Builder can be configured to use an external ownerSuppliedName mapping file when the default method of assigning ownerSuppliedName based on file name on disk is not desired. See Using external mapping files for more information.
1.6 About directory names and structure
All files in a batch must be located within a set of batch directories on the local file system and this set of directories must be viewable and editable by Batch Builder. This topic describes Batch Builder requirements for batch directory structure and naming.
Batch directory structure
All files within a batch must be located in a batch directory. The top level of this directory should contain only the batch control file (batch.xml). For page-turned object batches, the top-level directory should also contain the PDS METS file.
Besides the batch.xml and METS file, no other file should be stored in the top-level batch directory. Batch Builder will ignore any other files at this level.
All other files in the batch must be located in an appropriate subdirectory under the top level directory. Batch subdirectories are named to indicate the role or type of file they contain (e.g., deliverable, archival_master, etc.). Batch subdirectory structure allows Batch Builder to:
- Infer the type of file contained in the subdirectory. For example, a subdirectory called deliverable is understood to contain files meant for delivery to users.
- Interpret the relationships between files in the batch. For example, a subdirectory called deliverable nested under a subdirectory called archival_master indicates to Batch Builder that the deliverable files were derived from the archival master files. If these subdirectories are parallel and not nested, no derivation is assumed. See Expressing Relationships for more information.
- Assign user-supplied metadata. In Batch Builder, a directory of files is the smallest unit at which the depositor can assign metadata.
Depending on the batch, some subdirectories will not be needed. For example, a page-turned batch without any OCR text can omit the ocr subdirectories. Batch Builder will ignore empty subdirectories.
A batch can contain more than one of the same type of batch subdirectory. For example, a batch can contain subdirectories named deliverable_1, deliverable_2 and deliverable_3. This can be useful for grouping together files with the same metadata requirements (e.g., the same quality or purpose value).
Batch directory names
For the top-level batch directory and all of its subdirectories:
- Maximum directory name length is 100 characters.
- Valid directory name characters are letters, numbers, underscores ('_') and hyphens ('-').
Batch Builder has no requirements for the name of the top-level batch directory but has very specific requirements for naming of its subdirectories.
Within the top-level batch directory, all digital object files must be contained within subdirectories that are named to indicate their role or type. Batch subdirectory names must begin with a pre-defined prefix. The following table lists a few sample directory name prefixes and how files in these subdirectories will be interpreted by Batch Builder.
Consult the requirements of general image batches, page-turned object batches or container batches for detailed subdirectory naming requirements.
Classifying an image as an archival master, production master or deliverable is a local decision. These classifications currently have no bearing upon preservation services in the DRS. General practices to date have been to use the term archival master to designate the highest quality or least processed versions of images; to use production master to designate the images that have been optimized to generate deliverables, particularly through batch automation; and to use deliverable to designate the images optimized for rendering on given delivery systems (e.g., web browsers).
Note: Only the first portion (the prefix) of batch subdirectory names is prescribed. Batch Builder will allow the depositor to append additional information to subdirectory names as long as the entire name uses valid characters and does not exceed 100 characters in length. For example, these subdirectory names are valid:
- deliverable_1
- deliverable_screen
- deliverable_thumb
1.7 Batch preparation checklist
Depositors should consult this checklist as they prepare to use Batch Builder.
- Prepare new depositors. If you are new to the DRS batch deposit process, you may need to request one or more DRS drop box accounts and perform some test deposits. See How to become a DRS Deposit Agent on the OIS web site.
- Determine project administrative metadata. You will need to know the DRS owner code, DRS billing code, success and failure email addresses, URN authority path, and URN resource name pattern. These are part of administrative properties in Batch Builder. For more information, see How to become a DRS Deposit Agent on the OIS web site.
- Understand DRS file naming options. Consult About file names for information on how to name digital object files on disk, what DRS does with file names of deposited objects, and the purpose of owner supplied name.
- Review batch workflows. For a summary of batch creation using Batch Builder, see Batch creation overview. See also detailed procedures for general image batches, page-turned object batches, and container batches. Each of these batch genres has specific file name and batch directory requirements.
- Check Batch Builder options. There are a few options that impact every project you create in Batch Builder. See Setting options for more information.
1.8 Supporting documentation
This User Guide assumes that Batch Builder users are generally familiar with DRS, NRS and PDS operations. For more information, see the online documentation on the OIS web site:
DRS information: http://hul.harvard.edu/ois/systems/drs/
NRS information: http://hul.harvard.edu/ois/systems/nrs/
PDS information: http://hul.harvard.edu/ois/systems/pds/
2. Creating a General Image Batch
2.1 Accepted formats
2.2 File name rules
2.3 Batch directory rules
2.4 Procedure to create an image batch
A general image batch consists of:
- zero or more archival master images
- zero or more production master images (also called archival production images)
- zero or more delivery images
- zero or more external ICC profiles
- zero or more target files
- zero or more text files associated with target images
All files in the batch must share a single DRS owner code and a single DRS billing code.
Only the delivery images (files in a deliverable subdirectory) will be assigned a persistent identifier (also known as a URN or "Uniform Resource Name"). NRS URNs can be added or modified after DRS deposit using the NRS Maintenance System.
2.1 Accepted formats
For digital objects in general image batches, the following DRS-supported file formats can be used:
- GIF image files (file extension: gif)
- JPEG image files (file extension: jpg)
- JP2 image files (file extension: jp2)
- JPX image files (file extension: jpx)
- TIFF image files (file extension: tif or tiff)
- PhotoCD image files (file extension: pcd)
Note: although Batch Builder will accept PhotoCD files, the JHOVE module cannot automatically extract technical metadata for this format.
- Plain text files (file extension: txt or tdf)
Only US-ASCII and UTF-8 encodings are supported.
- ICC files (file extension: icm or icc)
The file extensions noted above are mandatory.
Any other file formats not listed above that are included in the batch will be ignored by Batch Builder. These ignored files will not be described anywhere in the generated batch.xml file, and they will not be deleted from the batch directory.
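To preview which files Batch Builder would pick up and which it would ignore, the extension list above can be turned into a simple filter. This Python sketch is illustrative only; the function name `is_accepted` is hypothetical, and the comparison is made exactly as the extensions are listed because the guide does not say how upper-case extensions are treated.

```python
from pathlib import PurePath

# Extensions listed for general image batches.
GENERAL_IMAGE_EXTENSIONS = {
    "gif", "jpg", "jp2", "jpx", "tif", "tiff",
    "pcd", "txt", "tdf", "icm", "icc",
}


def is_accepted(file_name: str) -> bool:
    """Return True if the file extension is one of the listed formats."""
    suffix = PurePath(file_name).suffix  # includes the leading dot, or "" if none
    return suffix[1:] in GENERAL_IMAGE_EXTENSIONS
```

Files for which this returns False correspond to the ones that would be left out of batch.xml but kept in the batch directory.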
2.2 File name rules
This topic describes specific file name requirements for general image batches. See About file names for general Batch Builder requirements.
The file-naming scheme for files in general image batches is:
{ownerSuppliedName}_-_{shortDescription}.{extension}
where:
{ownerSuppliedName} is the item control name that associates the digital file with its analog counterpart. This may be an OLIVIA ID, accession number or other curatorially-significant name. In the case of target files or ICC profiles this could be a locally-meaningful name, e.g. Adobe_RGB_1998, PrimeScan_PrimescanFujiNegColor or it8_20000714. Batch Builder will use this value for the batch.xml's ownerSuppliedName value (you can also define ownerSuppliedName using a mapping file). Valid characters to use for the {ownerSuppliedName} are letters, digits, '.', underscores ('_'), and hyphens ('-').
The _-_ character sequence (one underscore, one hyphen, one underscore) is used to separate the {ownerSuppliedName} from the {shortDescription}, if the {shortDescription} is present.
{shortDescription} is an optional locally-meaningful description of the file. This portion of the file name will not be used for any Batch Builder metadata generation. It is for depositors who want to embed information in the file names that is useful for their workflow, such as thumb, prodarc or large. Valid characters to use for the {shortDescription} are letters, digits, underscores ('_'), and hyphens ('-'). Do not use the character sequence _-_ as part of the {shortDescription}.
.{extension} is one of the valid file extensions listed under Accepted formats.
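The naming scheme above can be split into its parts with a few lines of Python. This is a sketch for checking file names in a local workflow, not Batch Builder's parser; the names `ParsedName` and `parse_image_file_name` are hypothetical, and splitting on the first occurrence of the separator is an assumption.

```python
from typing import NamedTuple, Optional

SEPARATOR = "_-_"


class ParsedName(NamedTuple):
    owner_supplied_name: str
    short_description: Optional[str]
    extension: str


def parse_image_file_name(file_name: str) -> ParsedName:
    """Split a file name into ownerSuppliedName, shortDescription and extension."""
    # The ownerSuppliedName may contain '.', so the extension is whatever
    # follows the last dot.
    stem, dot, extension = file_name.rpartition(".")
    if not dot:
        raise ValueError(f"{file_name!r} has no extension")
    # Assumption: the first "_-_" marks the start of the short description.
    owner, separator, description = stem.partition(SEPARATOR)
    return ParsedName(owner, description if separator else None, extension)
```

With this sketch, image998_-_thumb.jpg and image998.tif both yield the owner supplied name image998, which is how related files end up sharing a name.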
2.3 Batch directory rules
This topic describes specific requirements for subdirectory names and structure in general image batches. See About directory names and structure for general Batch Builder requirements.
Within the top-level batch directory, all digital object files must be contained within subdirectories that are named to indicate their role or type. Batch Builder requires that subdirectory names begin with a pre-defined prefix. The following table lists valid prefixes for general image batches and how files in these subdirectories will be interpreted by Batch Builder.
As long as the first part of a batch subdirectory's name follows the rules above, you can append additional information. Batch Builder would consider the following subdirectory names to be valid:
- archival_master-1
- archival_master_20061205
- archival_masterPCCR
Any extensions to a subdirectory's name must contain only letters, numbers, underscores or hyphens and must not exceed 100 characters in length.
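The subdirectory naming rules can be checked with a short Python sketch like the one below. The function name `is_valid_subdirectory_name` is hypothetical. The prefix list is an assumption: it contains only the directory names mentioned in this section and is not the guide's full prefix table.

```python
import re

# Assumption: prefixes taken from the directory names mentioned in this section.
KNOWN_PREFIXES = ("archival_master", "production_master", "deliverable", "icc", "target")
# Letters, numbers, underscores and hyphens only (assumption: ASCII letters).
VALID_DIRECTORY_NAME = re.compile(r"[A-Za-z0-9_-]+")
MAX_NAME_LENGTH = 100


def is_valid_subdirectory_name(name: str) -> bool:
    """Check the prefix, character and length rules for a batch subdirectory."""
    return (
        len(name) <= MAX_NAME_LENGTH
        and VALID_DIRECTORY_NAME.fullmatch(name) is not None
        and name.startswith(KNOWN_PREFIXES)
    )
```

All three sample names above pass this check, while a name that contains a dot or starts with an unlisted prefix does not.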
For the archival_master, production_master and deliverable directories, when a "parent" subdirectory A contains a "child" subdirectory B, Batch Builder will infer that the digital object files in B are derived from the digital object files in A.
When an icc or target directory is contained within one of the image subdirectories (archival_master, production_master, deliverable), Batch Builder will infer that the ICC and/or target files should be associated with all files in the image subdirectory. If the batch has text files that need to be associated with the target image files, put them in the target directory along with the target files.
Below is a sample batch directory for a general image batch that includes a deliverable file derived from a production master, which in turn was derived from an archival master. The archival master has a related target file with a related text file as well as a related ICC file. The production master also has a related ICC file.
![]()
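The nesting rule described above can be illustrated with a small Python sketch that reads a list of subdirectory paths and reports which image directories would be treated as derived from their parents. The function names are hypothetical, the logic is a simplified reading of the rule rather than Batch Builder's implementation, and the sample layout in the usage note is reconstructed from the prose description of the sample batch, not from the figure.

```python
from pathlib import PurePosixPath

IMAGE_PREFIXES = ("archival_master", "production_master", "deliverable")


def is_image_dir(name: str) -> bool:
    """Return True if the directory name starts with an image prefix."""
    return name.startswith(IMAGE_PREFIXES)


def derivations(subdirectories: list[str]) -> list[tuple[str, str]]:
    """Return (derived, source) pairs for image directories nested
    directly inside another image directory."""
    pairs = []
    for entry in subdirectories:
        path = PurePosixPath(entry)
        parent = path.parent
        if is_image_dir(path.name) and is_image_dir(parent.name):
            pairs.append((str(path), str(parent)))
    return pairs
```

Given the paths archival_master, archival_master/icc, archival_master/target, archival_master/production_master, archival_master/production_master/icc and archival_master/production_master/deliverable, the sketch reports two derivations: the production master from the archival master, and the deliverable from the production master. The icc and target directories are not image directories, so they produce no derivation.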
2.4 Procedure to create an image batch
Follow these steps to create a general image batch. The steps in Part I (Create the project and batch template) need only be done once. (A project is a template from which you can prepare multiple batches for DRS deposit.) If you have already created the project and batch template, proceed to Part II (Create a batch).
Note: to save project values at any point, from the main menu select Project > Save.
Part I: Create the project and batch template
- From the Batch Builder main menu, select Project > New.
- Complete the New Project form and click OK.
- Project name: this name displays in the Batch Builder title bar when the project is open. This value can be changed later by accessing the Project Properties node in the Configuration panel.
- Project directory: path and directory name in which project files are stored. To locate or create the directory, click the ellipses (...) button to browse your file system. This value cannot be changed once a project is created.
Best practice note: new users should consider including the word "project" in their project directory names to make it easier to differentiate these from the underlying batch directories.
- Batch genre: select "generalimage" as the genre. This value cannot be changed once the project is created.
- Project description: an internal free text note field associated with the project. This value can be changed later by accessing the Project Properties node in the Configuration panel.
- Add administrative metadata. In the Configuration panel on left, select Administrative Properties. This panel records project-level administrative metadata. This metadata will be used in every batch.xml file generated by this project.
Mouse over any of the field labels to display a definition. Or, consult Administrative properties in the Metadata Reference section for more information.
An asterisk indicates required fields. Batch Builder will provide warnings in the message pane if required fields are missing, but does not validate the contents of these fields. For example, if you supply an invalid Owner Code, this value will be added to the batch.xml file and at deposit, the entire batch will be rejected by DRS.
- [Optional] Add global metadata that will be applied to every digital object in batches created by this project. In the Configuration panel, select Global Properties under the Batch Template node.
Consult Global properties in the Metadata Reference section for more information.
If you plan to accept the JHOVE-supplied metadata and do not need to include optional global metadata, skip this step.
- Add batch directories to the template. In the Configuration panel, right click on Batch Template and select Add directory. In the pop-up window, choose a directory name from the list. Directory name choices on the list are determined by the batch genre.
[Optional] You can modify a directory name by appending custom text after the predefined name. Type the custom text in the box to the right of the directory list.
Click OK to add the directory to the template. Repeat this step to add all directories needed for this project.
If you plan on defining file relationships in batches created by this project, the directories that you add must be nested. For example, to nest a deliverable directory under an archival_master directory, create the archival_master directory first, right click on it, select Add directory and select deliverable. See Expressing Relationships for more information.
- [Optional] Add directory-specific metadata that will be applied to every digital object in a specific batch directory. In the Configuration panel, select a directory to display its metadata properties.
Consult Directory-level properties in the Metadata Reference section for more information on this metadata.
If you plan to accept the administrative metadata inferred by Batch Builder and the technical metadata supplied by JHOVE, skip this step.
Part II: Create a batch
- Create the batch directory and subdirectories. From the main menu, select Batch > New. In the New Batch window, supply a top-level directory name.
Batch Builder has no specific requirements for the name of the top-level batch directory, but the name must be no longer than 100 characters and must consist only of letters, numbers, underscores ('_') and hyphens ('-').
By default, the "Create directories from batch template" checkbox will be selected (indicating that Batch Builder will create subdirectories based on directory names defined in the batch template). De-select this option if you want to create the directories outside of Batch Builder.
- In the Project Directory pane, an entry for your new batch will appear, flagged with a red b.
The red b indicates that a batch.xml file has not yet been generated for this batch directory.
- Move digital object files into the batch directories. This step must be performed outside of Batch Builder.
In Batch Builder, you will be able to see these files in the Project Directory panel if you refresh the display. From the main menu, select View > Refresh file system panel.
- Generate the batch xml file. Right click on the batch and select "Create batch.xml file" (or from the main menu, select Batch > Create batch.xml file).
As Batch Builder processes the batch, status messages will display in the message window. If batch generation is successful, the final message will be:
INFO - Creation of batch.xml complete for batch image-batch-1
In the Project Directory pane, the "b" icon next to the batch directory changes to a green b with an orange x to indicate the batch xml file has been generated.
If any required metadata is missing, or if any digital object files are invalid, Batch Builder will display an error message and the batch.xml file will not be generated. You will need to fix the errors and re-generate the batch.
- [Optional] Review the summary report for this batch.xml generation process. You can view this report by selecting the batch in the Project Directory panel and clicking the Reports node.
Note that on your file system, this report is saved in the Sync directory for the project and not in the batch directory:
[project name]\sync\[batch name]\reports\
The summary report is not deposited to the DRS, so it must not be placed within a batch directory.
Once the batch.xml file has been successfully generated, you are ready to upload the batch to DRS. See Uploading batches to DRS for more information.
3. Creating a Page-Turned Object Batch
3.1 Accepted formats
3.2 File name rules
3.3 Batch directory rules
3.4 About the PDS METS file
3.5 Procedure to create a page-turned batch
A page-turned object batch consists of at most:
- A single PDS METS file (a structural metadata file that identifies all the components of a document, describes its structure, and allows for page-turning navigation).
- One or more page image files.
- Zero or more text-file versions of the page images.
In some cases, there are multiple images per page, one of which is an archival image, and one or more delivery images derived from the archival image in other formats. In other cases, there will be a single image file per page. For example, the Page Delivery Service will create delivery page images from JPEG2000 or TIFF masters. For detailed PDS file requirements, consult the PDS section of the OIS web site.
Batch Builder presumes that all files in a batch make up a single page-turned object (depositing multiple page-turned objects within a single batch is not supported). A group of related page-turned objects must be deposited individually to DRS and can then be merged together later using the PDS Maintenance System. See the PDS Maintenance User Guide for more information.
All the files in a page-turned batch need to share a single DRS owner code and a single DRS billing code.
During the batch generation process, Batch Builder will request a persistent ID (URN) based on the selected batch genre. If the "Page-turned Object" genre is selected, Batch Builder will request a URN for the METS file only. This URN will resolve to the citation (top) level of the document. If the "Page-turned Object - IDS URN for each image" genre is selected, a URN will be requested for every deliverable image in the batch.
3.1 Accepted formats
For page-turned objects, the following DRS-supported file formats can be used:
- GIF image files (file extension: gif)
- JPEG image files (file extension: jpg)
- JP2 image files (file extension: jp2)
- JPX image files (file extension: jpx)
- TIFF image files (file extension: tif or tiff)
- Plain text files (file extension: txt)
Only US-ASCII and UTF-8 encodings are supported.
- XML files (file extension: xml)
Only accepted in the case of the PDS METS file. There can be at most one XML file in the batch and this file must be located in the top-level batch directory.
The file extensions noted above are mandatory.
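As a pre-deposit check, the format rules above can be approximated in a short script. The following Python sketch is illustrative only and is not part of Batch Builder. The function names are hypothetical, and it assumes that extensions must match exactly as listed (lower case). Because US-ASCII is a subset of UTF-8, a single UTF-8 decode covers both supported text encodings.

```python
from pathlib import Path

# Extensions listed in this topic; exact lower-case matching is an assumption.
ACCEPTED_EXTENSIONS = {".gif", ".jpg", ".jp2", ".jpx", ".tif", ".tiff", ".txt", ".xml"}


def has_accepted_extension(file_name: str) -> bool:
    """Return True if the file name ends in one of the listed extensions."""
    return Path(file_name).suffix in ACCEPTED_EXTENSIONS


def is_supported_text(data: bytes) -> bool:
    """Return True if the bytes decode as UTF-8 (which includes US-ASCII)."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

A check like this only flags obvious problems before batch generation; Batch Builder's own validation remains the authority.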
3.2 File name rules
This topic describes specific file name requirements for page-turned object batches. See About file names for general Batch Builder requirements.
METS file naming
If Batch Builder is configured to automatically generate the METS file, the file naming scheme will be:
{project name}_{batch directory name}_mets.xml
where:
{project name} is the name assigned to the project, as it appears in the Project Properties pane.
{batch directory name} is the name of the top level batch directory, as it appears in the Project Directory pane.
If the METS file is supplied by the depositor, the file name can be any locally meaningful name that follows Batch Builder name rules (only letters, numbers, underscores, hyphens and no more than 100 characters long). The file extension must be .xml.
Like all other files deposited to DRS, the METS file must have an owner supplied name that is unique within an owner's collection (within a DRS owner code). By default, Batch Builder will generate an owner supplied name for the METS file from its file name on disk. If for some reason the METS file name on disk will not yield a unique owner supplied name, you can use a mapping file to assign a unique owner supplied name to the METS file.
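The METS naming rules above can be sketched in Python as follows. This is an illustration under stated assumptions, not Batch Builder's actual code: the function names are hypothetical, "letters" is taken to mean ASCII letters, and the 100-character limit is assumed to apply to the name without its extension.

```python
import re

# Name rule quoted in this section: letters, numbers, underscores, hyphens.
_NAME_CHARS = re.compile(r"[A-Za-z0-9_-]+")
MAX_NAME_LENGTH = 100


def generated_mets_name(project_name: str, batch_directory_name: str) -> str:
    """Build the name used when the METS file is generated automatically."""
    return f"{project_name}_{batch_directory_name}_mets.xml"


def is_valid_supplied_mets_name(file_name: str) -> bool:
    """Check a depositor-supplied METS file name against the stated rules."""
    if not file_name.endswith(".xml"):
        return False
    stem = file_name[: -len(".xml")]
    # Assumption: the length limit applies to the name without the extension.
    return len(stem) <= MAX_NAME_LENGTH and _NAME_CHARS.fullmatch(stem) is not None
```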
Page image and page text file naming
The file-naming scheme for page image and text files in page-turned object batches is:
{ownerSuppliedName}{separator}{sequenceNumber}.{extension}
or
{ownerSuppliedName}{sequenceNumber}.{extension}
where:
{ownerSuppliedName} is the item control name that associates the digital file with its analog counterpart. This may be an accession number or other curatorially-significant name. Batch Builder will use this value for the batch.xml's ownerSuppliedName value. Valid characters to use for the {ownerSuppliedName} are letters, digits, '.', underscores ('_'), and hyphens ('-').
{separator} is a dash (-) or underscore (_) used to separate the {ownerSuppliedName} from the page sequence number. The separator is optional if the last character in the {ownerSuppliedName} is a letter and not a number.
{sequenceNumber} is the numeric value that represents the sequence number of the page within the page-turned document. A sequenceNumber can be composed of any of the following characters: 0123456789.
The sequence number can include leading zeros; for example, the third page can be written as 3, 03 or 000000003. The page sequence number indicates a page's relative position within a sequence of pages, regardless of the numbering that may appear on the page. If page sequence numbers are supplied in a separate page naming file, the {separator} and the {sequenceNumber} can be omitted.
.{extension} is one of the valid file extensions listed in 3.1 Accepted formats.
Batch Builder will be able to identify page sequence "1" in these file names:
page1.tif
page-1.jpg
page_1.txt
page_-_1.tif
9876page1.jp2
1980a1.txt
2005-1.jp2
Batch Builder will not be able to identify page sequence "1" in these file names:
2001e551.tif
87051.jpg
Batch Builder will use the file name on disk (minus extension) for the page object's ownerSuppliedName value in the batch.xml (you can also define ownerSuppliedName using a mapping file).
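One way to reason about the examples above is to treat the longest run of digits at the end of the file name (before the extension) as the sequence number. The Python sketch below follows that reading. It is an assumption that matches the listed examples, not a description of Batch Builder's internal logic, and the function name is hypothetical.

```python
import re
from pathlib import PurePath
from typing import Optional

# Longest run of digits at the very end of the name (extension removed).
_TRAILING_DIGITS = re.compile(r"[0-9]+\Z")


def page_sequence(file_name: str) -> Optional[int]:
    """Return the page sequence number implied by a file name, or None."""
    stem = PurePath(file_name).stem
    match = _TRAILING_DIGITS.search(stem)
    return int(match.group()) if match else None
```

Under this reading, 2005-1.jp2 yields sequence 1 because the separator ends the digit run, while 2001e551.tif yields 551 and 87051.jpg yields 87051, which is why a separator or a trailing letter is needed when the owner supplied name ends in a number.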
3.3 Batch directory rules
This topic describes specific requirements for subdirectory names and structure in page-turned object batches. See About directory names and structure for general Batch Builder requirements.
Within the top-level batch directory, all digital object files must be contained within subdirectories that are named to indicate their role or type. Batch subdirectory names must begin with a pre-defined prefix. The following table lists valid prefixes for page-turned object batches and how files in these subdirectories will be interpreted by Batch Builder.
Only the first portion (the prefix) of batch subdirectory names is prescribed. Batch Builder will allow the depositor to append additional information to subdirectory names as long as the entire name uses valid characters and does not exceed 100 characters in length. Batch Builder would consider the following subdirectory names to be valid:
archival_master-1
archival_master_20061205
archival_masterPCCR
When an ocr_uncorrected, ocr_corrected or keyed_text subdirectory is present in the batch directory, Batch Builder will infer that it contains text files that should be associated with page images.
Below is a sample batch directory for a page-turned batch that includes a PDS METS file, two archival master files, two deliverable files, and two uncorrected OCR files.
![]()
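The subdirectory naming rules in this topic can be checked with a short Python sketch like the one below. It is illustrative only: the function name is hypothetical, and the prefix list contains only the prefixes mentioned in this section, which may not be the complete set that Batch Builder accepts.

```python
import re

# Assumption: only the prefixes named in this section; the full list may differ.
PAGE_TURNED_PREFIXES = (
    "archival_master",
    "deliverable",
    "ocr_uncorrected",
    "ocr_corrected",
    "keyed_text",
)

# Valid characters (letters, numbers, underscores, hyphens), at most 100 long.
_VALID_NAME = re.compile(r"[A-Za-z0-9_-]{1,100}")


def is_valid_subdirectory(name: str, prefixes=PAGE_TURNED_PREFIXES) -> bool:
    """Return True if the name uses valid characters and a known prefix."""
    return _VALID_NAME.fullmatch(name) is not None and name.startswith(tuple(prefixes))
```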
3.4 About the PDS METS file
The PDS METS file is a structural metadata file that identifies all the components of a page-turned document, describes its structure, and allows for page-turning navigation. A page-turned object batch must contain one METS file (Batch Builder presumes that all files in a batch make up a single page-turned object). The METS file must be located in the top-level batch directory.
Creation of the PDS METS file is optional in Batch Builder. By default, Batch Builder will automatically generate a minimal METS file.
The depositor can turn off the automatic METS generation option within Batch Builder and supply an externally-created METS file. To turn off automatic generation, from the Batch Builder main menu select View > Options and de-select the option "Enable METS file creation". See the METS file naming section for the rules on file names.
The minimal METS file generated by Batch Builder will describe a simple document (with a citation node and page nodes). It will provide page sequence numbers and the option to include a HOLLIS system number for the document. PDS will use the HOLLIS system number to extract some basic citation metadata (e.g., author, title) from the HOLLIS Catalog.
The minimal METS file generated by Batch Builder will NOT include:
- Physical page numbers or page labels.
- Intermediate (section) nodes or labels.
- Custom citation-level metadata settings for PDF header text, related links or show/hide settings for "Go To" and "View Text" navigation options.
These values can be added after DRS deposit using the PDS Maintenance System.
3.5 Procedure to create a page-turned batch
Follow these steps to create a page-turned object batch. The steps in Part I (Create the project and batch template) need only be done once. (A project is a template from which you can prepare multiple batches for DRS deposit.) If you have already created the project and batch template, proceed to Part II (Create a batch).
Note: to save project values at any point, from the main menu select Project > Save.
Part I: Create the project and batch template
- Decide on PDS METS file generation. Options are to allow Batch Builder to generate the PDS METS file (the default) or to create the METS file externally and copy it into the top-level batch directory before generating the batch.xml file.
To deactivate auto-generation of the METS file, from the main menu select View > Options and de-select the option "Enable METS file creation". This option needs to be set before you generate the batch.xml.
- From the Batch Builder main menu, select Project > New.
- Complete the New Project form and click OK.
![]()
- Project name: this name displays in the Batch Builder title bar when the project is open. This value can be changed later by accessing the Project properties node in the Configuration panel.
- Project directory: path and directory name in which project files are stored. To locate or create the directory, click the ellipses (...) button to browse your file system. This value cannot be changed once a project is created.
Best practice note: new users should consider including the word "project" in their project directory names to make it easier to differentiate these from the underlying batch directories.
- Batch genre: select "Page-turned Object" if you want Batch Builder to request a citation (top) level URN only. Select "Page-turned Object - IDS URN for each image" if you want Batch Builder to request a URN for each deliverable image in the batch. This value cannot be changed once the project is created.
- Project description: an internal free text note field associated with the project. This value can be changed later by accessing the Project properties node in the Configuration panel.
- Add administrative metadata. In the Configuration panel on left, select Administrative Properties. This panel records project-level administrative metadata. This metadata will be used in every batch.xml file generated by this project.
![]()
Mouse over any of the field labels to display a definition. Or, consult Administrative properties in the Metadata Reference section for more information.
An asterisk indicates required fields. Batch Builder will provide warnings in the message pane if required fields are missing, but does not validate the contents of these fields. For example, if you supply an invalid Owner Code, this value will be added to the batch.xml file and at deposit, the entire batch will be rejected by DRS.
- [Optional] Add global metadata that will be applied to every digital object in batches created by this project. In the Configuration panel, select Global Properties under the Batch Template node. Add metadata values as needed.
![]()
Consult Global properties in the Metadata Reference section for more information.
If you plan to accept the JHOVE-supplied metadata and do not need to include optional global metadata, skip this step.
- Add batch directories to the template. In the Configuration panel, right click on Batch Template and select "Add Directory". In the pop-up window, choose a directory name from the list. Directory name choices on the list are determined by the batch genre (general image or page-turned objects).
![]()
[Optional] You can modify a directory name by appending custom text after the predefined name. Type the custom text in the box to the right of the directory list.
Click OK to add the directory to the template. Repeat this step to add all directories needed for this project.
If you plan on defining file relationships in batches created by this project, the directories that you add must be nested. For example, to nest a deliverable directory under an archival_master directory, create the archival_master directory first, right click on it, select Add Directory and select "deliverable". See Expressing Relationships for more information.
- [Optional] Add directory-specific metadata that will be applied to every digital object in a specific batch directory. In the Configuration panel, select a directory to display its metadata properties.
![]()
Consult Directory-level properties in the Metadata Reference section for more information on this metadata.
If you plan to accept the administrative metadata inferred by Batch Builder and the technical metadata supplied by JHOVE, skip this step.
Part II: Create a batch
- Create the batch directory and subdirectories. From the main menu, select Batch > New. In the New Batch window, supply a top-level directory name.
![]()
Batch Builder has no specific requirements for the name of the top-level batch directory, but the name must be no longer than 100 characters and must consist only of letters, numbers, underscores ('_') and hyphens ('-').
By default, the "Create directories from batch template" checkbox will be selected (indicating that Batch Builder will create subdirectories based on directory names defined in the batch template). De-select this option if you want to create the directories outside of Batch Builder.
- In the "Project Directory" pane, an entry for your new batch will appear, flagged with a red b.
![]()
The red b indicates that a batch.xml file has not yet been generated for this batch directory.
- Move digital object files into the batch directories. This step must be performed outside of Batch Builder.
In Batch Builder, you will be able to see these files in the "Project Directory" panel if you refresh the display. From the main menu, select View > Refresh file system panel.
- Generate the batch.xml file. Right click on the batch and select "Create batch.xml file" (or from the main menu, select Batch > Create batch.xml file).
You will be prompted to supply a HOLLIS ID (the HOLLIS system number of the cataloging record that describes the page turned object). PDS will use the HOLLIS system number to extract some descriptive metadata (e.g., author, title) from the HOLLIS Catalog.
![]()
Enter the number and click OK. Or press Cancel to skip this step. If you opt not to add a HOLLIS system number at deposit, you can add it after deposit using the PDS Maintenance System.
Batch Builder will start processing the batch. Status messages will display in the message window as processing proceeds. If batch generation is successful, the final message will be:
INFO - Creation of batch.xml complete for batch pagebatch-20070130a
In the "Project Directory" pane, the "b" icon next to the batch directory changes to a green b with an orange x to indicate that the batch.xml file has been generated.
![]()
If any required metadata is missing, or if any digital object files are invalid, Batch Builder will display an error message and the batch.xml file will not be generated. You will need to fix the errors and re-generate the batch.
- [Optional] Review the summary report for this batch.xml generation process. You can view this report by selecting the batch in the Project Directory panel and clicking the Reports node.
Note that on your file system, this report is saved in the Sync directory for the project and not in the batch directory:
[project name]\sync\[batch name]\reports\
The summary report is not deposited to DRS, so it must not be placed within a batch directory.
Once the batch.xml and mets.xml have been successfully generated, you are ready to upload the batch to DRS. See Uploading batches to DRS for more information.
4. Creating a Container Batch
4.1 About ZIP file internal structure
4.2 ZIP file accepted formats
4.3 ZIP file naming rules
4.4 ZIP container batch directory rules
4.5 Procedure to create a container batch
4.6 Brightening container objects
A page-turned object container batch can consist of:
- A single ZIP container file
- Multiple ZIP container files
Within a container batch ZIP file is the directory structure for a page-turned batch that is ready for DRS deposit.
Currently, the purpose of creating a container batch is to deposit a page-turned object as a "dark" object (an object not accessible to users). Such dark objects can later be retrieved from DRS and "brightened" (made accessible) when needed. See the section Brightening container objects for more information.
The ZIP files in a container batch are not assigned URNs and are deposited to DRS with an <access> value of "N" (no access).
Important note: Contact OIS (drs-support@hulmail.harvard.edu) before starting a project involving container batches. OIS must define a profile for the organization and content of the container before deposits can begin.
4.1 About ZIP file internal structure
A ZIP container file holds the directory structure and files for a page-turned object batch. The format and structure of the files within a ZIP container are not prescribed. However, to make the brightening process of a container batch as easy as possible, OIS recommends that contents of the container file follow the guidelines for page-turned objects as described in Creating a Page-Turned Object Batch. This means using batch directory structure and file naming practices similar to the following example:
/deliverable/page_1.jp2
/deliverable/page_2.jp2
/deliverable/ . . .
/ocr_uncorrected/page_1.txt
/ocr_uncorrected/page_2.txt
/ocr_uncorrected/ . . .
/mets.xml
Using Batch Builder to generate the page-turned object batch (before zipping it) is recommended, but not required. If you do use Batch Builder for this purpose, the resulting batch.xml and mets.xml files can be included in the ZIP container (or discarded). During the brightening process, these xml files will be regenerated.
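Any ZIP tool can be used to package the batch directory. As one illustration, the Python sketch below uses the standard zipfile module to archive a page-turned batch directory so that paths inside the ZIP are relative to the batch directory, as in the example layout above. The function name and arguments are hypothetical, and the output file should be written outside the directory being archived.

```python
import zipfile
from pathlib import Path


def zip_batch_directory(batch_dir, zip_path):
    """Write every file under batch_dir into zip_path, keeping relative paths."""
    batch_dir = Path(batch_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(batch_dir.rglob("*")):
            if path.is_file():
                # Store e.g. "deliverable/page_1.jp2" rather than an absolute path.
                archive.write(path, arcname=path.relative_to(batch_dir).as_posix())
```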
4.2 ZIP file accepted formats
For page-turned object container batches, the only accepted format at present is ZIP (a valid ZIP file with a .zip extension).
4.3 ZIP file naming rules
This topic describes specific file name requirements for ZIP container files. See About file names for general Batch Builder requirements.
The file-naming scheme for zip files in container batches is:
{ownerSuppliedName}_-_{shortDescription}.{extension}
or
{ownerSuppliedName}.{extension}
where:
{ownerSuppliedName} is the item control name that associates the digital file with its analog counterpart. This may be an accession number or other curatorially-significant name. Batch Builder will use this value for the batch.xml's ownerSuppliedName value. Valid characters to use for the {ownerSuppliedName} are letters, digits, '.', underscores ('_'), and hyphens ('-').
The _-_ character sequence (one underscore, one hyphen, one underscore) separates the {ownerSuppliedName} from the {shortDescription}, if a {shortDescription} is present.
You can also define ownerSuppliedName using a mapping file. See 6.1 Owner supplied name mapping file in DRS Batch Builder User Guide for details on creating a mapping file.
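The naming scheme above can be illustrated with a small Python sketch that splits a ZIP file name into its parts. The function name is hypothetical, and splitting at the first occurrence of the _-_ sequence is an assumption; the guide does not say how Batch Builder handles names that contain the sequence more than once.

```python
from pathlib import PurePath
from typing import Optional, Tuple

SEPARATOR = "_-_"  # one underscore, one hyphen, one underscore


def split_zip_name(file_name: str) -> Tuple[str, Optional[str]]:
    """Return (ownerSuppliedName, shortDescription or None) for a ZIP file name."""
    path = PurePath(file_name)
    if path.suffix != ".zip":
        raise ValueError(f"not a .zip file name: {file_name}")
    # Assumption: the first "_-_" marks the end of the owner supplied name.
    owner, separator, description = path.stem.partition(SEPARATOR)
    return owner, (description if separator else None)
```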
4.4 ZIP container batch directory rules
These are the specific requirements for subdirectory names and structure in page-turned object container batches. See About directory names and structure for general Batch Builder requirements.
Within the top-level batch directory, the container zip file must be stored within a subdirectory named with the prefix "container". Here is an example of a Batch Builder project directory path for a single container object (zip file):
\projectDirectory\batchDirectory\container\book.zip
When a container subdirectory is present under the batch directory, Batch Builder will infer that it contains ZIP files and will assign the role "container" to those ZIP files.
A single "container" subdirectory can contain more than one ZIP file, and there can be more than one subdirectory with the prefix "container" in a single container batch.
Only the first portion (the prefix) of batch subdirectory names is prescribed. Batch Builder will allow the depositor to append additional information to subdirectory names as long as the entire name uses valid characters and does not exceed 100 characters in length. Batch Builder would consider the following subdirectory names to be valid:
container-1
container_20061205
containerPCCR
Below is a sample project directory structure for a page-turned object container batch. The batch directory ("batch1") contains two container subdirectories; the first ("container-series1") holds two ZIP files while the other ("container-series2") holds one ZIP file.
\projectDirectory\
    batch1\
        container-series1\
            book1.zip
            book2.zip
        container-series2\
            book3.zip
4.5 Procedure to create a container batch
Follow these steps to create a page-turned object container batch. The steps in Part I (Create the project and batch template) need only be done once per project (or once per project workflow if a project has several different workflows). A project is a template from which you can prepare multiple batches for DRS deposit. If you have already created the project and batch template, proceed to Part II (Create a batch).
Note: to save project values at any point, from the main menu select Project > Save.
Part I: Create the project and batch template
- Decide on validation of ZIP container contents. By default, Batch Builder will validate the contents of the ZIP container and report errors related to the object files and batch files.
To deactivate ZIP container validation, from the main menu select View > Options and de-select the option "Validate contents of zip container objects". This option needs to be set before you generate the batch.xml.
- From the Batch Builder main menu, select Project > New.
- Complete the New Project form and click OK.
![]()
- Project name: this name displays in the Batch Builder title bar when the project is open. This value can be changed later by accessing the Project properties node in the Configuration panel.
- Project directory: path and directory name in which project files are stored. To locate or create the directory, click the ellipses (...) button to browse your file system. This value cannot be changed once a project is created.
- Batch genre: select "Container". This value cannot be changed once the project is created.
- Project description: an internal free text note field associated with the project. This value can be changed later by accessing the Project properties node in the Configuration panel.
- Add administrative metadata. In the Configuration panel on left, select Administrative Properties. This panel records project-level administrative metadata. This metadata will be used in every batch.xml file generated by this project.
![]()
Mouse over any of the field labels to display a definition. Or, consult Administrative properties in the Metadata Reference section for more information.
An asterisk indicates required fields. Batch Builder will provide warnings in the message pane if required fields are missing, but will not validate the contents of these fields. For example, if you supply an invalid Owner Code, this value will be added to the batch.xml file and, at deposit, the entire batch will be rejected by DRS.
Note for container batches: Although a container batch will not be assigned a URN, Batch Builder requires that all batch genres contain values for URN authority path and URN resource name pattern in their administrative metadata.
- [Optional] Add global metadata that will be applied to every digital object in batches created by this project. In the Configuration panel, select Global Properties under the Batch Template node. Add metadata values as needed.
![]()
Consult Global properties in the Metadata Reference section for more information.
If you plan to accept the JHOVE-supplied metadata and do not need to include optional global metadata, skip this step.
- Add batch directories to the template. In the Configuration panel, right click on Batch Template and select "Add Directory". In the pop-up window, choose a directory name from the list. For container batches, the only choice will be "container".
![]()
[Optional] You can modify a directory name by appending custom text after the predefined name. Type the custom text in the box to the right of the directory list.
Click OK to add the directory to the template. Repeat this step if you need additional "container" directories for the project.
- Add directory-specific metadata that will be applied to every digital object in a specific batch directory. In the Configuration panel, select a directory to display its metadata properties.
![]()
For container directories, these metadata values are required:
- Role: container (set by default)
- Purpose: NA (set by default)
- Quality: NA (set by default)
- Access Flag: N (set by default)
- Profile: "Kress" or "Google" (set by user)
- Version: 1.0 (set by user)
Values for Profile and Version must be set by the operator for each individual "container" subdirectory defined in the project.
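Before generating a batch, it can help to confirm that every container directory carries the values listed above. The Python sketch below shows one way to express that check. The dictionary keys and function name are hypothetical (Batch Builder stores these values in its own project files), and treating the listed values as the only acceptable ones, including Version 1.0, is an assumption.

```python
# Expected values taken from the list above; key names are illustrative only.
REQUIRED_CONTAINER_METADATA = {
    "role": {"container"},
    "purpose": {"NA"},
    "quality": {"NA"},
    "access_flag": {"N"},
    "profile": {"Kress", "Google"},
    "version": {"1.0"},
}


def container_metadata_problems(metadata: dict) -> list:
    """Return a list of problems; an empty list means all values look as expected."""
    problems = []
    for field, allowed in REQUIRED_CONTAINER_METADATA.items():
        value = metadata.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif value not in allowed:
            problems.append(f"{field}: unexpected value {value!r}")
    return problems
```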
Part II: Create a batch
- Create the batch directory and subdirectories. From the main menu, select Batch > New. In the New Batch window, supply a top-level directory name.
![]()
Batch Builder has no specific requirements for the name of the top-level batch directory, but the name must be no longer than 100 characters and must consist only of letters, numbers, underscores ('_') and hyphens ('-').
By default, the "Create directories from batch template" checkbox will be selected (indicating that Batch Builder will create subdirectories based on directory names defined in the batch template). De-select this option if you want to create the directories outside of Batch Builder.
- In the "Project Directory" pane, an entry for your new batch will appear, flagged with a red b.
![]()
The red b indicates that a batch.xml file has not yet been generated for this batch directory.
- Move digital object files into the batch directories. This step must be performed outside of Batch Builder.
In Batch Builder, you will be able to see these files in the "Project Directory" panel if you refresh the display. From the main menu, select View > Refresh file system panel.
- Generate the batch.xml file. Right click on the batch and select "Create batch.xml file" (or from the main menu, select Batch > Create batch.xml file).
Batch Builder will start processing the batch. Status messages will display in the message window as processing proceeds. If batch generation is successful, the final message will be:
INFO - batch.xml written to: C:\BatchBuilder\ProjContainerTest\container-batch1\batch.xml
FINISHED - Creation of batch.xml complete for batch: container-batch1