Data Infrastructure

A primary goal of our CLIR Hidden Collections (and Hidden Connections) projects was to create a data infrastructure to manage and link archival collection records and authority records so that they can be maintained and updated on an ongoing basis. Based on extensive analyses of our needs and of system functionality in early 2015, the Library chose a suite of systems to manage records about archival collections and their creators. As a result, ArchivesSpace and xEAC will be installed this fall (2015). Plans to connect the two systems to a triplestore to publish linked data are forthcoming. More details can be found further down this page under “Data Management”.

This work in the Library has extended to address the need for a centralized repository for the Museum’s digital assets. AMNH hosted an NDSR resident, Vicky Steeves, who surveyed the digital assets held by the Museum’s science departments. Her results corroborated those of an earlier report recommending an institution-wide trusted digital repository. At this year’s Hydra Connect event, Barbara Mathé presented a landscape for a trusted digital repository for the Museum. Based on Nick Krabbenhoeft’s expanded OAIS functional diagram, the landscape builds on the data infrastructure that started with the Library’s holdings.

Below are PDFs of the Hydra Connect poster and handout.


In addition to creating finding aids for archival collections in the Science Departments, the AMNH Library is producing authority records for scientific expeditions and the people who traveled on them. Encoded Archival Context for Corporate Bodies, Personal Names, and Families (EAC-CPF) is a metadata standard that describes the creators of archival materials and leverages relationships to entities (as well as resources and functions) to build meaningful links shared among them.

Entity data is recorded in an Excel worksheet template. Nick Krabbenhoeft, who started on this project as an intern, wrote a set of macros to convert the data into XML directly from the spreadsheets. Our goal was to push the descriptive data into a system-independent format (XML) without the challenges of coding by hand. We are now strongly considering xEAC, an open-source web-based application for creating and managing EAC-CPF collections. In the team’s initial assessment of systems that support EAC, we found that xEAC has the most robust platform, and we are very likely to use it in the future. The following are zipped files containing macro-enabled Excel documents (.xlsm).

UPDATE (July 2015): The AMNH Library will install xEAC and go through a test period this fall.
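
For readers who want to see the general shape of the spreadsheet-to-XML conversion described above, here is a minimal sketch in Python. It is not the project's Excel macros: the function name `row_to_eac` and the column names (`record_id`, `entity_type`, `name`) are hypothetical, and the output is only a skeleton of an EAC-CPF record, not a schema-valid one (a complete record needs further elements under `control`, among others).

```python
import xml.etree.ElementTree as ET

# Namespace used by the EAC-CPF schema.
EAC_NS = "urn:isbn:1-931666-33-4"


def row_to_eac(row: dict) -> str:
    """Turn one spreadsheet row (as a dict) into a skeletal EAC-CPF record.

    The keys 'record_id', 'entity_type', and 'name' are hypothetical
    column names chosen for this illustration.
    """
    # Serialize EAC-CPF elements without a prefix (default namespace).
    ET.register_namespace("", EAC_NS)

    def q(tag: str) -> str:
        # Build a namespace-qualified tag name.
        return f"{{{EAC_NS}}}{tag}"

    root = ET.Element(q("eac-cpf"))

    # <control> carries the record identifier.
    control = ET.SubElement(root, q("control"))
    ET.SubElement(control, q("recordId")).text = row["record_id"]

    # <cpfDescription>/<identity> carries the entity type and name.
    description = ET.SubElement(root, q("cpfDescription"))
    identity = ET.SubElement(description, q("identity"))
    ET.SubElement(identity, q("entityType")).text = row["entity_type"]
    name_entry = ET.SubElement(identity, q("nameEntry"))
    ET.SubElement(name_entry, q("part")).text = row["name"]

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    example_row = {
        "record_id": "amnhp_1000042",
        "entity_type": "person",
        "name": "Roy Chapman Andrews",
    }
    print(row_to_eac(example_row))
```

Generating the XML from a row of cells, rather than typing it, is what keeps the descriptive data independent of any one system: the same spreadsheet can be re-exported whenever the target format changes.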

Unique Identifiers and Filenames
This project has produced 2,466+ personal name entity records and 703 expedition entity records to date. Some of these entities have been described in rich detail; the remaining records have minimal-level description. Assigning unique identifiers to this large group of records was no small task. The Library created an alphanumeric sequence that represents the museum, the type of entity, a numeric code for the entity category, and a six-digit numeric value assigned sequentially in our master spreadsheets. Roy Chapman Andrews’s unique ID is amnhp_1000042. This is also the filename of his EAC-CPF XML file. For more about our file-naming structure, see the document below.
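
The identifier structure described above can be sketched in a few lines of Python. The breakdown below is inferred from the single example amnhp_1000042 and is an assumption, not the Library's documented rule: "amnh" for the museum, one letter for the entity type ("p" in the example), an underscore, a one-digit category code, and a six-digit zero-padded sequence number. The function names are hypothetical.

```python
import re

# Assumed layout, inferred from the example "amnhp_1000042":
#   museum code + one-letter entity type + "_" + one-digit category + six-digit sequence
ID_PATTERN = re.compile(r"^([a-z]+)([a-z])_(\d)(\d{6})$")


def build_entity_id(museum: str, entity_type: str, category: int, sequence: int) -> str:
    """Assemble an identifier from its parts, zero-padding the sequence to six digits."""
    return f"{museum}{entity_type}_{category}{sequence:06d}"


def parse_entity_id(identifier: str) -> dict:
    """Split an identifier back into its assumed parts, or raise ValueError."""
    match = ID_PATTERN.match(identifier)
    if match is None:
        raise ValueError(f"not a recognized identifier: {identifier!r}")
    museum, entity_type, category, sequence = match.groups()
    return {
        "museum": museum,
        "entity_type": entity_type,
        "category": int(category),
        "sequence": int(sequence),
    }


if __name__ == "__main__":
    entity_id = build_entity_id("amnh", "p", 1, 42)
    print(entity_id)                    # identifier string
    print(entity_id + ".xml")           # matching EAC-CPF filename
    print(parse_entity_id(entity_id))   # parts recovered from the identifier
```

Because the identifier doubles as the filename, a check like `parse_entity_id` could be run over a folder of XML files to flag any name that does not follow the pattern.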

Data Management

Recognizing that data management is as important as data creation, the project team sought to define the management and technical needs for an Archives Content Management System. We realized that the best way to compare systems was to discuss what we need our ideal system to do. Together with the Library’s technical staff (Jennifer Cwiok and Susan Lynch), the project team organized ongoing meetings to discuss the requirements needed to ensure the future management of the Library’s finding aids and entity records. From these meetings, we developed a specific set of Functional Requirements (currently in draft form) for the system we envision.

Some databases being considered are KE EMu (which is used by most of our science departments to manage their specimen collection records), ArchivesSpace, and ICA-AtoM.

UPDATE (July 2015): The Library chose ArchivesSpace as its content management system. It will work alongside xEAC, linking to entity records.

In July 2014, the project team hired Bill Levay as the Metadata Intern. Working with Iris through the rest of the year, Bill tested and evaluated ArchivesSpace, AtoM, and xEAC in depth. We also looked at CollectiveAccess, but it was clear from the outset that it would not meet most of our functional requirements. Below are the results of our analysis.

Minimal-Level Cataloging

As the word “process” suggests, workflows and procedures may evolve and change over time. So in the spirit of process, here is an important update to our flowchart for creating minimal-level EAD records and getting them into Archivists’ Toolkit.

After some unsatisfactory results from importing MARC records into Archivists’ Toolkit, we decided to take a different approach by converting MARCXML files into EAD using MarcEdit. Importing EAD-encoded catalog records into AT produced much cleaner records that required less manual cleanup.

Download or view a PDF of Version 2 shown above.
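
To illustrate the kind of mapping involved in a MARCXML-to-EAD conversion, here is a minimal Python sketch. It is not MarcEdit and does not reproduce MarcEdit's conversion: it simply copies one field (MARC 245 subfield a, the title) into an EAD `unittitle`. The function name is hypothetical, and the choice of `level="collection"` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Namespace of the MARCXML (MARC 21 slim) schema.
MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def marcxml_title_to_ead(marcxml: str) -> str:
    """Copy the MARC 245$a title from a MARCXML record into a bare EAD skeleton."""
    record = ET.fromstring(marcxml)

    # Locate field 245, subfield a (title proper).
    title = record.find(
        ".//marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    if title is None or not title.text:
        raise ValueError("record has no 245$a title")

    # Build <ead><archdesc><did><unittitle>...</unittitle></did></archdesc></ead>.
    ead = ET.Element("ead")
    archdesc = ET.SubElement(ead, "archdesc", {"level": "collection"})
    did = ET.SubElement(archdesc, "did")
    ET.SubElement(did, "unittitle").text = title.text.strip()

    return ET.tostring(ead, encoding="unicode")
```

A full conversion maps many more fields (dates, extent, creator, notes); the point here is only that each MARC field has a designated home in the EAD hierarchy, which is why EAD-encoded records arrive in AT already structured.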

Below is the original flowchart with the steps described. Note that Stage 4B is no longer viable in our process.

Download or view a PDF of the chart above.

Data Gathering
Click on links below to view documents.

Preparing the data for online publishing
The following document covers converting spreadsheets to MARC, importing and editing records in MarcEdit, batch importing to OCLC Connexion, and publishing MARC records to the World Wide Web.

Finding aids

Repurposing spreadsheet data
