One year has passed since we began planning and creating descriptive records for the AMNH expeditions and the people who participated in them. The project team set out to encode the descriptions of these entities in the EAC-CPF metadata schema. EAC-CPF stands for Encoded Archival Context – Corporate bodies, Persons, and Families, and was endorsed by the Society of American Archivists (SAA) as a technical standard in 2011. The schema itself was envisioned in 2001, when a group of archivists got together to create a high-level model and draft a strategy for implementation and testing. Now, in 2014, there are still few tools for archivists to create and manage EAC-CPF records. We have a plan to dive into these tools, but more on that later… In the current landscape, this lack of resources could hold back CPF adoption across the archival community. But we can’t let that stand in our way. We have a grant-funded mission, and deliverables to produce, and …Excel.
Our go-to program for everything from capturing general inventories to cataloging minimal-level records, Excel has facilitated data gathering with little overhead. In fact, prior to this project, the Research Library already had spreadsheets of about 2,000 names recorded in two separate Excel files: one for personal names and the other for expeditions. These descriptions are based on the Special Collections vertical files, a heavily used resource for our library visitors. The spreadsheets capture bare-bones information that can be found in the vertical files, such as dates of existence, summaries generalized into one or two sentences, and affiliations with the AMNH. As EAC-CPF began entering our daily conversations at the Library, we realized that the spreadsheet headers could be mapped to CPF elements. Hence, the vision of this current effort!
Excel laid the groundwork for these so-called “skinny” CPF records. It made sense to us to keep building on it as a tool for richer descriptions of people and expeditions. Barbara advocated for a modular system of descriptions: rich CPF descriptions for creators of archival materials on the one hand, and EAD finding aids detailing the contents of archival collections on the other. As the team discussed the various ways to create rich entity records, our path of least resistance always led back to Excel.
During our project planning last year, the tools available to create EAC-CPF records were limited. ArchivesSpace was in beta testing, and the other CPF projects we were aware of were using a custom FileMaker database or coding directly in XML. ICA-AtoM had implemented CPF in their system, but the open-source software was, at that point, new to us, and we were hesitant to adopt it when a direct migration from Archivists’ Toolkit to ArchivesSpace was being developed. Corey Harper at NYU suggested we look at xEAC, an XForms-based application for EAC-CPF records, but we argued for a system that could support both EAD and EAC-CPF (though we are now strongly considering implementing xEAC for CPF creation and management). It was evident that there was no clear solution for this fledgling schema. However, as a result of weighing the pros and cons of the programs out there, we decided the next step would be to define our needs for an archival content management system. I’ll be writing more on this later. A draft of our functional requirements can be read here in the meantime.
Back to Excel, back to the basics, back to where it started, back to simple tables with headers, a field and a value, a place to put data until a more elegant solution is reached. After all, what is the point of all this data management without the actual content? We evolved our “skinny” master spreadsheets into a single worksheet for a single entity that could then be pushed into a traditional table layout to support multiple records.
With our original “skinny” headers transposed to stand upright, we could open up the cells to carry more descriptive information – paragraphs for the Biographical and Historical notes! We thought that this new view could greatly enhance data gathering without losing the utility of mapping field headings to CPF metadata tags: “ID” maps to <recordId>, “Expedition Name” to <nameEntry/part>, “Purpose” to <biogHist>, and so on. However, the hierarchical architecture of the schema (also operative in EAD) pushed our descriptions out of the flat box into new tables held in separate sheets of the Excel file. See a table for <cpfRelation> below.
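To make the header-to-tag mapping concrete, here is a minimal Python sketch of how one entity's field/value worksheet might be turned into nested CPF elements. It is an illustration under assumptions, not our production workflow: the dictionary name, the function name, and the sample values are hypothetical, the parent elements are inferred from the EAC-CPF schema, wrapping the “Purpose” text in a <p> inside <biogHist> is an assumption, and the EAC-CPF namespace and required control elements are left out for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from worksheet field labels to element paths.
# Only the three pairings named in the post are included; the parent
# elements (control, cpfDescription, ...) are assumed from the schema.
FIELD_TO_PATH = {
    "ID": "control/recordId",
    "Expedition Name": "cpfDescription/identity/nameEntry/part",
    "Purpose": "cpfDescription/description/biogHist/p",
}


def build_record(fields):
    """Build a bare-bones element tree from one entity's field/value pairs."""
    root = ET.Element("eac-cpf")
    for label, path in FIELD_TO_PATH.items():
        value = fields.get(label)
        if not value:
            continue  # skip empty cells
        node = root
        # Walk the path, creating each parent element only once.
        for tag in path.split("/"):
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = value
    return root


# Made-up sample values, standing in for one worksheet.
record = build_record({
    "ID": "example-0001",
    "Expedition Name": "Example Expedition",
    "Purpose": "Placeholder text for the purpose of the expedition.",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The flat dictionary works only for fields that occur once per entity. Repeating structures such as relations need their own table, which is exactly where the single worksheet stopped being enough.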
One of the (dare I say) sexy aspects of EAC-CPF is the relations element. Recording associated names and their roles creates a virtual network of entities. An expedition record is enhanced by its member components, each of which has its own network of family, colleagues, expeditions and institutions. From a focused center, the netting expands. Not only that, but attaching a URI to a name opens up that network to other connections in the Linked Open Data (LOD) landscape. The Smithsonian generated CPF records for their scientific expeditions; no doubt our institutions hosted the same scientists and artists. A virtual link can be drawn seamlessly, if the metadata supports it. While there are a lot of unknowns and possibilities in the linked data landscape, we grounded ourselves in knowing that the best thing to do is to prepare our data for the LOD environment. So our spreadsheet grew legs – in the form of Timeline and Relationship tables.
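As a rough sketch of what a Relationship table carries, the Python below turns rows of related names into <cpfRelation> elements, attaching a URI where one is known. The column names (name, type, uri, role), the function name, and the sample rows are hypothetical; the element and attribute names (cpfRelation, cpfRelationType, relationEntry, descriptiveNote, xlink:href) follow the EAC-CPF schema as we understand it, and the EAC-CPF namespace itself is omitted for brevity.

```python
import xml.etree.ElementTree as ET

XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)  # serialize with the xlink: prefix


def build_relations(rows):
    """Turn Relationship-table rows into a <relations> element."""
    relations = ET.Element("relations")
    for row in rows:
        rel = ET.SubElement(
            relations, "cpfRelation", {"cpfRelationType": row["type"]}
        )
        # The URI is what opens the record to linked data; it is optional.
        if row.get("uri"):
            rel.set(f"{{{XLINK}}}href", row["uri"])
        ET.SubElement(rel, "relationEntry").text = row["name"]
        # Record the role as a short note when the table supplies one.
        if row.get("role"):
            note = ET.SubElement(rel, "descriptiveNote")
            ET.SubElement(note, "p").text = row["role"]
    return relations


# Made-up rows standing in for a Relationship sheet.
rows = [
    {"name": "Example Person", "type": "associative",
     "uri": "http://example.org/entity/1", "role": "Expedition member"},
    {"name": "Example Institution", "type": "associative"},
]
relations = build_relations(rows)
print(ET.tostring(relations, encoding="unicode"))
```

Each row becomes one edge in the network: the related name, the type of relationship, and, when available, a link out to another record.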
Suddenly our basic workform was getting complicated and we had to rethink our approach. Our deliverables specified EAC-CPF records, not multi-level Excel files.