As the metadata intern for the AMNH Archive Project, it’s been my job to evaluate several archival content management systems (CMS), weighing them against a detailed set of functional requirements previously established by my colleagues. In short, my task was to find a good place for our stuff. I’ve examined ArchivesSpace, AtoM, CollectiveAccess, and xEAC, and I can say with some certainty that there is currently no open-source option for managing both EAC-CPF and EAD records that meets all of our needs. However, as these applications are all under active development, we are hopeful they will be improved over time, especially with regard to EAC support, and we will still forge ahead with the best option available to us at present.
It’s been a challenging exercise, and I’ve learned a lot in the process. It has required switching between operating systems, including OS X, Windows 7, and Ubuntu, using VirtualBox and fine-tuning configurations therein, forking GitHub repos and updating code after major releases, lots of terminal/command line work, data imports, exports, and migrations, examining log files, troubleshooting, troubleshooting, and more troubleshooting.
Below I’ll give a brief report on my experience with each application so far. Please do comment below if you’d like some more detail, or if you’d like to share your own experience using one of these platforms.
ArchivesSpace, the successor to Archivists’ Toolkit and Archon, was an obvious candidate. The museum archives already uses AT to manage some of its resources, and migrating data from AT to ASpace is a relatively straightforward process. Entities in ASpace are referred to as “agents,” and agents can be exported as EAC-CPF records. ASpace also creates agent records when external EAD and EAC files are imported. The user interface is generally good, and helpful tool tips appear when you hover over certain fields (like in AT; thankfully these persist longer than the fleeting popups I’ve encountered in some versions of AT).
However, in my tests, conducted mostly in ASpace v. 1.0.9 (running in Windows 7, against a MySQL database), EAC support proved to be far less developed than the application’s support of EAD. Data from our well-formed, by-the-book EAC-CPF records were dropped during XML imports, and EAC-CPF exports of robust agent records were missing important pieces of information, including the biographical/historical (bioghist) notes. Furthermore, while records for persons imported without errors, I consistently encountered errors importing XML records for corporations and families. CSV imports would likely be more successful, but would require significantly more work upfront to specify the mapping of elements to fields. A custom plugin along the lines of this EAD importer for the MoMA could probably solve some of our import problems, but without more substantial changes ASpace will not be able to manage our EAC data at the level we require.
AtoM, an acronym for Access to Memory, is an open-source application “for standards-based archival description and access in a multilingual, multi-repository environment” (AtoM Homepage, 2014). It was originally commissioned by the International Council on Archives and follows ICA standards, including ISAAR (CPF), hence the application’s former name, ICA-AtoM. I performed tests with both AtoM 2.0 and 2.1, running in Ubuntu 14.04 LTS via VirtualBox 4.2.16.
Because we’re starting with complete EAC records, and not creating them within these applications, we have to rely on the integrity and sophistication of the app’s import function. Because most of these applications attempt to create authority records for entities during the import of EAD and EAC, we inevitably end up with multiple records for a single entity. This was certainly the case with AtoM. A feature that the museum will likely require in any such system is the ability to merge records during the import process, in order to avoid tedious data cleanup after the fact.
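One way to soften the duplicate problem before an import is to look for likely duplicates in the source data first. The snippet below is only a rough sketch of that idea, not a feature of AtoM or any of the other applications: it groups records whose names match after light normalization. The record IDs, the sample names, and the `normalize` rule are all hypothetical, and real authority names would need much more careful matching.

```python
from collections import defaultdict


def normalize(name):
    """Lowercase a name, drop commas, and collapse whitespace (a crude, assumed rule)."""
    return " ".join(name.casefold().replace(",", " ").split())


def find_duplicates(records):
    """Group (record_id, name) pairs by normalized name; keep groups with more than one ID."""
    groups = defaultdict(list)
    for record_id, name in records:
        groups[normalize(name)].append(record_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records: one from an EAC file, two created as side effects of EAD imports.
SAMPLE_RECORDS = [
    ("eac-001", "Doe, Jane"),
    ("ead-import-17", "doe,  jane"),
    ("ead-import-18", "Department of Examples"),
]

if __name__ == "__main__":
    for name, ids in find_duplicates(SAMPLE_RECORDS).items():
        print(f"possible duplicate entity {name!r}: {ids}")
```

A report like this would only flag candidates for review; deciding which records to merge would still be a human judgment.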
It seems that each application has its own particular way of handling dates. In AtoM, certain date fields do not allow you to define a “dateRange” with a “fromDate” and “toDate,” but others do. In an “Authority Record” the exist dates field is a simple text box; if you want to make it a range, you would have to insert your own punctuation to indicate a date range. On the other hand, in the Relationships area, each relationship can have a start date and end date; the system then inserts a dash between the dates on the display side. This does allow for the use of approximate dates and language like “circa,” which is not the case in applications that adhere strictly to ISO 8601/W3C-DTF, as we’ll see below with xEAC.
On the whole, it appears that AtoM handles EAC records slightly better than ASpace; however, data is still lost during XML imports, and only about one third of the EAC elements we use are actually mapped to editable fields in the AtoM interface.
The AtoM UI is solid, though my experience navigating around was hampered a bit by some limitations of VirtualBox (I was having trouble with the VirtualBox display resolution, and so my AtoM screen real estate was artificially limited). AtoM, like AT and ASpace, provides descriptions/tool tips, but AtoM’s are far more detailed, and actually quote from the relevant ICA standard.
“xEAC is an open-source XForms-based application for creating and managing EAC-CPF collections,” is the description on the application’s GitHub repository page, and that was music to our ears. Developer Ethan Gruber, who uses xEAC and its companion EADitor to manage and display archival collection data for the American Numismatic Society Archives, built an application that adheres closely to the EAC-CPF schema, utilizes an XML database (eXist), and supports the serialization of EAC-CPF into RDF.
Getting xEAC up and running was a challenge for our tech team. The developer has provided good step-by-step documentation, but when putting the pieces together (Orbeon, Solr, eXist, all running on Tomcat) our team hit several roadblocks. We finally had a conference call with Ethan himself, and with his help we were able to get xEAC running locally in OS X Mavericks 10.9.5.
With our trove of XML EAC records, the eXist database was a welcome change of pace — no “importing” required. We could simply upload our XML files to the DB, point our browser to xEAC, and there were our records. After some tweaks to our records, that is. Turns out xEAC, at present, really does not like non-ISO 8601/W3C-DTF (but DACS-compliant!) dates like “approximately 1946” or “1990s”; it requires that any date element contain a standardDate attribute, whereas ASpace sort of lets users have it both ways, with fields for both human-readable, possibly fuzzy dates and machine-readable dates, and AtoM seems to forgo the idea of machine-readable dates entirely.
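To find the records that need this kind of tweak, a small script can list the date elements that lack a standardDate attribute. The following is a minimal sketch under some assumptions: it uses only Python's standard library, it checks just the `date`, `fromDate`, and `toDate` elements in the EAC-CPF namespace, and the sample record is a trimmed, made-up fragment rather than one of our real files. It does not reproduce xEAC's own validation.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by the EAC-CPF schema.
EAC_NS = "urn:isbn:1-931666-33-4"
DATE_TAGS = ("date", "fromDate", "toDate")


def dates_missing_standard_date(xml_text):
    """Return (tag, text) pairs for EAC-CPF date elements with no standardDate attribute."""
    root = ET.fromstring(xml_text)
    prefix = "{" + EAC_NS + "}"
    missing = []
    for elem in root.iter():
        # ElementTree stores namespaced tags as "{namespace}localname".
        if not isinstance(elem.tag, str) or not elem.tag.startswith(prefix):
            continue
        local_name = elem.tag[len(prefix):]
        if local_name in DATE_TAGS and not elem.get("standardDate"):
            missing.append((local_name, (elem.text or "").strip()))
    return missing


# Made-up fragment: one fuzzy date without standardDate, one date with it.
SAMPLE = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <cpfDescription>
    <description>
      <existDates>
        <dateRange>
          <fromDate>approximately 1946</fromDate>
          <toDate standardDate="1990">1990</toDate>
        </dateRange>
      </existDates>
    </description>
  </cpfDescription>
</eac-cpf>"""

if __name__ == "__main__":
    for tag, text in dates_missing_standard_date(SAMPLE):
        print(f"<{tag}> {text!r} has no standardDate attribute")
```

The script only reports the gaps. Choosing a machine-readable value for something like “1990s” is still an editorial decision.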
One nice surprise with xEAC was that the public-facing URLs are derived from our local IDs, which we used not only as our XML filenames, but also in our xlink:arcrole attributes. This means that after simply importing our existing records and publishing them, the relation links between entities just work.
There’s a bit of a change log built in to the system. Before a user can update a record in xEAC, he or she must add a new maintenance history element — the system will not let the user save changes otherwise. This provides a good way for us to keep track of changes to our records.
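For readers unfamiliar with the schema, the sketch below shows roughly what adding such an entry amounts to in the XML: a new `maintenanceEvent` appended to the record's `maintenanceHistory`. This is an illustration built with Python's standard library, not xEAC's code (xEAC does this through its XForms interface). The function name, the sample record, and the agent name are hypothetical, and the sample is trimmed far below what the schema requires.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

EAC_NS = "urn:isbn:1-931666-33-4"
# Serialize EAC-CPF elements without an "ns0:" prefix.
ET.register_namespace("", EAC_NS)


def q(tag):
    """Return the tag in the "{namespace}localname" form ElementTree expects."""
    return f"{{{EAC_NS}}}{tag}"


def add_maintenance_event(root, agent_name, description, when=None):
    """Append a 'revised' maintenanceEvent to control/maintenanceHistory."""
    when = when or datetime.now(timezone.utc)
    history = root.find(f"{q('control')}/{q('maintenanceHistory')}")
    if history is None:
        raise ValueError("record has no control/maintenanceHistory element")
    event = ET.SubElement(history, q("maintenanceEvent"))
    ET.SubElement(event, q("eventType")).text = "revised"
    stamp = ET.SubElement(event, q("eventDateTime"))
    stamp.set("standardDateTime", when.isoformat())
    stamp.text = when.strftime("%Y-%m-%d")
    ET.SubElement(event, q("agentType")).text = "human"
    ET.SubElement(event, q("agent")).text = agent_name
    ET.SubElement(event, q("eventDescription")).text = description
    return event


# Trimmed, made-up record with a single existing event.
SAMPLE = """<eac-cpf xmlns="urn:isbn:1-931666-33-4">
  <control>
    <recordId>example-001</recordId>
    <maintenanceHistory>
      <maintenanceEvent>
        <eventType>created</eventType>
        <eventDateTime standardDateTime="2014-01-15">2014-01-15</eventDateTime>
        <agentType>human</agentType>
        <agent>Example Cataloger</agent>
      </maintenanceEvent>
    </maintenanceHistory>
  </control>
</eac-cpf>"""

if __name__ == "__main__":
    record = ET.fromstring(SAMPLE)
    add_maintenance_event(record, "Example Editor", "Corrected exist dates")
    print(ET.tostring(record, encoding="unicode"))
```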
Finally, xEAC is ahead of the curve in regard to interacting with authority files for names and places (VIAF, GeoNames, etc.) and other linked open data. I can see a scenario in which we leverage the system’s linked data capabilities in order to sync our entity records with EAD records stored in a different system. (This is how xEAC and EADitor can work together; unfortunately we were not able to get EADitor up and running to the point that we could actually test this out.)
CollectiveAccess would be an attractive option if we were dealing with digital objects, but, out of the box, it doesn’t appear to meet our needs. It does a nice job of keeping track of relationships between entities that are entered natively into the CMS, but the system does not currently offer support for the EAC-CPF schema. It also requires significant upfront investment in configuring import mappings, configuring the public site, etc.
As somewhat-early adopters of the EAC-CPF schema, the museum is a few steps ahead of the leading software applications for archives. Our robust records, created via Excel macros and stored on shared network drives, are waiting for a good home where they can interact with other archival records and help link together collections housed in various divisions within the museum. That potential repository is currently being built in some form or another, and it’s not yet ready for move-in. A couple of these applications get us some of the way there, so it’s possible that we may have to get creative with customizations, foot the bill for the development of new features, or link two systems together in order to firmly establish a place for our stuff.