
As I start to wrap up my project here at the AMNH, I have been reflecting on the many hats historians have to wear. A big challenge for me has been the departure from a university-style approach to history, which favors critical and interpretive analysis of known ‘facts’ (by ‘facts’ I mean the whos, wheres and whens that have already been somewhat established). Basically, I’ve never been asked to use primary sources to find out when the First World War began. I have, however, been asked to explain why it began using Three Main Arguments in Essay Form.

This project has been interesting for me because I have been mostly concerned with defining the aforementioned ‘facts’ rather than attempting to explain them, which feels like rather a more dangerous task. A wrong date can easily be taken up without question and used again, whereas an analysis is hopefully more self-evidently personal and hypothetical. I definitely do not want to confuse some poor Department of Entomology researcher in the future by saying that Lowie was on an expedition in Nebraska in the fall of 1911 if he actually wasn’t. There is no single authoritative source to check, and though my findings have often corrected earlier lists I am aware that the same will probably apply to my own work when future researchers continue the task.

The other challenge has been in playing a very small role in a very large project. Not only does my subject matter of expeditions extend in all directions, intersecting with broader Museum projects and policies, anthropological movements and individual personalities (hey, there’s a reason we’re working toward linked data!), but its methodology is also a living creature. It is being continually defined, refined, trialed, found wanting, adapted and tried again. My Excel document will be useful for some purposes and less ideal for others. Its nomenclature isn’t completely standardized, so that will need to happen before all of its sorting capacities are fully realized. Our decision to arrange data primarily by explorer name, while also providing context via historical notes, seems appropriate for the nature of North American expeditions between 1900 and 1920; however, this is not necessarily the case for expeditions elsewhere, or for earlier or later periods.

So, as a means of accounting for these things, my final few blogs will be an attempt to define exactly what it is I have done and what I haven’t done, what I have focused on and what I have left out. I’ll try to describe the parameters of my investigation and point out the places where they may be somewhat fluid, and hopefully this will become something of a manual-to-the-beautiful-madness for the next person who takes up the task!

Finally, re: the beautiful madness, I’m definitely not looking forward to leaving behind this kind of material (on archeologist N. C. Nelson):

‘When beset by outlaws in Mongolia, he brandished his glass eye at the brigands, who quickly fled.’ (Mike Peed, ‘The Pictures: Digging’, The New Yorker, June 9 and 16, 2008.)

As the aforementioned Excel document becomes somewhat usable, I have begun to format my research a second way by writing biographical and historical notes. Hopefully, these will provide context for all the whos, wheres and whens of the expeditions, providing further opportunity to link disparate trips and offering researchers a starting point for their investigations.

It wasn’t long before some of the questions that popped up during our Excel discussions reared their heads again. Should there be an historical note solely for Robert Lowie’s visit to the Crow Indians in 1907, or should it also include the follow-up visits he made during summers in 1910-1914? Perhaps it should expand to include all the fieldworkers who were simultaneously assigned to investigate Northern Plains social organization at this time; after all, they were all given the same research briefing. Or if we choose to stick with Lowie, should we include the side trip he made to three other cultural groups on the same trip? And what about their follow-up visits?

As an historian, I am inclined toward a solution that provides a ‘big picture’ context – a vantage point from which a variety of disparate expeditions and researchers can be understood. The results of Lowie’s work were published as part of a compilation exploring themes across several Northern Plains cultures, and the Department of Anthropology’s research rationale was one of comparison: an analysis of cultural similarities between cultural groups with distinct languages.

At the same time, Lowie also published his results in a book solely about the Crow. I think the best solution is one of balance – there is a need for this bird’s eye view to zoom in, to distinguish components and to identify individual roles if it is to be of value for researchers. I view the Excel data as performing this role – providing specific names, dates and locations that will each link back to this broader historical context, enabling expeditions to be viewed as discrete components even as connections are made.

Barbara and I sat down last week to talk about how to collate the masses of data I’ve accumulated in my fact-finding expedition so far, which is—to put it mildly—all over the place. A Museum Journal article will state that RH Lowie visited a particular area in a particular year, an Annual Report will name the tribe visited, a photograph will suggest the presence of an assistant and, if I’m lucky, a helpful Anthropological Papers publication will state the rationale, broader project and funding source for the trip. On a really good day I stumble across some precise dates. All this is noted, alongside similarly random snippets for other Museum staff, in Word documents that are becoming increasingly unmanageable.

So: we decided to create an Excel document, using something that resembles Smithsonian EAC standards and will hopefully become useful in terms of the eventual transition to linked data: names, dates, locations, cultures studied, historical and biographical notes. A great idea – until I began to enter my findings. The biggest issue we have faced so far is how to group and sort data. ‘Expedition name’ becomes problematic as an identifier when multiple minor, unnamed trips form small parts of broader, named projects that extend over decades. ‘Staff name’ is problematic for the opposite reason – it makes no connection between multiple people working on the same broader project. Add to this the fact that many Museum staff spent a couple of weeks in several different places for different projects on the same trip, and you have some severe categorization issues.

After some experimentation, we decided to list trips under the name of the expedition leader, which in many cases was the sole individual on the trip. The use of Excel means that the data can be sorted by other fields to find connections between trips based on time, place and broader project. A lot of information is still missing or unconfirmed, and what we have is extremely varied in its level of detail (one trip will be listed as ‘Summer 1914’, another simply as ‘1914’, another as ‘12 August – 16 October 1914’) but it’s a start!
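For readers who think in code rather than spreadsheets, here is a minimal sketch of that arrangement. It assumes a handful of field names (leader, project, date, place) and uses placeholder rows; none of it is copied from the actual Excel document. The three date strings mirror the levels of detail mentioned above, and the hypothetical helper year_of simply assumes every date ends in a four-digit year.

```python
from collections import defaultdict

# Hypothetical rows: the field names and values are placeholders for illustration,
# not entries from the real spreadsheet.
trips = [
    {"leader": "Leader B", "project": "Project 2", "date": "Summer 1914", "place": "Place 2"},
    {"leader": "Leader A", "project": "Project 1", "date": "12 August – 16 October 1914", "place": "Place 3"},
    {"leader": "Leader C", "project": "Project 1", "date": "1914", "place": "Place 1"},
    {"leader": "Leader A", "project": "Project 1", "date": "1907", "place": "Place 1"},
]


def year_of(date_text):
    """Return the year, assuming the free-text date ends in four digits."""
    return int(date_text.strip()[-4:])


def sort_by_leader(rows):
    """List trips under the expedition leader, earliest year first."""
    return sorted(rows, key=lambda row: (row["leader"], year_of(row["date"])))


def group_by(rows, field):
    """Collect rows that share a value in another field, e.g. project or place."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[field]].append(row)
    return dict(groups)


by_leader = sort_by_leader(trips)
by_project = group_by(trips, "project")

for row in by_leader:
    print(row["leader"], "|", row["date"], "|", row["project"])
```

Grouping by project pulls together trips led by different people, which is the connection that a list sorted by staff name alone would miss.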

I’m excited to have flown from sunny Sydney to New York (sunny upon my arrival but sadly stormy as time moved on) to be part of this project for the next six weeks. As an historian with an emphasis on photography and 20th century culture, the potential of linked data to cross disciplinary boundaries is incredibly exciting. Not only will it create new ways to find information, it also allows context to be built up around diverse objects in a variety of locations. It means that a given expedition photograph and its metadata will no longer languish in an obscure corner of the internet. Instead, it can be linked to a story, with a date, a location, and actors who might have written about their experience, not to mention the other photographs they took and the objects they collected.
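To make the idea a little more concrete, here is a rough sketch of how one photograph might be linked to its context using simple subject-predicate-object statements, the basic shape of linked data. Every identifier and predicate name below is a made-up placeholder, not a real AMNH record or a published vocabulary, and a real implementation would more likely use URIs and an RDF toolkit than plain tuples.

```python
# Hypothetical statements: all identifiers and predicate names are placeholders.
triples = [
    ("photo:001", "depicts", "expedition:X"),
    ("expedition:X", "hasDate", "1914"),
    ("expedition:X", "hasLocation", "place:Y"),
    ("expedition:X", "hasParticipant", "person:Z"),
    ("person:Z", "took", "photo:001"),
    ("person:Z", "took", "photo:002"),
    ("person:Z", "collected", "object:042"),
]


def objects_of(subject, predicate, statements):
    """Return every object linked to the subject by the given predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def context_for_photo(photo, statements):
    """Follow links from a photo to its expedition, people, and their other work."""
    context = []
    for expedition in objects_of(photo, "depicts", statements):
        people = objects_of(expedition, "hasParticipant", statements)
        context.append({
            "expedition": expedition,
            "dates": objects_of(expedition, "hasDate", statements),
            "locations": objects_of(expedition, "hasLocation", statements),
            "participants": people,
            # Other photographs taken by the same participants.
            "other_photos": [
                o for person in people
                for o in objects_of(person, "took", statements)
                if o != photo
            ],
            # Objects collected by the same participants.
            "objects": [
                o for person in people
                for o in objects_of(person, "collected", statements)
            ],
        })
    return context


photo_context = context_for_photo("photo:001", triples)
print(photo_context)
```

Starting from a single photograph, the function walks outward to the expedition, then to the participants, then to whatever else those participants photographed or collected.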

My job is to work on the context part of this project, producing historical and biographical notes for expeditions and their participants. The Museum’s southwest expeditions exist in the archives as a tangle of personalities, itineraries and discoveries. In the next few weeks I’ll continue a long process of unravelling the stories and making connections with photographs in the archives, and hopefully bring these valuable sources one step closer to becoming linked data.

I’ll be keeping track of my method and findings on here, so stay tuned!

 

Susan Lynch, a librarian here at AMNH, will occasionally contribute posts to the blog. With her extensive programming experience, Susan has been instrumental in translating the technical infrastructure of linked open data (LOD) for the rest of us. She regularly contributes to our discussions of future LOD implementation to enhance our records.

There’s a lot of buzz in the library world about linked open data (LOD) and the closely related concepts of the semantic web and Web 3.0. For a good introduction to all three, I recommend watching the 2009 TED talk given by Tim Berners-Lee.

The staff at the AMNH Library and our peer institutions are scrambling to understand these concepts and to apply them to the data and information which we hold. Fortunately for us, New York City provides many opportunities to learn about these ideas and to discuss them with others. The New York Public Library (NYPL), New York University (NYU) and the Metropolitan New York Library Council (METRO) worked together to organize an event called LOD-LAM-NYC 2012, which was held on February 23, 2012 at NYPL. Disappointed that you missed it? You’re in luck. Much of the program was recorded and the recordings are available online at http://metro.org/articles/recapping-lodlamnyc/.

The Council on Library and Information Resources (CLIR), one of our funders for this project, co-sponsored a workshop on linked data with Stanford University Libraries last year, and the meeting summary and technical report are available here:

http://www.clir.org/pubs/abstract/reports/pub152

The Digital Library Federation, a program of CLIR, has a site devoted to LOD:

http://www.diglib.org/community/groups/linkeddata/

Getting down to the nitty-gritty, there is also a website administered by Tom Heath on behalf of the Linked Data community, with links to many of the tools available for implementing linked data in a semantic web environment:

http://linkeddata.org/tools

We are beginning to explore how we will manage the records and finding aids that we have developed during this project so that the data collected for the AMNH archives will link with other data on the web.

More to come…