Today, I rolled up my sleeves (OK, it’s a metaphor; I am in shorts and a T-shirt) and dug deep into the 1999 media and their metadata. A special shoutout to the original ‘metador’ Scott Calhoun, who worked (slaved) so hard to get the media mediated. Excellent work!
I am attempting to hand-knit data from some 3000 or so images taken in 1999. As with most metadata projects of old, there’s a lot of munging that needs to get done to make this great resource of cool images useful to anyone. Making progress, though. Here are a couple of things I figured out in my manual natural language processing escapade.
About 64% of the 1999 images have captions. There are 4483 images, and I’ve been able to revive metadata for 2851 of them so far. This is not to say the rest are lost. A lot of the images are of people, parties, and a couple of trips here and there. Plus, we shot finds (artifacts) with the label included in the shot, and may not have ‘slated’ the image in the log. Well, this at least is my working theory.
All this said, I am going to use 1999 as a base model for the semantic linking we want to achieve between people, things, and media in places. To do this, I am working out how to grind up the data into these lovely packages. I feel compelled to do this old-school style: the algorithm is my brain, eyes, and hands.
Here’s an example of the fun. In 1999, we used a field called context, which later became Area, with values such as the BACH area or the Dig House. Doing some quick analysis, here’s the list of context variations that need to be cleaned up:

1999 Contexts
The variations will be nice to grind over for ‘Did you mean: Dig House?’ later, but for now, I’m working to rationalize the terms to a master list of places across the site and off. My favorites here are asterisk *, some offsite mudmaking, and EFES (the place, not the beer).
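For the curious, here is a minimal sketch of what a ‘Did you mean’ pass over those context variations could look like if I ever let a machine help. It is not what I am doing now (that is still brain, eyes, and hands). The master list below is a hypothetical stand-in that uses only the place names mentioned in this post, and the function name `suggest_place` and the 0.6 similarity cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical master list; the real list of places on and off site is still being worked out.
MASTER_PLACES = ["BACH", "Dig House", "EFES"]


def suggest_place(raw, master=MASTER_PLACES, cutoff=0.6):
    """Return the closest master-list place for a raw context value, or None."""
    # Collapse runs of whitespace and ignore case before comparing.
    cleaned = " ".join(raw.split()).casefold()
    # Map each casefolded name back to its canonical spelling.
    lookup = {name.casefold(): name for name in master}
    if cleaned in lookup:
        return lookup[cleaned]
    # Fuzzy match against the master list; an empty result means "no good guess".
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    for value in ["dig  house", "Dig Hosue", "efes", "*"]:
        print(repr(value), "->", suggest_place(value))
```

Anything that comes back as `None` (the asterisk, for one) would still need a human to decide where it belongs, and the cutoff would need tuning against the real list of variations.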
