LHotH

LHOTH Processing Pipeline

In Development on April 14, 2010 at 2:41 am

Let us get back on the blog track with a slew of new updates. Today, we’ll cover our processing pipeline for moving data in and through the LHOTH database.

Here’s a video of the process in action:


Processing Pipeline in DropBox

It’s taken us a while to get this going, but the process works very well. We use DropBox for handling all of our files. If you don’t use it, you should check it out: it’s an amazing tool for collaborating, syncing your life, and backing up your stuff.

Thanks to DropBox, any file in the pipeline can be viewed from the web, iPhone, iPad, or laptop, and is always in sync.


LHOTH on the iPhone

The Pipeline

1) Unprocessed: Any Source file that we want to add to the database goes here. It’s a staging area. Any team member can easily get to the file and see it, open it, check it out.

2) For Review: A team member figures out who needs to provide input on the source document and contacts that person for their review. For example, I (Michael) may want Ruth to review a document to make sure it’s the right version, or to agree on which fields we want to map. Nico Tripcevich may be asked to review spatial data for the purposes of linking maps to the database. After review, the database team processes the source doc to prepare it for integration into the database, and moves the file to the In Process folder. If the original needs to be changed (that is, if its DATA changes), the original is copied to the Archive folder.

3) In Process: Source docs that are actively being worked on. These docs are linked to the database using FileMaker 11’s new recurring import feature.

4) Ready for Grinding: Once a source file is completely processed, it’s ready to be transformed into Event data as RDF. We fondly call this meat grinding, as in making sausage from the data. Our mapping process produces open data that can be output in a variety of formats. We use XML, but we can also output reports or Excel spreadsheets, or prepare the data to be visualized in a relational database.

5) Archive: Processed sources are archived along with their original companions and the mapping instructions, ensuring full empirical provenance.
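For readers who like to see a workflow as code, here is a minimal sketch of the five stages as folders, with a file stepping from one to the next. We move files by hand in DropBox, so this is only an illustration: the on-disk folder names, the `init_pipeline` and `advance` helpers, and the `original_` prefix for archived originals are all assumptions made up for this example.

```python
import shutil
from pathlib import Path

# Stage folders in pipeline order. The names mirror the five steps above;
# the exact folder names on disk are an assumption.
STAGES = ["Unprocessed", "For Review", "In Process", "Ready for Grinding", "Archive"]


def init_pipeline(root: Path) -> None:
    """Create one folder per stage under root (hypothetical helper)."""
    for stage in STAGES:
        (root / stage).mkdir(parents=True, exist_ok=True)


def advance(root: Path, filename: str, data_changed: bool = False) -> Path:
    """Move a source file to the next stage and return its new path.

    When the file leaves For Review and its data will be changed,
    an untouched copy of the original is placed in Archive (step 2).
    """
    # Find the stage folder that currently holds the file.
    current = next((s for s in STAGES if (root / s / filename).exists()), None)
    if current is None:
        raise FileNotFoundError(f"{filename} is not in any pipeline folder")
    if current == "Archive":
        raise ValueError(f"{filename} is already archived")

    src = root / current / filename
    if current == "For Review" and data_changed:
        # Keep the original alongside the processed version for provenance.
        shutil.copy2(str(src), str(root / "Archive" / f"original_{filename}"))

    dest = root / STAGES[STAGES.index(current) + 1] / filename
    shutil.move(str(src), str(dest))
    return dest
```

Calling `advance(root, "survey.csv", data_changed=True)` on a file sitting in For Review would move it to In Process and leave `original_survey.csv` in Archive, which is the provenance trail described in step 5.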

Last Note on DropBox: We love the fact that DropBox automagically versions all documents, including the database, so if we ever make a mistake (ok, when we make mistakes), we can roll back files to previous versions. Sweet!


DropBox versioning of a file in the processing pipeline

  1. LOVE IT! Thank you Michael for putting this in writing… More later.
