You are both a unique and precious snowflake

I've just got my hands on another arts and humanities data set. This one's smaller than most of the others I've been looking at, and it's been put together in an MS Access application. Fortunately, the owners are aware that that's not a maintainable approach, and want a method of publishing it on the Web. Also, rather nicely, they've been aware of a number of data issues, such as regularisation of text fields: they've partially normalised the data, and effectively have a good ontology for their data.

Sadly, it's not all rosy:

Wall of Fire

One of the major problems with building a distributed system is that it's distributed. This means that the parts of the system need to talk to each other. Of course, these days, networks are viewed by most large network operators (e.g. universities) as hostile environments, where anything even remotely risky is split out, preferably into its own little subnet.

The piece of Codd which passeth understanding

Today I took a good, deep look at one of the datasets I'm meant to be linking up. Actually, it's four separate datasets, but all held within the same database. I poked around a bit, and found this:

dev8D, days one and two

I'm currently at this year's dev8D.

Repository Issues: Real Soon Now™

When we started this project many moons ago, we had identified 10 repositories that we wanted to work with. Of those, two were new systems, being planned or put in. And therein lies the rub… It's hard to write and test code against something that doesn't exist yet (or which is partly set up and has little data in it). It's even harder to do when the configuration changes under you as they modify their testbed.

Repository issues: The Custom Application

After a project meeting today, it was suggested that I keep a note of all of the "interesting" issues that I run into with the various data repositories on the project. So, here's the first of them.

At least two of our repositories consist of custom-built applications. One of them is a fairly large piece of PHP, backed by a MySQL database. The PHP for this repository is complex, and only understood by one person. The database schema is also fairly cryptic, and almost entirely undocumented. To make matters worse, the web user interface implemented by the PHP is actually a GUI, developed with (I think) Google Gears. The only way of querying this data store through an alternative interface is to access the database directly.

Drupal and blog filtering again

After some struggling over the last couple of days to sort out tag-based blog aggregator filtering in Drupal, here's how I did it, with the extra patches and sub-modules I needed to make it work.

Drup, drup, drup...

... the sound of blog filtering through the percolator.

I've been putting together a website for my current work project, in Drupal, and wanted to aggregate items from many blogs, filtered by keyword on the item's tags. Now, Drupal's default Aggregator module doesn't do this. The News Page module seems to offer the feature, but I couldn't get it to display any blog posts, which was rather a shame. I eventually wound up with the FeedAPI modules.
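For anyone wondering what "filtered by keyword on the item's tags" amounts to, here's a rough sketch of the idea outside Drupal. It is not how FeedAPI or any other Drupal module does it; it's just plain Python using the standard library, and it assumes the blogs publish RSS 2.0 with tags in <category> elements. The sample feed and the function names (filter_items_by_tag, aggregate) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: RSS 2.0 with tags carried in <category> elements.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example blog</title>
    <item>
      <title>Repository issues</title>
      <category>project</category>
      <category>repositories</category>
    </item>
    <item>
      <title>Holiday photos</title>
      <category>personal</category>
    </item>
  </channel>
</rss>
"""


def filter_items_by_tag(feed_xml, keyword):
    """Return the titles of items that have a <category> matching keyword.

    Matching is case-insensitive and on the whole tag, not on substrings.
    """
    root = ET.fromstring(feed_xml)
    wanted = keyword.strip().lower()
    titles = []
    for item in root.iter("item"):
        # Collect this item's tags, normalised to lower case.
        tags = {(c.text or "").strip().lower() for c in item.findall("category")}
        if wanted in tags:
            titles.append(item.findtext("title", default=""))
    return titles


def aggregate(feeds, keyword):
    """Merge the matching item titles from several feeds, in feed order."""
    merged = []
    for feed_xml in feeds:
        merged.extend(filter_items_by_tag(feed_xml, keyword))
    return merged


if __name__ == "__main__":
    print(aggregate([SAMPLE_FEED], "project"))
```

A real aggregator would also have to fetch the feeds, cope with Atom as well as RSS, and de-duplicate items, none of which is attempted here.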

Playing with Drupal again

I'm playing with Drupal, to see how I can aggregate RSS news items with particular tag keywords from several sources into a single place. It seems that the Drupal Aggregator module doesn't do that, but that the News Item plugin can be used to make it work.

This post is more a test of that mechanism than an attempt to say anything interesting. :)