After a project meeting today, it was suggested that I keep a note of all of the "interesting" issues that I run into with the various data repositories on the project. So, here's the first of them.
At least two of our repositories consist of custom-built applications. One of them is a fairly large piece of PHP, backed by a MySQL database. The PHP for this repository is complex, and understood by only one person. The database schema is also fairly cryptic, and almost entirely undocumented. To make matters worse, the web user interface implemented by the PHP is actually a GUI, developed with (I think) Google Gears. The only way of querying this data store through an alternative interface is to access the database directly [1].
Taking this problem one step further, one of the other repositories is a large and complex piece of Perl, again understood by only one person. The data storage behind the UI isn't a relational SQL database. It's not XML. It's not even RDF. It's a custom data format, involving hierarchical arbitrary key:value pairs and cross-references within the data store. With this system, I can't even poke the data store directly, because it exposes no standardised, or even commonly-used, interfaces to its data.
In this latter case, there's the hint of a solution: the owner of the data is aware that they have an unmaintainable system, and wants to migrate it to something better. However, there's little funding available to do this, and they don't have a great deal of expertise in commissioning such development work.
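To give a feel for why a store of hierarchical key:value pairs with internal cross-references is awkward to get at from outside, here's a minimal sketch in Python. Everything in it is made up for illustration: the record IDs, the `{"ref": ...}` convention, and the `resolve` and `lookup` helpers are my assumptions, not the real format, which I can't describe in any detail.

```python
# Hypothetical illustration only: this is NOT the real repository's format.
from typing import Any, Dict, FrozenSet

# A toy store: nested key:value records, where a value of the form
# {"ref": "<record id>"} is (by assumption) a cross-reference to another record.
STORE: Dict[str, Any] = {
    "person/1": {"name": "A. Curator", "works_at": {"ref": "org/1"}},
    "org/1": {"name": "Example Archive", "contact": {"ref": "person/1"}},
}


def resolve(store: Dict[str, Any], value: Any, seen: FrozenSet[str]) -> Any:
    """Return a copy of `value` with cross-references expanded.

    A reference to a record that is already being expanded (a cycle), or to
    a record that does not exist, is left as the raw {"ref": ...} marker.
    """
    if isinstance(value, dict):
        if set(value) == {"ref"}:
            target = value["ref"]
            if target in seen or target not in store:
                return value
            return resolve(store, store[target], seen | {target})
        return {key: resolve(store, item, seen) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve(store, item, seen) for item in value]
    return value


def lookup(store: Dict[str, Any], record_id: str) -> Any:
    """Fetch one record by ID with its cross-references expanded."""
    return resolve(store, store[record_id], frozenset({record_id}))


if __name__ == "__main__":
    print(lookup(STORE, "person/1"))
```

Even in this toy version, any outside consumer has to know the reference convention and cope with cycles before it can make sense of a single record, which is exactly the knowledge a standard interface (SQL, XML, RDF) would otherwise have carried for it.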
[1] OK, this isn't entirely true any more: there's a very basic HTTP POST interface that can be used to run canned SQL queries. It's not particularly flexible, though, and the data serialisation mechanisms it uses are fairly broken in many places.