The piece of Codd which passeth understanding

Fri, May 14, 2010

Today I took a good deep look at one of the datasets I’m meant to be linking up. Actually, it’s four separate datasets, but all held within the same database. I poked around a bit, and found this:

Yes, that’s precisely one table, with 158 fields and over 17500 records. Of those 158 fields, 52 are computed fields (mostly containing logic to return “No information available” if the field value is NULL, or for doing display formatting). 5 fields are apparently common across all four datasets (one of those is “Collection” — i.e., which dataset this record belongs in). The remaining 101 fields have names prefixed with one or two letters, indicating which collection they hold data for. I haven’t yet found a unique row identifier, either for the table as a whole, or within the individual collections.
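A hunt for a unique row identifier could be sketched roughly as follows. This is only a minimal Python sketch under assumptions: it supposes the table has been exported as a list of dicts, and every field name other than “Collection” (such as “Shelf”, “M_Title” and “B_Author”) is made up for illustration. A field counts as a candidate only if it is filled in on every row and never repeats.

```python
from collections import defaultdict


def candidate_keys(rows):
    """Return field names whose values are non-empty and distinct on every row."""
    if not rows:
        return []
    candidates = []
    for field in rows[0]:
        values = [row.get(field) for row in rows]
        # A usable identifier cannot be missing or blank anywhere.
        if any(v is None or v == "" for v in values):
            continue
        # It also has to be different on every row.
        if len(set(values)) == len(values):
            candidates.append(field)
    return candidates


def candidate_keys_by_collection(rows, collection_field="Collection"):
    """Run the same check separately within each collection."""
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(collection_field)].append(row)
    return {name: candidate_keys(group) for name, group in groups.items()}


# Hypothetical sample rows; the real table has 158 fields, not four.
sample_rows = [
    {"Collection": "Maps", "Shelf": "A1", "M_Title": "Harbour chart", "B_Author": None},
    {"Collection": "Maps", "Shelf": "A2", "M_Title": "Town plan", "B_Author": None},
    {"Collection": "Books", "Shelf": "A1", "M_Title": None, "B_Author": "Smith"},
    {"Collection": "Books", "Shelf": "A1", "M_Title": None, "B_Author": "Jones"},
]

print(candidate_keys(sample_rows))
print(candidate_keys_by_collection(sample_rows))
```

On the sample rows, no field works for the table as a whole, while some fields do work inside a single collection. A field that passes this check is only a candidate, of course: uniqueness in today’s data is no guarantee that it was ever meant to be a key.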

There are a few other delights in there, too, such as a field labelled “Number of copies”, of type Text. Similarly, “Year of acquisition” is a text field. Presumably, “Year of the Fruitbat” wouldn’t fit in a numerical field…
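One way to find out what is actually lurking in a text-typed number field is to list the values that are not plain whole numbers. The sketch below assumes the same list-of-dicts export as a starting point; the helper name and the sample values are hypothetical, not taken from the real data.

```python
def non_numeric_values(rows, field):
    """Return the distinct values of a field that are not plain whole numbers."""
    odd = set()
    for row in rows:
        value = row.get(field)
        if value is None:
            continue
        text = str(value).strip()
        # Blank strings are treated as missing rather than as odd values.
        # isascii() rules out characters such as superscript digits,
        # which isdigit() alone would accept.
        if text and not (text.isascii() and text.isdigit()):
            odd.add(text)
    return sorted(odd)


# Hypothetical sample values for illustration only.
sample_copies = [
    {"Number of copies": "3"},
    {"Number of copies": " 12 "},
    {"Number of copies": "two"},
    {"Number of copies": None},
    {"Number of copies": "c. 1890"},
]

print(non_numeric_values(sample_copies, "Number of copies"))
```

If the list comes back empty, the field could safely become a proper numeric column; if not, the leftovers show exactly what would need cleaning up first.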

It’s not all doom and gloom, though. Each collection has a calculated field called “Search”, which tells me which fields are the important ones to search on. At least it’s all fairly well self-described, unlike some other databases I could mention.