Why schemas and not code?
ti@lemonia.org
ti at lemonia.org
Fri Jun 30 10:52:15 BST 2006
> Quoting Tim Sweetman <ti at lemonia.org>:
>
>> Creating a simple table with "artist" as a varchar is a doddle.
>> Turning "artist" into a foreign key on an artists table is simply more
>> work, with no immediate return on investment. More work still, if you
>> want the "artist" table to automatically populate itself when a new
>> artist appears; still more if you want to be able to rename an artist
>> (say, from "Alanis Morisete" to "alanis morissette") and have the
>> change cascade, or refer to abstract numeric keys instead.
>> And before you know it, you've got tables & triggers and maybe views
>> and INSTEAD OF INSERT triggers, and INSTEAD OF UPDATE triggers, and
>> you're having to name every field explicitly every time, and change
>> three things instead of one whenever you change your table. Can you
>> explain the purpose of that to a pragmatic technical lead? "It's four
>> or five times the work, for no gain".
>
> I'm not sure I follow your argument here.
> I have an artist table and a song table. The artist table contains a
> record with an id of 1 and the name "Alanis Morisete". I also have a
> song table which contains a number of rows containing songs by Alanis,
> all of which contain an artist id of 1 which links them to the "Alanis
> Morisete" row in artists. When I select details of these songs, I link
> to the artist table to get the artist's name.
This was the case I referred to in passing as "refer to abstract numeric
keys instead". That may not be the correct term. In fact, although I'm
convinced of the value of using numeric/serial primary keys for almost
every table, I'm unclear what, if any, level of normalisation that
corresponds to.
> Now when I need to correct the spelling of "Alanis Morissette" I just
> need to update one row in the artist table and it all just works. That
> seems far easier than searching for all possible mispellings of the
> name and correcting them all individually.
This is several changes at once:
1. The song table's representation of "artist" is now the foreign key on
an artists table.
Good: Artist-specific data now has somewhere to live
Bad: Overhead of another table. Overhead of populating that other
table. Importing dirty data might still insert "Allasi Moriettes"
into the artists table, unless you ensure it can't.
2. You assume, I think, that artist records don't autovivify.
Good: No nasty mispelings sneak in, which avoids having to send for
Mrs Danvers[3].
Bad: Oh dear, we can't autoimport dirty metadata in response to
the arrival of a new CD any more.
3. The foreign key on the artists table is a numeric ID, rather than
the artist's name.
Good: Performance, DRY, only one thing to change if the artist's name
changes (in this sort of way; TAFCAP's earlier work is probably
still attributed to Prince; OMG we now have an artists_aliases table
too, not to mention some character set issues)
Bad: Retrieving the list of songs now needs a join.
Which is a nuisance if that means changing lots of SQL.
Still more a nuisance if it means mucking about in forms applications,
Access, iTunes-alikes or something.
Bad: preserving insert-and-update behaviour on a VIEW
representing this join means writing PL/SQL. PL/SQL is to DRY
as Genghis Khan is to soft furnishings[3].
Conversely, in iTunes, which (as far as I can see) does not have an
"artists" table, I could readily find any "alansis" en masse, with search;
changing them en masse is also a doddle.
> Normalisation is DRY[1] for databases. Each piece of information has
> exactly one source in the database. It just makes life easier.
The iTunes experience suggests YAGNI[4].
I admit I'm playing devil's advocate here, and arguing against
normalisation, but I could equally argue the other way. Songs should have
multiple genres (which I think is true, actually); systems should support
bands and artists having multiple names (not sure about this one), and,
for that matter, multiple artists (almost certainly true).
AFAIK nobody builds systems like that, because even with state-of-the-art
O:R mappers, it'd too rapidly become a maze of twisty little many-to-many
tables, all different.
> Dave...
> [1] http://www.artima.com/intv/dry.html
ti'
[3] These are Jasper Fforde references. Read his books, they're immensely
entertaining. Start with "The Eyre Affair".
[4] You ain't going to need it. It's a nice technical solution whose costs
outweight its useful applications.
More information about the london.pm
mailing list