Sharding, and all that

Fri Dec 19 08:42:16 GMT 2008

Richard Huxton wrote:
> Hmm - skimming these articles, I'm not hugely impressed. The chap(ess?)
> behind them is clearly a developer rather than a DBA.

You're right.  I perhaps should have quantified that better as a good
*introduction* to the subject.  It was a bit hand-wavy on the detail,
but it got me thinking about the subject (speaking as more of a developer
than a DBA).

He said:
> "Split the database into smaller, specialized db’s. Use a DB for users,
> one for messanging functionalities, one for product orders, etc. Each of
> these databases must exist independently, 

You said:
> Brilliant! So now your messages don't necessarily have a valid user
> associated with them.

It's a shame he didn't go into the details of how you were supposed to
make the "must exist independently" bit happen.  Key management is clearly
going to be a critical part of this.

My take on this is that you would have to (re-)build your application and
underlying DB to be distributed from the start.  Your user DB/app would
provide an authentication service, perhaps something similar to OpenID, that
your message board would use to authenticate users.

The problem you mention is that the message.user foreign key then goes out
the window.

One approach would be to have a local users table in the messaging DB which
maps internal user IDs to some kind of external user identifier.  That could
be their email address (in the simplest case), a URL that identifies the user
at your authentication app (e.g. http://auth.your-app.com/joe90) or perhaps
a GUUID.

Either way, you're explicitly decoupling the internal requirement that a
message must have a "valid" user id (i.e. a record in the users table) from
the assumption that you actually know anything valuable about that user that
doesn't relate explicitly to the message board.

You can then maintain referential integrity in your message DB regardless of
what happens in the user DB.  It would be possible to delete a user from the
user system, for example, without having any *internal* effect on the message
board DB (no more cascading delete woes).

Of course, there is still an *external* effect, namely a "broken link" from
the user in your message board to the one in your auth app.  For example,
your message board may have messages written by user 42 who was identified at
the time as joe90 at example.com, but there's no guarantee that the user still
has login access, or even exists.  You'll have to go and ask the user
application for further information on that *and* accept the implicit
assumption that you may get back the SaaS equivalent of a blank stare.

I appreciate that "decoupling" is a fancy way of saying "broken", but I'm
beginning to see it as a feature rather than a liability in this context.

I should point out that I haven't implemented anything like this yet so I
could be way off course. But I'm about to implement something like it for a
$work project so any pointers on this would be welcome if I'm sailing in the
wrong direction.

> This sort of stuff is very easy to get wrong. 

Agreed.  I don't think there's any easy substitute for proper data design.
Scalability doesn't work as an afterthought.

A