Sharding, and all that

James Laver james.laver at gmail.com
Fri Dec 19 10:45:55 GMT 2008


On Fri, Dec 19, 2008 at 9:52 AM, Richard Huxton <dev at archonet.com> wrote:
>
> Yep - that's what "sharding" is all about - separate disconnected silos
> of data. You know, like the ones that were all the rage in the 60s that
> drove people to invent RDBMS. The good thing is, if your application is
> successful and is still in use a couple of years from now you get to
> either spend all your time fire-fighting or re-implementing integrity
> constraints.
>
> Then you're not maintaining referential integrity. There's no point in
> having a user-id that doesn't *mean* anything. Primary keys, foreign
> keys and all the other bits and pieces of RI in a SQL database are there
> to maintain the *meaning* of your data.
>

It depends on what you're using a database for.

I had a long, drawn-out discussion with my boss a couple of weeks ago
regarding the purpose of a database. I argued it was there to store
data reliably (e.g. maintaining integrity, cascading deletes, or
ensuring you can't delete something that is referenced from another
relation). He argued it was there to improve the performance of the
application using it. (I'm talking in the context of web apps, since
that's what pays the bills.)
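
To make "store data reliably" concrete, here is a rough sketch of the
two behaviours I mean: a delete that cascades, and a delete that is
refused because something still references the row. It uses Python's
sqlite3 module purely as a stand-in for a real server, and the table
names (users, sessions, invoices) are made up for illustration.

```python
import sqlite3

# SQLite stands in for a real database server; all table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE sessions (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
    );
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE RESTRICT
    );
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO sessions VALUES (10, 1), (11, 2);
    INSERT INTO invoices VALUES (100, 2);
""")

# Deleting alice also removes her session, via ON DELETE CASCADE.
conn.execute("DELETE FROM users WHERE id = 1")
remaining_sessions = conn.execute("SELECT id FROM sessions").fetchall()

# Deleting bob is refused because an invoice still references him.
try:
    conn.execute("DELETE FROM users WHERE id = 2")
    bob_deleted = True
except sqlite3.IntegrityError:
    bob_deleted = False
```

The point is that neither rule lives in application code: the schema
states them once and the database applies them to every client.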

At my previous job we used a MySQL database. I chose the InnoDB engine
because I wanted the site data to stay as unbroken as MySQL can
enforce. The important thing was real foreign key checks (MyISAM will
not enforce foreign keys). The site workflow naturally suited this:
the data was synchronised overnight into tables with real referential
integrity. After the synchronisation, you can regenerate cache tables,
which don't have to be perfect; they're there to be fast.
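
The overnight pattern looks roughly like the sketch below. This is not
the code from that job; it is a minimal illustration, again with
Python's sqlite3 module standing in for MySQL/InnoDB, and with
hypothetical table and function names (authors, articles,
article_cache, nightly_sync, rebuild_cache).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable FK enforcement in SQLite
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE articles (
        id INTEGER PRIMARY KEY,
        author_id INTEGER NOT NULL REFERENCES authors(id),
        title TEXT NOT NULL
    );
    -- Denormalised cache: no constraints, read by the web pages.
    CREATE TABLE article_cache (article_id INTEGER, title TEXT, author_name TEXT);
""")

def nightly_sync(conn, authors, articles):
    """Load the night's data into the checked tables, all or nothing."""
    with conn:  # commits on success, rolls back on any exception
        conn.executemany("INSERT INTO authors VALUES (?, ?)", authors)
        conn.executemany("INSERT INTO articles VALUES (?, ?, ?)", articles)

def rebuild_cache(conn):
    """Throw the cache away and regenerate it from the checked tables."""
    with conn:
        conn.execute("DELETE FROM article_cache")
        conn.execute("""
            INSERT INTO article_cache
            SELECT a.id, a.title, u.name
            FROM articles a JOIN authors u ON u.id = a.author_id
        """)

nightly_sync(conn, [(1, "alice")], [(1, 1, "Sharding")])
rebuild_cache(conn)
cache_rows = conn.execute("SELECT * FROM article_cache").fetchall()

# An article pointing at a missing author is rejected and the batch rolled back.
try:
    nightly_sync(conn, [(2, "bob")], [(2, 99, "Orphan")])
    bad_batch_loaded = True
except sqlite3.IntegrityError:
    bad_batch_loaded = False
author_count = conn.execute("SELECT COUNT(*) FROM authors").fetchone()[0]
```

The cache can always be dropped and rebuilt, so it never has to be
trusted; the constrained tables are the only copy that matters.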

On the other hand, it's more work to generate cache tables.

Really it depends what you want. I fall strongly on the side of good
data first, putting other measures in place if it's not fast enough.
Other people are purely concerned with performance, just wanting to
use database indexes to make things fast and easy. As a result you
often end up implementing data security in your code, which I don't
feel is the right place for it.

Cheers,
--James


More information about the london.pm mailing list