Sharding, and all that

Richard Huxton dev at archonet.com
Thu Dec 18 12:43:35 GMT 2008


Andy Wardley wrote:
> Mark Fowler wrote:
>> What's the collective group think on these?
> 
> There's a good series of articles on sharding starting here:
> 
> http://lifescaler.com/2008/04/database-sharding-unraveled-part-i/
> 
> The conclusion I drew from it was that functional partitioning
> (where possible) was much easier to implement than horizontal partitioning.

Hmm - skimming these articles, I'm not hugely impressed. The chap(ess?)
behind them is clearly a developer rather than a DBA.

http://lifescaler.com/2008/06/database-sharding-unraveled-part-iii/

"I’ll give you a practical example. It involves a forum with about 150k
registered users, 600k posts and about 10k unique visitors/day (peaks
reaching 50k unique visitors/day). The DB server is a 8 dual-core XEON
processors with 8gb/ram. The whole DB is about 1GB in size and the
server is MySQL.
When the number of simultaneous DB connections reaches a critical level
(it’s variable), MySQL will freeze. That’s not all, as the sessions and
posts tables will often get corrupted.
The first solution would be to put the tables in memory, so as not to
stress the HDD. This actually turns out to be the worst idea possible.
So what to do?"

Well, how about (1) connection-pooling and (2) running memcached using
some of the 6GB of RAM you have free. Even if that only reduces your
queries by 90% (and you'd expect better than that for a forum) you're
not going to be seeing more than 16 simultaneous queries - which is the
number of cores you have. While we're on the topic you've probably got
too many cores and not enough disks (and I'm basing that solely on the
fact that a DB server has been described *without* mentioning disks at all).

Oh, and if you're routinely finding your tables are getting corrupted
either (a) replace your hardware or (b) replace your database.

"Split the database into smaller, specialized db’s. Use a DB for users,
one for messanging functionalities, one for product orders, etc. Each of
these databases must exist independently, that is, the splitting must be
made so that I could, for example, create a new functionality
(implicitly a new database), let’s say discussion board, and just take
the users DB as a whole and use it, without affecting anything else."

Brilliant! So now your messages don't necessarily have a valid user
associated with them. That's OK though, because you're checking that in
your application aren't you? Everywhere? Including any admin scripts?
And manual interactions with the database? And you can prove that you've
not missed any checking?

This sort of stuff is very easy to get wrong. It's very easy to make the
sort of mistake that isn't noticed for months when it's impossible to
correct corrupt data. Now I know the likes of ebay and skype do it, but
you can bet they wouldn't unless they had to. These articles seem to put
it on a par with trying out a new theme in firefox.

-- 
  Richard Huxton
  Archonet Ltd



More information about the london.pm mailing list