MySQL Performance Example Was: Seriously, WTF?
c.jones at ucl.ac.uk
Fri May 9 11:02:52 BST 2008
First, thanks for all the helpful comments. I won't be able to test
some of the suggestions until Monday, but at least now I'm a bit more
optimistic that this table will actually end up being 'useable'.
Tim Sweetman wrote:
> Also, your location range looks quite large: if the query is
> reading the whole table, an index will yield limited improvement,
> or none. You want an index matching your most selective WHERE clauses.
> (That's assuming your data isn't being updated in real time or
> something, ruling out the expense of maintaining indexes).
The `chr` field is one of the 24 chromosome names, (1..22, 'X', 'Y'),
and the `location` field is the position on the chromosome, anywhere
between 1 and ~250,000,000 for the biggest chromosome (1) down to
between 1 and ~50,000,000 for the smallest chromosome (Y). The table
is a fixed chunk of data for read only.
So from your comment I was surprised that adding the `location` index
on its own did seem to improve things dramatically (similar queries
now running at a few seconds, rather than minutes). Or did I
misunderstand what you meant?
Iain Barnett wrote:
>> | chr | varchar(4)
> If you want a good, fast index, then make it fixed width (not
> var...), numeric (i.e. int, not a char holding an int as a string),
> and with lots of unique values. Why is chr a varchar?
Since `chr` is a set of defined values, at some point I did have it
as an ENUM. But then I read somewhere that this is bad, so it got
changed to VARCHAR. I always assumed that VARCHAR was better than
CHAR, and that the length didn't matter because it was variable....!
So is ENUM bad? Would it be better/worse than CHAR?
Robbie Bow wrote:
> Also, do the rows get updated much after insertion (if at all)? If not
> then, dependent on your query cache size and other activity on that
> server, re-running the queries *should* run in < 1 second as the
> result could cached from your previous query.
Its unlikely any one query will ever be run twice. Most queries would
specify a `chr` and a `location`, so there will be plenty of queries
'overlapping' in the block of data they'll return, even though
they're unlikely to be exactly the same. Would the cache help in such
circumstances, or only when an identical query has been cached?
More information about the london.pm