MySQL Performance Example Was: Seriously, WTF?

Fri May 9 11:02:52 BST 2008

First, thanks for all the helpful comments.  I won't be able to test  
some of the suggestions until Monday, but at least now I'm a bit more  
optimistic that this table will actually end up being 'useable'.

Tim Sweetman wrote:

> Also, your location range looks quite large: if the query is  
> reading the whole table, an index will yield limited improvement,  
> or none. You want an index matching your most selective WHERE clauses.
>
> (That's assuming your data isn't being updated in real time or  
> something, ruling out the expense of maintaining indexes).

The `chr` field is one of the 24 chromosome names, (1..22, 'X', 'Y'),  
and the `location` field is the position on the chromosome, anywhere  
between 1 and ~250,000,000 for the biggest chromosome (1) down to  
between 1 and ~50,000,000 for the smallest chromosome (Y). The table  
is a fixed chunk of data for read only.

So from your comment I was surprised that adding the `location` index  
on its own did seem to improve things dramatically (similar queries  
now running at a few seconds, rather than minutes). Or did I  
misunderstand what you meant?

Iain Barnett wrote:

>> | chr      | varchar(4)
>>
>
> If you want a good, fast index, then make it fixed width (not  
> var...), numeric (i.e. int, not a char holding an int as a string),  
> and with lots of unique values. Why is chr a varchar?

Since `chr` is a set of defined values, at some point I did have it  
as an ENUM. But then I read somewhere that this is bad, so it got  
changed to VARCHAR. I always assumed that VARCHAR was better than  
CHAR, and that the length didn't matter because it was variable....!  
So is ENUM bad? Would it be better/worse than CHAR?

Robbie Bow wrote:

> Also, do the rows get updated much after insertion (if at all)? If not
> then, dependent on your query cache size and other activity on that
> server, re-running the queries *should* run in < 1 second as the
> result could cached from your previous query.

Its unlikely any one query will ever be run twice. Most queries would  
specify a `chr` and a `location`, so there will be plenty of queries  
'overlapping' in the block of data they'll return, even though  
they're unlikely to be exactly the same. Would the cache help in such  
circumstances, or only when an identical query has been cached?

Cheers,

Chris.