file check
Christopher Jones
c.jones at ucl.ac.uk
Tue Jul 15 17:40:10 BST 2008
On 15 Jul 2008, at 16:34, Frank v Waveren wrote:
> On Tue, Jul 15, 2008 at 04:05:24PM +0100, Christopher Jones wrote:
>> Anyone got any better ideas for checking the integrity of my new
>> database?
>
> Maatkit contains a number of very useful MySQL tools, including
> mk-table-checksum, which compares tables. You'll have to load both
> your dumps into MySQL though.
>
> http://maatkit.sourceforge.net/doc/mk-table-checksum.html
>
> F.
That looks like a neat tool - definitely worth having! But looking at
the docs, it seems it will probably fail for the same reason cmp failed
- presumably (as someone else pointed out) subtle differences in the
dump layout due to optimisations like varchar -> char.
I figure Michael is right to point out that this is a job for Perl...
and this is a Perl forum after all!
Assuming I don't have the time to have Perl cross-check every single
value (any guesses on how long it would take Perl to compare ~1
billion values from MySQL?), how do these approaches sound for a
quicker comparison (rough sketches after the list):
1) For each table, return each field separately, ordered by the
primary key.
2) For int and double fields, calculate some summary statistics (like
the mean and stdev) of the field, and compare those values between
the databases. (It would be an incredible coincidence if 30,000 int
values gave the same mean and stdev in both databases without the
data actually being identical.)
3) join() the returned values and compare them. If comparing huge
strings is an issue, then compare checksums of the strings.
4) As Michael suggested, directly compare some random values as a
double check.
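To make that concrete, here's the sort of thing I have in mind for
(1)-(3) using DBI - a rough sketch only, and the connection details
and the table/column/key names ('samples', 'value', 'name', 'id') are
all made up:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;
    use Digest::MD5 qw(md5_hex);

    # Placeholder connection details - substitute your own.
    my $old = DBI->connect('dbi:mysql:database=olddb', 'user', 'pass',
                           { RaiseError => 1 });
    my $new = DBI->connect('dbi:mysql:database=newdb', 'user', 'pass',
                           { RaiseError => 1 });

    # (2) For a numeric column, let MySQL compute the aggregates and
    # compare just those.
    sub numeric_summary {
        my ($dbh, $table, $col) = @_;
        return join '/', map { defined $_ ? $_ : 'NULL' }
            $dbh->selectrow_array(
                "SELECT COUNT($col), AVG($col), STDDEV($col) FROM $table");
    }

    # (1) + (3) Fetch a column ordered by the primary key, join the
    # values (NUL-separated, so 'ab','c' doesn't match 'a','bc') and
    # compare an MD5 of the result rather than the huge string itself.
    sub column_md5 {
        my ($dbh, $table, $col, $pk) = @_;
        my $vals = $dbh->selectcol_arrayref(
            "SELECT $col FROM $table ORDER BY $pk");
        return md5_hex(join "\0",
                       map { defined $_ ? $_ : 'NULL' } @$vals);
    }

    for my $table ('samples') {
        my $s_old = numeric_summary($old, $table, 'value');
        my $s_new = numeric_summary($new, $table, 'value');
        warn "$table.value stats differ: $s_old vs $s_new\n"
            if $s_old ne $s_new;

        my $m_old = column_md5($old, $table, 'name', 'id');
        my $m_new = column_md5($new, $table, 'name', 'id');
        warn "$table.name checksum differs\n" if $m_old ne $m_new;
    }

For the really big tables you'd want to feed the rows one at a time
into Digest::MD5->new->add via a prepared statement rather than
slurping a whole column with selectcol_arrayref, or memory will hurt.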
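And a quick stab at (4), re-using the same handles - again, 'samples'
and 'id' are invented names:

    # (4) Spot-check a few hundred random rows end to end.
    my ($max_id) = $old->selectrow_array('SELECT MAX(id) FROM samples');
    for (1 .. 200) {
        my $id = 1 + int rand $max_id;
        my $a  = $old->selectrow_arrayref(
            'SELECT * FROM samples WHERE id = ?', undef, $id);
        my $b  = $new->selectrow_arrayref(
            'SELECT * FROM samples WHERE id = ?', undef, $id);
        if (!$a or !$b) {
            warn "row $id missing from one side\n" if $a or $b;
            next;    # id absent from both - nothing to compare
        }
        warn "row $id differs\n"
            if join("\0", map { defined $_ ? $_ : 'NULL' } @$a)
            ne join("\0", map { defined $_ ? $_ : 'NULL' } @$b);
    }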
If anyone has any other ideas on how to extend this to make it a bit
better, I'd be interested.
Chris.
Gynaecological Cancer Research Laboratories,
UCL EGA Institute for Women's Health,
University College London,
Paul O'Gorman Building,
72 Huntley Street,
London WC1E 6DD
United Kingdom
Telephone: 020 3108 2007