SHA question

Thu Jan 14 12:22:48 GMT 2010

On 13 Jan 2010, at 17:53, David Cantrell wrote:
[...]
> Other hashing algorithms exist and are faster but more prone to
> inadvertent collisions.  If you've got a lot of data to compare, I'd
> use one of them (e.g. one of the variations on a CRC) and then only
> bring out the big SHA guns when that finds a collision.

That's a premature optimisation that just complicates the code, unless you mean *a lot* of data, as in the rdiff algorithm.

For de-duping purposes, SHA still runs faster than you can pull the files off the disk, so a secondary, cheaper hash is unnecessary.
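
To make the single-hash approach concrete, here is a minimal sketch in Python. It is an illustration only, not code from this thread: the names file_digest and find_duplicates are hypothetical, and SHA-256 is an assumption, since the thread only says "SHA". Each file is read once in chunks and hashed directly, with no cheaper first-pass hash.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()  # assumption: the thread does not name a SHA variant
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files never sit wholly in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Group regular files under root by digest; return groups of two or more."""
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[file_digest(path)].append(path)
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Calling find_duplicates("/some/dir") returns a list of path lists, one per set of files with identical content. Whether disk reads really dominate the run time depends on the hardware, so it is worth timing on your own data before adding a second hash.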