SHA question

Wed Jan 13 12:56:07 GMT 2010

On Wed, Jan 13, 2010 at 12:44:47PM +0000, Dermot wrote:

>I have a lot of PDFs that I need to catalogue and I want to ensure
>the uniqueness of each PDF.  At LWP, Jonathan Rockway mentioned
>something similar with SHA1 and binary files.  Am I right in thinking
>that the code below is only taking the SHA on the name of the file and
>if I want to ensure uniqueness of the content I need to do something
>similar but as a file blob?

Yes.
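
To illustrate the difference, here is a minimal sketch, assuming the
core Digest::SHA module is available.  The helper names name_digest
and content_digest are made up for this example and are not taken from
your code.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Digest of the file *name*: depends only on the string passed in,
# so two copies of the same PDF under different names get different sums.
sub name_digest {
    my ($path) = @_;
    return sha1_hex($path);
}

# Digest of the file *content*: the file is read in binary mode,
# so the result is independent of what the file is called.
sub content_digest {
    my ($path) = @_;
    my $sha = Digest::SHA->new(1);    # 1 selects SHA-1
    $sha->addfile($path, 'b');        # 'b' = binary mode
    return $sha->hexdigest;
}
```

Two files with identical bytes give the same content_digest whatever
they are called, which is the property you want for cataloguing.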

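You may want to be slightly cleverer about it - taking a SHAsum is
computationally expensive, and it's only worth doing for files that
have the same size as some other file.  Files of different sizes
cannot have identical content, so a cheap size check usually rules
most of them out before any hashing.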
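
A rough sketch of that size-first approach, again assuming Digest::SHA;
find_duplicates is a hypothetical helper written for this example, not
an existing function, and it does no error handling beyond skipping
anything that is not a plain file.

```perl
use strict;
use warnings;
use Digest::SHA;

# Return a list of array refs, each holding paths with identical content.
sub find_duplicates {
    my @paths = @_;

    # Step 1: group by size (cheap: one stat per file, no reading).
    my %by_size;
    for my $path (grep { -f $_ } @paths) {
        my $size = (stat $path)[7];
        push @{ $by_size{$size} }, $path;
    }

    # Step 2: hash only files that share a size with another file.
    my @dupes;
    for my $group (grep { @$_ > 1 } values %by_size) {
        my %by_digest;
        for my $path (@$group) {
            my $digest = Digest::SHA->new(1)->addfile($path, 'b')->hexdigest;
            push @{ $by_digest{$digest} }, $path;
        }
        push @dupes, grep { @$_ > 1 } values %by_digest;
    }
    return @dupes;
}
```

Calling find_duplicates(glob "*.pdf") would then return one array ref
per set of identical files; files with a unique size are never read.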

If you don't require a pure-Perl solution, bear in mind that all this
has been done for you in the "fdupes" program, already in Debian or at
http://netdial.caribe.net/~adrian2/programs/ .

Roger