SHA question
Roger Burton West
roger at firedrake.org
Wed Jan 13 12:56:07 GMT 2010
On Wed, Jan 13, 2010 at 12:44:47PM +0000, Dermot wrote:
>I have lots of PDFs that I need to catalogue and I want to ensure
>the uniqueness of each PDF. At LWP, Jonathan Rockway mentioned
>something similar with SHA1 and binary files. Am I right in thinking
>that the code below is only taking the SHA on the name of the file and
>if I want to ensure uniqueness of the content I need to do something
>similar but as a file blob?
Yes.
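To show the difference the question is asking about, here is a minimal sketch. It is written in Python rather than Perl; in Perl, the Digest::SHA module's addfile method covers the content case. The function names are illustrative only. Hashing the path string tells you nothing about the bytes, so two identical PDFs with different names get different digests. Hashing the contents, read in chunks so large files need not fit in memory, gives identical PDFs the same digest.

```python
import hashlib
import os
import tempfile


def sha1_of_name(path: str) -> str:
    """SHA-1 of the file *name* only: identical files with different names differ."""
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


def sha1_of_content(path: str, chunk_size: int = 65536) -> str:
    """SHA-1 of the file *contents*, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.pdf")
        b = os.path.join(d, "b.pdf")
        for p in (a, b):
            with open(p, "wb") as f:
                f.write(b"%PDF-1.4 same bytes")  # same content, different names
        print(sha1_of_name(a) == sha1_of_name(b))        # False
        print(sha1_of_content(a) == sha1_of_content(b))  # True
```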
You may want to be slightly cleverer about it: taking a SHA sum means
reading every byte of every file, which is computationally expensive,
and it's only worth doing if two or more files have the same size.
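The size-first idea can be sketched in Python as follows. The function names are hypothetical. The sketch groups files by size, which costs only a stat per file, and hashes only the files whose size is shared by at least one other file, because files of different sizes cannot have identical contents. A matching SHA-1 is very strong evidence of identical contents, but a cautious cataloguer may still want to compare the bytes directly before deleting anything.

```python
import hashlib
import os
import tempfile
from collections import defaultdict


def sha1_file(path, chunk_size=65536):
    """SHA-1 of a file's contents, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return lists of paths whose contents share a SHA-1 digest.

    Only files whose size matches at least one other file are hashed.
    """
    by_size = defaultdict(list)
    for p in paths:
        by_size[os.path.getsize(p)].append(p)

    groups = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue  # unique size: cannot be a duplicate, so never read it
        by_hash = defaultdict(list)
        for p in same_size:
            by_hash[sha1_file(p)].append(p)
        groups.extend(g for g in by_hash.values() if len(g) > 1)
    return groups


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        files = {"a.pdf": b"hello", "b.pdf": b"hello",
                 "c.pdf": b"world", "d.pdf": b"a longer file"}
        paths = []
        for name, data in files.items():
            p = os.path.join(d, name)
            with open(p, "wb") as f:
                f.write(data)
            paths.append(p)
        for group in find_duplicates(paths):
            print([os.path.basename(p) for p in group])
```

In this example d.pdf is never read, because no other file has its size. c.pdf is hashed because it shares a size with a.pdf and b.pdf, but its digest differs, so it is not reported.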
If you don't require a pure-Perl solution, bear in mind that all this
has been done for you in the "fdupes" program, already in Debian or at
http://netdial.caribe.net/~adrian2/programs/ .
Roger
More information about the london.pm mailing list