file grouping answer.

michael michael at galton.ucl.ac.uk
Wed Mar 29 13:51:32 BST 2006



I was going to ask:-

I've got a load of files I want to efficently sort into groups of 4.2Gb(1)
for burning to DVD.  How do I go about doing this?

I did some digging and found the answer and thought I'd save time by
posting the answer as well.

The hard part was finding out what the problem was called I guessed it was
going to be some varient on the 'Travelling salesman problem' (2) which
ment that any any home rolled solution would be to say the least sub
optimal....

I started out by Googling on algorithm group, algorithm sort and perl
algorithm but came up with little of use, finally I tried just algorithm
which took me to the wikipedia Algorithm page (3).  I flicked through it
and finally noticed a reference to Approximation algorithms (4).  The
Travelling salesman is NP-Complete (5), however Nicholas Clark YAPC::Braga
talk 'When Perl is not quite fast enough' made the point that even if a
problem is NP-complete there are faster approximation algorithms which
will get a nearly optimal solution (6).  I was in luck it mentioned the
'bin packing problem' (7)

Search CPAN on bin packing got me almost straight to:-

Algorithm::BinPack and via BinPack to Algorithm::Bucketizer (which called
in it bin-packing and was hence missed in future I think I may just google
site:http://search.cpan.org)

(1) when a disk starts to fail IME it often starts from the outer edge,
    not filling it provides a buffer zone
(2) <http://en.wikipedia.org/wiki/Travelling_salesman>
(3) <http://en.wikipedia.org/wiki/Algorithm>
(4) <http://en.wikipedia.org/wiki/Approximation_algorithms>
(5) ie the time to get an exact answer  increases exponentaily with the
    number of variables
(6) if your wavering about going to YAPC::Birmingham this is proof it will
    save you time! <http://www.ccl4.org/~nick/P/Fast_Enough/>
(7) <http://en.wikipedia.org/wiki/Bin_packing_problem>


--
Michael

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Michael John Lush PhD			Tel:44-20-7679-5027
Nomenclature Bioinformatician		Fax:44-20-7387-3496
HUGO Gene Nomenclature Committee	Email:  nome at galton.ucl.ac.uk
The Galton Laboratory
University College London, UK
URL: http://www.gene.ucl.ac.uk/nomenclature/
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


More information about the london.pm mailing list