Matching algorithms

Mon Jun 23 06:20:02 BST 2008

I'm looking at comparing sets of data for similarities, and wondering if there
are any funky algorithms that will speed things up a bit.

I want to compare @list1 with @list2, and then with @list3 and @list4, etc.
Then I'm probably going to compare @list2 with @list3 and @list4. You can see
where this is going: every list ends up compared against every other list.

The thing is, I only care about getting the approximately-closest-matching
sets. And a lot of these sets are going to be checked vs a new set over and
over. So I'm wondering if I could take some heuristic value from the array,
and use that for comparison, kind of like hashing, but an approximate hash?
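
To make the "approximate hash" idea a bit more concrete, here's a rough sketch
of the sort of thing I have in mind. It's in Python, with plain lists standing
in for @list1, @list2 and so on. It assumes "similar" means "shares a lot of
values" (Jaccard similarity) and uses a MinHash-style signature, which may or
may not suit the real data. The names minhash_signature and
estimate_similarity are made up for illustration.

```python
import hashlib

NUM_HASHES = 64  # assumption: more hashes = better estimate, bigger signature


def _hash(seed, item):
    """Deterministic 64-bit hash of item, salted with seed."""
    data = f"{seed}:{item!r}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=NUM_HASHES):
    """Summarise a non-empty collection as a fixed-length list of minimum hashes."""
    unique = set(items)
    if not unique:
        raise ValueError("cannot build a signature for an empty collection")
    return [min(_hash(seed, x) for x in unique) for seed in range(num_hashes)]


def estimate_similarity(sig_a, sig_b):
    """Fraction of matching slots, which approximates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

The point would be to compute each signature once, keep it alongside the list,
and then compare a new set against the stored signatures (64 integer
comparisons each) rather than against the full lists.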

Any thoughts?

I'll attach an example program that demonstrates what I'm doing, and uses the
slow loops-inside-loops method :(
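
In outline, the slow method looks something like this. It's a simplified
sketch rather than the attached program itself: it's in Python, with plain
lists standing in for @list1, @list2, etc., and it assumes the score is just
the number of values two lists share. The names count_shared and closest_pairs
are made up for illustration.

```python
def count_shared(list_a, list_b):
    """Count entries of list_a that also appear in list_b, the slow way."""
    shared = 0
    for a in list_a:
        for b in list_b:  # inner loop: up to len(list_a) * len(list_b) steps
            if a == b:
                shared += 1
                break
    return shared


def closest_pairs(lists):
    """Score every pair of lists; return (score, i, j) tuples, best first."""
    scores = []
    for i in range(len(lists)):
        for j in range(i + 1, len(lists)):  # every list against every later one
            scores.append((count_shared(lists[i], lists[j]), i, j))
    return sorted(scores, reverse=True)
```

With N lists that is N*(N-1)/2 pairs, each costing a nested loop over the
elements, which is where the time goes.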

Cheers,
Toby