[REVIEW] Programming Collective Intelligence
Simon Wistow
simon at thegestalt.org
Thu Dec 6 08:29:39 GMT 2007
Author: Toby Segaran
ISBN: 0-596-52932-5
Publisher: O'Reilly Media
The field of data mining is a tricky one to write about. For a start,
what you're mining depends on the nature of your business and the shape
of the data - there is no one-size-fits-all technique, no off-the-shelf,
drag-and-drop solution.
Secondly, some of the techniques require some pretty tricksy maths, and
even if you do understand them, once they're applied you still have
to interpret the results and tweak the multitude of input variables.
Building a data mining tool - from a search engine to a collaborative
filter to a genetic algorithm - is an art as much as a science or
engineering problem.
So all that said, you should buy this book.
Reading it will help you understand why I just said all that. But it
will also put a bunch more techniques in your mental toolbox, so
that when you're looking at a problem you can think "Ooooh! I
remember reading about some problem like that" and then you can go
pick up the book again and use it as a reference manual rather than
reading it from cover to cover.
And there's a goodly number of techniques to pick up - there are
chapters on collaborative filtering and
recommendation systems, clustering and group discovery, search and
ranking techniques, document filtering, Bayesian classification, kernel
methods and support-vector machines, and genetic algorithms, amongst
others.
Each chapter gives an overview of the problem domain, gives an example
problem and then walks the reader through a simple solution. The
problems with the solution are then highlighted and various enhancements
are shown.
The techniques are demonstrated in Python, but the examples are all clear,
understandable and perfectly legible to any competent programmer,
especially a scripting language programmer. Just enough detail is
covered to give you a solid grounding without getting you bogged down.
In summary - this is well worth your 20 quid, even more so if you can
get your company to pay for it. If you're working with existing data
this may spark off an inspiration that will let you add some new
features or up your accuracy. Or if you're presented with a problem this
book may give you techniques that will help you solve it without having
to work everything out from first principles. It's a well-written manual
that'll handily expand your repertoire.