Programming Collective Intelligence

Author: Toby Segaran



Publisher: O'Reilly Media

Reviewed by: Simon Wistow

The field of data mining is a tricky one to write about. For a start, what you're mining depends on the nature of your business and the shape of the data - there is no one-size-fits-all technique, no off-the-shelf, drag-and-drop solution.

Secondly, some of the techniques require some pretty tricksy maths, and even if you do understand them, once they're applied you still have to interpret the results and tweak the multitude of input variables. Building a data mining tool - from a search engine to a collaborative filter to a genetic algorithm - is an art as much as a science or engineering problem.

So all that said, you should buy this book.

Reading it will help you understand why I just said all that. But it will also put a bunch more techniques in your mental toolbox, so that when you're looking at a problem you can think "Ooooh! I remember reading about some problem like that" and then go pick up the book again, using it as a reference manual rather than reading it from cover to cover.

And there's a goodly number of techniques to pick up - there are chapters on collaborative filtering and recommendation systems, clustering and group discovery, search and ranking techniques, document filtering, Bayesian classification, kernel methods and support-vector machines, and genetic algorithms, amongst others.

Each chapter gives an overview of the problem domain, presents an example problem and then walks the reader through a simple solution. The shortcomings of that solution are then highlighted and various enhancements are shown.

The techniques are demonstrated in Python, but the examples are all clear, understandable and perfectly legible to any competent programmer, especially a scripting language programmer. Just enough detail is covered to give you a solid grounding without getting you bogged down.

In summary - this is well worth your 20 quid, even more so if you can get your company to pay for it. If you're working with existing data, this may spark off an inspiration that will let you add some new features or up your accuracy. Or if you're presented with a problem, this book may give you techniques that will help you solve it without having to work everything out from first principles. It's a well-written manual that'll handily expand your repertoire.