I did something along these lines about 10 years ago, using HTML::TreeBuilder to strip out any elements and attributes that weren't on a whitelist.
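
For what it's worth, here is a minimal sketch of that kind of whitelist filter. It is not the original code: the `%ALLOWED` and `%DROP_WITH_CONTENT` tables, the `sanitize_html` name and the `href` scheme check are all assumptions made up for illustration, so adjust them to whatever the page actually needs. Disallowed tags are unwrapped (their text is kept), a few are dropped together with their contents, and attributes not listed for a tag are removed.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::Entities qw(encode_entities);

# Hypothetical whitelist: allowed tag => attributes allowed on that tag.
my %ALLOWED = (
    p  => [], br => [], b  => [], i  => [], em => [], strong => [],
    ul => [], ol => [], li => [],
    a  => ['href'],
);

# Hypothetical list of tags that are thrown away together with their contents.
my %DROP_WITH_CONTENT = map { $_ => 1 } qw(script style iframe object embed);

# Recursively clean everything below $node.
sub _clean {
    my ($node) = @_;
    for my $child ($node->content_list) {    # content_list returns a copy
        next unless ref $child;              # plain text: leave it alone
        my $tag = $child->tag;
        if ($DROP_WITH_CONTENT{$tag}) {
            $child->delete;                  # drop the tag and everything in it
            next;
        }
        _clean($child);                      # clean the descendants first
        if (my $attrs = $ALLOWED{$tag}) {
            my %keep = map { $_ => 1 } @$attrs;
            for my $name ($child->all_external_attr_names) {
                my $ok = $keep{$name};
                # Assumption: only http(s) and mailto links are wanted.
                $ok = 0 if $ok && $name eq 'href'
                    && $child->attr($name) !~ m{\A\s*(?:https?://|mailto:)}i;
                $child->attr($name, undef) unless $ok;   # undef deletes it
            }
        }
        else {
            # Not on the whitelist: keep the children, lose the tag itself.
            $child->replace_with_content->delete;
        }
    }
}

# Returns the cleaned markup found inside <body>, as a string.
sub sanitize_html {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->find_by_tag_name('body');
    my $out  = '';
    if ($body) {
        _clean($body);
        $out = join '', map {
            ref $_ ? $_->as_HTML('<>&', undef, {})    # {} = emit all end tags
                   : encode_entities($_, '<>&')
        } $body->content_list;
    }
    $tree->delete;                           # free the tree (older HTML::Tree)
    return $out;
}
```

Calling `sanitize_html($fetched_html)` on the fetched markup gives back a string that can be dropped into the page. Because TreeBuilder rebuilds the document as a tree first, unclosed or badly nested tags come out balanced, which covers a fair bit of the "crappy HTML" problem on its own.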

In keeping with the spirit of the list, this isn't directly a Perl question,
but Perl might be part of the solution.

I'm picking up HTML from another site, and that HTML is pretty crappy.

Is there any way of quarantining it so it doesn't bugger up the rest of the 

