Why Perl needs a VM

Aaron Trevena aaron.trevena at gmail.com
Wed Sep 5 13:16:18 BST 2007


On 05/09/07, Ismail, Rafiq (IT) <Rafiq.Ismail at morganstanley.com> wrote:
> <!-- outlook.. Apologies for toppost -->

Outlook is no excuse ;)

http://home.in.tum.de/~jain/software/outlook-quotefix/

> For large feeds, depending on the structure and semantics of your
> documents, one approach you may want to consider is a combination of
> SAX/DOM parsing, where you DOM-parse a 'reconstructed subtree' of your
> main document, built at SAX time.  This constrains the depth of your
> XPaths and the overall size of the DOM tree.  For large trees which
> require DOM parsing, it can be quite performant.  You could also
> potentially parallelise your processing of a single document via this
> approach.
>
> Depending on your requirements, you could also hold some state between
> subtree parses and inject nodes into the reconstructed tree where there
> is a dependency on some previously parsed artifact.
>
> The likes of XML::Twig might also be worth looking at, but I don't know
> much about the underlying implementation/performance.

I'd be curious to know how much difference that made.

Certainly it could be worth appraising the way the XML is being
processed - an inefficient algorithm at the top level will make a much
bigger difference than the speed of a lower-level library.
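
For what it's worth, XML::Twig is, roughly, that SAX/DOM hybrid packaged
up for you: it streams the document and hands you each matching element
as a small tree.  Here is a minimal sketch, assuming a feed of <record>
elements under a <feed> root (the element names, attributes and file
name are made up for illustration), which handles one subtree at a time
and frees it before moving on, so the whole feed never sits in memory
at once:

    use strict;
    use warnings;
    use XML::Twig;

    # The handler fires once per <record> element: that subtree is a
    # fully built twig, so tree-style navigation works on it, but only
    # one record's worth of document is held in memory at a time.
    my $twig = XML::Twig->new(
        twig_handlers => {
            'feed/record' => sub {
                my ( $t, $record ) = @_;
                my $id    = $record->att('id');
                my $title = $record->first_child_text('title');
                print "$id: $title\n";
                $t->purge;    # discard everything parsed so far
            },
        },
    );

    $twig->parsefile('big_feed.xml');

The purge call is what keeps memory flat - each handled subtree is thrown
away once it has been dealt with, much like discarding the reconstructed
subtree after each DOM parse in the approach described above.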

A.

-- 
http://www.aarontrevena.co.uk
LAMP System Integration, Development and Hosting

