Non Sucking YAML parser

Robin Berjon robin.berjon at expway.fr
Thu Sep 14 14:50:52 BST 2006


On Sep 14, 2006, at 15:18, David Cantrell wrote:
> On Thu, Sep 14, 2006 at 12:21:49PM +0200, Robin Berjon wrote:
>> [XML namespaces]
>
> Perhaps I'm being dense, but what is the point of namespaces in XML?

Similar to namespaces in, say, Perl. Identification and separation.
You can get away with not using namespaces for local cases concerning
closed systems in the same way that you can get away with putting
everything in main for a script whose code is not going to be
reused.

If however you're mixing and matching, or talking across boundaries,  
you really want to keep your kittens easily split. And the sorting  
kitten is happy to know the difference between Buffy and a pony  
called Buffy.

> If
> your code comes across a document it can't deal with cos it's - say -
> describing a hospital visit instead of a book - then it's going to  
> barf.
> Code dealing with XML must know a lot about the documents anyway,  
> so all
> that extra verbiage is pointless.  Say 'title' instead of
> <hospitalvisit:patientdetails:title>Mr</crap>* and <book:title>Winnie
> The Pooh</waffle>.

Yes, that verbiage is indeed pointless. Daft people do pointless  
things, what can I say? Should the namespaces spec have a big red  
blinking "DON'T BE FUCKING STUPID" sign at the top? Would it help?

Presumably a title element doesn't happen all on its own. So in the  
above two cases you'd see:

<hospital-visit xmlns='http://hospital...'>
   <patient-details>
     <title>Dahut</title>
     ....
   </patient-details>
   ....
</hospital-visit>

and

<book xmlns='http://book...'>
   <title>Wild Left Dahut Pie</title>
   ....
</book>

Do you want to keep track of the books that your patients have read  
during their stay? There are two ways of doing that while still  
preserving enough information to know which title is which.

Option one, give context (and let's assume there aren't any  
namespaces, just for fun):

<hospital-visit>
   <patient-details>
     <title>Dahut</title>
     ....
     <book>
       <title>Wild Left Dahut Pie</title>
     </book>
     <book>
       <title>Ponies From Hell</title>
     </book>
   </patient-details>
   ....
</hospital-visit>

With that, if you want all book titles read by all patients, you can  
search for //book/title. Likewise, if you want to know how many Lords  
have been your patients, you can go //patient-details[title = "Lord"]  
and you won't pick up books called "Lord".
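
For anyone who wants to try those two queries, here is a minimal
sketch using Python's xml.etree.ElementTree, which supports a subset
of XPath (hence the leading "." on the paths). The <hospital-visits>
wrapper and the sample data are assumptions made up for the
illustration, including a patient who happens to have read a book
called "Lord".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data: two visits under an invented wrapper element.
DOC = """
<hospital-visits>
  <hospital-visit>
    <patient-details>
      <title>Lord</title>
      <book><title>Wild Left Dahut Pie</title></book>
    </patient-details>
  </hospital-visit>
  <hospital-visit>
    <patient-details>
      <title>Dahut</title>
      <book><title>Ponies From Hell</title></book>
      <book><title>Lord</title></book>
    </patient-details>
  </hospital-visit>
</hospital-visits>
"""

root = ET.fromstring(DOC)

# //book/title: every title that sits directly inside a book.
book_titles = [t.text for t in root.findall(".//book/title")]

# //patient-details[title = "Lord"]: only patient-details elements whose
# own child title is "Lord"; the book called "Lord" is not a direct child.
lords = root.findall(".//patient-details[title='Lord']")

print(book_titles)  # ['Wild Left Dahut Pie', 'Ponies From Hell', 'Lord']
print(len(lords))   # 1
```

The context does the disambiguation here: the same element name means
different things depending on its parent.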

Option two, use namespaces:

<hospital-visit xmlns='http://hospital...' xmlns:b='http://book...'>
   <patient-details>
     <title>Dahut</title>
     ....
     <b:title>Wild Left Dahut Pie</b:title>
     <b:title>Ponies From Hell</b:title>
   </patient-details>
   ....
</hospital-visit>

So, which is more verbose? Also, note that now getting all the book
titles is just //b:title.
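
A rough sketch of the same query in Python's xml.etree.ElementTree
follows. The namespace URIs in the examples above are elided, so the
two URIs below are hypothetical placeholders, not the real ones. Note
that the prefixes used in a query come from the map passed to
findall(), not from the document, and that the hospital titles need a
prefix on the query side even though the document uses a default
namespace for them.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URIs; the originals are elided in the examples.
HOSPITAL_NS = "http://hospital.example/ns"
BOOK_NS = "http://book.example/ns"

DOC = f"""
<hospital-visit xmlns='{HOSPITAL_NS}' xmlns:b='{BOOK_NS}'>
  <patient-details>
    <title>Dahut</title>
    <b:title>Wild Left Dahut Pie</b:title>
    <b:title>Ponies From Hell</b:title>
  </patient-details>
</hospital-visit>
"""

root = ET.fromstring(DOC)

# Prefix-to-URI map used only by the queries below.
ns = {"h": HOSPITAL_NS, "b": BOOK_NS}

# //b:title: all titles in the book namespace, wherever they are.
book_titles = [t.text for t in root.findall(".//b:title", ns)]

# Titles in the hospital namespace (unprefixed in the document itself).
patient_titles = [t.text for t in root.findall(".//h:title", ns)]

print(book_titles)     # ['Wild Left Dahut Pie', 'Ponies From Hell']
print(patient_titles)  # ['Dahut']
```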

There isn't a single day that I don't edit XML, of many different  
kinds. I do it for work, I do it for play. I always use namespaces,  
and only rarely have to resort to using prefixes. The fact that
people put prefixes on everything just shows that they're clueless,
'cause I sure ain't specially smart.


> If instead your code is meant to handle generic documents and not  
> really
> understand them - perhaps it is code to traverse an arbitrary document
> element tree, or to store a document in a database - then it doesn't
> need to know about the namespaces anyway, nor can it be expected to
> usefully compare documents, so again they're pointless.

The ability to apply processing to just parts of a document, most of
which you might not understand but parts of which you do, is extremely
useful. And it's impossible to do reliably without namespaces; the
alternative is to use absurdly verbose element names in the hope
that they won't clash. Dispatching to different processors based on
the root element's namespace is also very useful.
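
As an illustration of that last point, here is one way dispatching on
the root element's namespace might look in Python. Everything named
here is an assumption for the sketch: the URIs are placeholders, and
namespace_of, handle_visit, handle_book, HANDLERS and dispatch are
hypothetical helpers, not part of any library.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder URIs.
HOSPITAL_NS = "http://hospital.example/ns"
BOOK_NS = "http://book.example/ns"


def namespace_of(elem):
    """Return the namespace URI of an element, or None if it has none."""
    # ElementTree stores qualified names as "{uri}localname".
    if elem.tag.startswith("{"):
        return elem.tag[1:].split("}", 1)[0]
    return None


def handle_visit(root):
    # Hypothetical processor for hospital documents.
    return "hospital visit"


def handle_book(root):
    # Hypothetical processor for book documents.
    return "book: " + root.findtext(f"{{{BOOK_NS}}}title")


# One processor per namespace we understand.
HANDLERS = {HOSPITAL_NS: handle_visit, BOOK_NS: handle_book}


def dispatch(xml_text):
    """Pick a processor from the namespace of the document's root element."""
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(namespace_of(root))
    if handler is None:
        return "unknown document type"
    return handler(root)


print(dispatch(f"<book xmlns='{BOOK_NS}'><title>Wild Left Dahut Pie</title></book>"))
print(dispatch("<book><title>No namespace here</title></book>"))
```

The second call falls through to "unknown document type": a <book>
with no namespace is not the same thing as a <book> in the book
namespace, which is the whole point.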

> Frankly, I want to take everyone who has been involved in speccing any
> version of XML since the first one and SPANK THEM HARD.  All the extra
> unnecessary crap makes XML harder to process both by machines and by
> humans.

Huh? You say versions of XML but you don't sound like you're talking  
about XML 1.1, or the various editions of 1.0 and 1.1 that fixed  
bugs. There have been horrible, horrible specs that use or apply to  
XML, like XML Schema and SOAP, but they have nothing to do with XML.  
Me, I sympathise with the poor sods who have to handle them, but I
just ignore them, as many happily do. There are many fine and simple
specs out there: Namespaces, XPath, XSLT, RELAX NG, SVG...

Why bother with the extra unnecessary crap? I don't. If you have to,  
blame yourself, or your job.

-- 
Robin Berjon
    Senior Research Scientist
    Expway, http://expway.com/




