[OT] xml encoding

Dirk Koopman djk at tobit.co.uk
Fri Jan 6 15:40:11 GMT 2006

I am trying to coerce libxml2 into storing and printing "binary" data.
Could someone help my understanding a bit here? Take the following small
chunk of XML, which is part of a much bigger and otherwise well-formed XML
document:

   <PASSWORD>rs&#16;&#30;&#25;*  &#6;</PASSWORD>

This, very nearly, represents the data that I require. 

What I am doing is taking some fields that contain binary data (a very
small percentage of the whole gamut of fields that are to be output) and
building up a libxml2 doc tree in memory. That all works just fine. The
input data is guaranteed to be UTF-8, either because it already is (I
convert the characters above 127 into UTF-8) or because it is converted to
character entities like &#16; (or &#x10;, tried both). On output (as
UTF-8) for this field I get:


But putting that into any xml parser will fail on the '^P' after

What understanding am I missing? Why is the above not well formed? It is
UTF-8. If necessary, how do I force characters less than 32 to be output
as &#99; (or &#x99;)? 
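
One way to probe the question is to test the character rule directly. The
sketch below is in Python rather than C and uses the standard library's
expat-based parser, not libxml2. It assumes the Char production of the
XML 1.0 specification, under which code points below 32 other than tab,
line feed and carriage return are not allowed, whether written raw or as a
reference such as &#16;. The helper names is_xml10_char and parses are
made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(xml_text: str) -> bool:
    """True if the stdlib (expat) parser accepts xml_text as well formed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Code points taken from the PASSWORD example: all four are below 32.
illegal = [cp for cp in (16, 30, 25, 6) if not is_xml10_char(cp)]

# The same control character, once as a reference and once raw.
as_reference = parses("<PASSWORD>rs&#16;</PASSWORD>")
as_raw_byte = parses("<PASSWORD>rs\x10</PASSWORD>")
```

If that reading of the specification is right, both forms are rejected, so
escaping the character differently would not make the field well formed in
XML 1.0.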

Poking around in the interstices of libxml tells me that xmlNewChild()
carefully converts entities like &#99; back into the binary equivalent.
Preventing that (by doing things more "manually" or using
xmlNewTextChild()) produces output like the first example.
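
A common way to sidestep the problem for binary fields is to encode the
bytes as text before they go into the tree. What follows is a minimal
sketch, assuming base64 is acceptable to whatever consumes the document.
It is in Python with the standard library's ElementTree rather than
libxml2, and the "encoding" attribute and both function names are
hypothetical.

```python
import base64
import xml.etree.ElementTree as ET


def binary_to_element(tag: str, data: bytes) -> ET.Element:
    """Build an element whose text is the base64 form of data."""
    # The "encoding" attribute is a made-up marker for the reader of the file.
    elem = ET.Element(tag, {"encoding": "base64"})
    elem.text = base64.b64encode(data).decode("ascii")
    return elem


def element_to_binary(elem: ET.Element) -> bytes:
    """Recover the original bytes from an element built above."""
    return base64.b64decode(elem.text or "")


# The bytes that the PASSWORD example is trying to represent.
raw = b"rs\x10\x1e\x19*  \x06"

xml_bytes = ET.tostring(binary_to_element("PASSWORD", raw))
round_tripped = element_to_binary(ET.fromstring(xml_bytes))
```

The serialised element contains only printable ASCII, so it survives any
parser, at the cost of the field no longer being readable by eye.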

My (already sparse) hair is getting rapidly thinner! 


Dirk Koopman <djk at tobit.co.uk>
