[OT] xml encoding
Dirk Koopman
djk at tobit.co.uk
Fri Jan 6 15:40:11 GMT 2006
I am trying to coerce libxml2 into storing and printing "binary" data.
Could someone help my understanding a bit here? Take the following small
chunk of XML, which is part of a much bigger and otherwise well-formed
XML document:
<PASSWORD>rs* </PASSWORD>
This, very nearly, represents the data that I require.
What I am doing is taking some fields that contain binary data (a very
small percentage of the whole gamut of fields that are to be output) and
building up a libxml2 doc tree in memory. That all works just fine. The
input data is guaranteed to be UTF-8, either because it already is (I
convert the characters above 127 into UTF-8) or because it is converted
to numeric character references like &#NN; (or &#xNN;, tried both). On
output (as UTF-8) for this field I get:
<PASSWORD>rs^P^^^Y* ^F</PASSWORD>
But putting that into any XML parser will fail on the '^P' after
'<PASSWORD>rs'.
What understanding am I missing? Why is the above not well formed? It is
UTF-8. If necessary, how do I force characters less than 32 to be output
as &#NN; (or &#xNN;)?
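
For illustration, here is a minimal sketch of the failure, written in
Python with the standard-library expat-based parser rather than libxml2
(that substitution is an assumption made only to keep the example
self-contained). XML 1.0 permits only tab, LF and CR below U+0020, and a
numeric character reference must also name a permitted character, so
both the raw byte and the escaped form should be rejected. The helper
name parses() and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(doc: bytes) -> bool:
    """Return True if doc is accepted as well-formed XML, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Raw control character U+0010 (shown as ^P in the post) inside the element.
raw_ctrl = b"<PASSWORD>rs\x10*</PASSWORD>"

# The same character written as a numeric character reference.
ref_ctrl = b"<PASSWORD>rs&#16;*</PASSWORD>"

# A tab (U+0009) is one of the few characters below 32 that XML 1.0 allows.
tab_ref = b"<PASSWORD>rs&#9;*</PASSWORD>"

for name, doc in [("raw", raw_ctrl), ("reference", ref_ctrl), ("tab", tab_ref)]:
    print(name, parses(doc))
```

Being valid UTF-8 is therefore not sufficient on its own: the encoding
and the set of characters XML allows are separate constraints.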
Poking around in the interstices of libxml tells me that xmlNewChild()
carefully converts references like &#NN; back into the binary equivalent.
Preventing that (by doing things more "manually" or using
xmlNewTextChild()) produces output like the first example.
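
One common workaround, sketched here under stated assumptions and not
as the libxml2 answer, is to encode the binary field as base64 text
before it enters the tree, so that only ordinary ASCII characters are
ever serialised. The sketch uses Python's standard library instead of
libxml2; the RECORD element, the "encoding" attribute and both helper
functions are hypothetical names chosen for illustration.

```python
import base64
import xml.etree.ElementTree as ET


def add_binary_child(parent, tag, data: bytes):
    """Store arbitrary bytes as base64 text in a new child element."""
    child = ET.SubElement(parent, tag)
    # Hypothetical marker attribute so a reader knows to decode the text.
    child.set("encoding", "base64")
    child.text = base64.b64encode(data).decode("ascii")
    return child


def read_binary_child(elem) -> bytes:
    """Recover the original bytes from an element written as above."""
    return base64.b64decode(elem.text)


# Bytes corresponding to the caret notation in the post: ^P ^^ ^Y ... ^F
secret = b"rs\x10\x1e\x19* \x06"

root = ET.Element("RECORD")
add_binary_child(root, "PASSWORD", secret)

# Serialise, parse again, and decode the field.
xml_bytes = ET.tostring(root, encoding="utf-8")
reparsed = ET.fromstring(xml_bytes)
round_tripped = read_binary_child(reparsed.find("PASSWORD"))
```

The cost is that the field is no longer readable in the document, and
every consumer has to know to decode it.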
My (already sparse) hair is getting rapidly thinner!
Dirk
--
Dirk Koopman <djk at tobit.co.uk>