[OT] xml encoding
Aaron Crane
perl at aaroncrane.co.uk
Fri Jan 6 16:07:16 GMT 2006
Dirk Koopman writes:
> I am trying to coerce libxml2 into storing and printing "binary" data.
XML can't directly represent arbitrary binary data.
> Take the following small chunk of XML, which part of a much bigger and
> otherwise well formed XML document.
>
> <PASSWORD>rs* </PASSWORD>
In particular: of the characters with Unicode code points below U+0020,
only U+0009, U+000A, and U+000D may appear in XML 1.0 documents, whether
written literally or as numeric character references.
According to the standard:

  "Well-formedness constraint: Legal Character
  Characters referred to using character references MUST match the
  production for Char."
  http://www.w3.org/TR/2004/REC-xml-20040204/#wf-Legalchar

  "Character Range
  [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
               [#xE000-#xFFFD] | [#x10000-#x10FFFF]"
  http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char

(Given which, it can perhaps be considered a bug that libxml2 is willing
to let you put such data into text nodes.)
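
If you want to check a string against that Char production before handing
it to an XML library, something along these lines should do. It's a rough
sketch in Python rather than anything libxml2 provides; the function name
illegal_xml_chars is made up for illustration.

```python
import re

# Complement of the XML 1.0 Char production quoted above:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def illegal_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids.

    Hypothetical helper for illustration only.
    """
    return [(m.start(), ord(m.group())) for m in _ILLEGAL_XML_CHARS.finditer(text)]


if __name__ == "__main__":
    # The control characters here are invented sample data.
    for offset, codepoint in illegal_xml_chars("rs*\x01\x02 "):
        print("offset %d: U+%04X is not a legal XML 1.0 character" % (offset, codepoint))
```

An empty result means every character matches Char; it says nothing about
whether the text still needs escaping for "<" or "&".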
Your best bet is probably to use base64 for your binary data; if there's
a lot of it, consider gzipping it before base64-ing it.
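
As a rough illustration of that suggestion, here is a small Python sketch
using only the standard base64 and gzip modules. The function names and the
compress/compressed flags are my own invention, and how the reader learns
that a given value was gzipped (an attribute, a convention) is left as an
assumption.

```python
import base64
import gzip


def encode_binary(data, compress=False):
    """Turn arbitrary bytes into ASCII text that is safe in an XML text node.

    Hypothetical helper: optionally gzip first, then base64-encode.
    """
    if compress:
        data = gzip.compress(data)
    return base64.b64encode(data).decode("ascii")


def decode_binary(text, compressed=False):
    """Reverse encode_binary; 'compressed' must match what the writer used."""
    data = base64.b64decode(text)
    if compressed:
        data = gzip.decompress(data)
    return data


if __name__ == "__main__":
    secret = b"rs*\x01\x02 "  # invented sample bytes, including control characters
    encoded = encode_binary(secret)
    print("<PASSWORD>%s</PASSWORD>" % encoded)
    assert decode_binary(encoded) == secret
```

Gzipping only pays off for larger payloads; for something as short as a
password the gzip header makes the result longer, not shorter.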
--
Aaron Crane