[OT] xml encoding

Aaron Crane perl at aaroncrane.co.uk
Fri Jan 6 16:07:16 GMT 2006

Dirk Koopman writes:
> I am trying to coerce libxml2 into storing and printing "binary" data.

XML can't directly represent arbitrary binary data.

> Take the following small chunk of XML, which is part of a much bigger and
> otherwise well-formed XML document.
>    <PASSWORD>rs&#16;&#30;&#25;*  &#6;</PASSWORD>

In particular: among characters with Unicode code points below
U+0020, only U+0009, U+000A, and U+000D may appear in XML 1.0 documents,
whether as literal characters or as numeric character references. The
references &#16;, &#30;, &#25; and &#6; in your example all fall outside
that set.

According to the standard:

  "Well-formedness constraint: Legal Character
  Characters referred to using character references MUST match the
  production for Char."

  "Character Range
  [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
               [#xE000-#xFFFD] | [#x10000-#x10FFFF]"

(Given which, it can perhaps be considered a bug that libxml2 is willing
to let you put such data into text nodes.)
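
As a rough illustration of the Char production quoted above, here is a
minimal Python sketch that tests code points against it. The function
name is_xml10_char is made up for this example and is not part of
libxml2 or any other library; the list of code points is taken from the
decimal references in the quoted PASSWORD element.

```python
def is_xml10_char(codepoint: int) -> bool:
    """Return True if codepoint matches production [2] Char of XML 1.0."""
    return (
        codepoint in (0x9, 0xA, 0xD)
        or 0x20 <= codepoint <= 0xD7FF
        or 0xE000 <= codepoint <= 0xFFFD
        or 0x10000 <= codepoint <= 0x10FFFF
    )


# Decimal values of the numeric character references in the quoted
# <PASSWORD> element: &#16; &#30; &#25; &#6;
references = [16, 30, 25, 6]

# Collect the references that do not match Char.
illegal = [cp for cp in references if not is_xml10_char(cp)]
print(illegal)  # prints [16, 30, 25, 6]
```

All four references fail the check, so a conforming XML 1.0 parser
should reject the document.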

Your best bet is probably to use base64 for your binary data; if there's
a lot of it, consider gzipping it before base64-ing it.
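
One way the base64 suggestion might look, sketched in Python with only
the standard library. The helper names encode_for_xml and
decode_from_xml are hypothetical, and the sample bytes assume that each
character in the quoted password stands for a single byte; whoever
reads the document has to know out of band whether the payload was
gzipped.

```python
import base64
import gzip


def encode_for_xml(data: bytes, compress: bool = False) -> str:
    """Return data as base64 text, optionally gzipping it first."""
    if compress:
        data = gzip.compress(data)
    # The base64 alphabet is plain ASCII, all of it legal in XML text.
    return base64.b64encode(data).decode("ascii")


def decode_from_xml(text: str, compressed: bool = False) -> bytes:
    """Reverse encode_for_xml; compressed must match what the writer used."""
    data = base64.b64decode(text)
    if compressed:
        data = gzip.decompress(data)
    return data


# Assumed byte values for the password in the quoted example.
password = b"rs\x10\x1e\x19*  \x06"

encoded = encode_for_xml(password)
print(f"<PASSWORD>{encoded}</PASSWORD>")

# Round trip, with and without compression.
assert decode_from_xml(encoded) == password
assert decode_from_xml(encode_for_xml(password, compress=True), compressed=True) == password
```

For something as short as a password, gzip will most likely make the
result longer rather than shorter, so compression only pays off for
larger payloads.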

Aaron Crane
