[OT] xml encoding

Aaron Crane perl at aaroncrane.co.uk
Fri Jan 6 16:07:16 GMT 2006


Dirk Koopman writes:
> I am trying to coerce libxml2 into storing and printing "binary" data.

XML can't directly represent arbitrary binary data.

> Take the following small chunk of XML, which is part of a much bigger and
> otherwise well-formed XML document.
> 
>    <PASSWORD>rs&#16;&#30;&#25;*  &#6;</PASSWORD>

In particular: among characters with Unicode codepoints less than
U+0020, only U+0009, U+000A, and U+000D may appear in XML 1.0 documents,
either as characters, or expressed with numeric character references.

According to the standard:

  "Well-formedness constraint: Legal Character
  Characters referred to using character references MUST match the
  production for Char."
    http://www.w3.org/TR/2004/REC-xml-20040204/#wf-Legalchar

  "Character Range
  [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] |
               [#xE000-#xFFFD] | [#x10000-#x10FFFF]"
    http://www.w3.org/TR/2004/REC-xml-20040204/#NT-Char

(Given which, it can perhaps be considered a bug that libxml2 is willing
to let you put such data into text nodes.)
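
To make the Char production concrete, here is a rough sketch of it as a
Python predicate, applied to the four numeric character references in the
PASSWORD example. The function name is_xml10_char is my own invention for
illustration, not part of libxml2 or any other library.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production [2]."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Decimal code points of the references in the example:
# &#16; &#30; &#25; &#6;
password_refs = [16, 30, 25, 6]

# Collect the references that the production does not allow.
illegal = [cp for cp in password_refs if not is_xml10_char(cp)]
print(illegal)  # prints [16, 30, 25, 6]
```

All four references fall below U+0020 and are not tab, line feed, or
carriage return, so none of them may appear in an XML 1.0 document.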

Your best bet is probably to use base64 for your binary data; if there's
a lot of it, consider gzipping it before base64-ing it.
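
A minimal sketch of that approach, in Python using only the standard
library rather than libxml2 itself. The helper names encode_binary and
decode_binary, and the encoding="base64" attribute, are assumptions of
mine for illustration; nothing in XML requires or defines them, and you
would need your own convention for telling the reader how the element
content was encoded.

```python
import base64
import gzip
import xml.etree.ElementTree as ET

def encode_binary(data: bytes, compress: bool = False) -> str:
    """Optionally gzip the bytes, then return them as base64 ASCII text."""
    if compress:
        data = gzip.compress(data)
    return base64.b64encode(data).decode("ascii")

def decode_binary(text: str, compressed: bool = False) -> bytes:
    """Reverse encode_binary: base64-decode, then gunzip if requested."""
    data = base64.b64decode(text)
    return gzip.decompress(data) if compressed else data

# The bytes from the PASSWORD example, including the control characters
# that XML 1.0 cannot represent directly.
raw = b"rs\x10\x1e\x19*  \x06"

# Hypothetical convention: mark the element so readers know to decode it.
elem = ET.Element("PASSWORD", encoding="base64")
elem.text = encode_binary(raw)

# Serialise, re-parse, and recover the original bytes.
xml_text = ET.tostring(elem, encoding="unicode")
round_trip = decode_binary(ET.fromstring(xml_text).text)
```

For a payload this small, gzip would only add overhead; compress=True is
worth trying only when the data is large and reasonably compressible.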

-- 
Aaron Crane

