Using Template Toolkit and UTF-8

Aaron Crane perl at aaroncrane.co.uk
Thu Jan 19 11:42:05 GMT 2006


Steve Sims writes:
> These saved files have been generated by TT as UTF-8 but those files  
> do not contain a BOM

There wasn't really meant to be any such thing as a "UTF-8 BOM", and
there are situations in which it's harmful.  (It's not clear that XML
documents are well-formed if their first three bytes are 0xef 0xbb 0xbf
and they contain an XML declaration, for example.)  I'd be fairly
unhappy if things went round writing a "UTF-8 BOM" to my UTF-8 files
without my say-so.

If you know that your files are UTF-8, a simple option is to avoid a
"BOM", and just tell Perl when you read (or write) the file that it's
encoded as UTF-8:

  open my $fh, '<:utf8', $filename
    or die "...";
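
For a fuller picture, here is a minimal round-trip sketch: write a string through an encoding layer, read it back, and compare the character count with the byte count on disk. The temporary file and the sample string are assumptions made up for illustration, and it uses ':encoding(UTF-8)', the stricter spelling of the layer, rather than ':utf8'.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, used only for this illustration.
my ($tmp, $filename) = tempfile(UNLINK => 1);
close $tmp;

# Write: the layer encodes characters to UTF-8 bytes on the way out.
open my $out, '>:encoding(UTF-8)', $filename
  or die "Can't write $filename: $!";
print {$out} "caf\x{e9} \x{263a}";   # "cafe" with e-acute, a space, U+263A
close $out or die "Can't close $filename: $!";

# Read: the layer decodes UTF-8 bytes back into characters.
open my $in, '<:encoding(UTF-8)', $filename
  or die "Can't read $filename: $!";
my $text = <$in>;
close $in;

my $chars = length $text;   # counts characters, not bytes
my $bytes = -s $filename;   # counts bytes on disk

print "$chars characters, $bytes bytes\n";
```

The two counts differ because U+00E9 takes two bytes in UTF-8 and U+263A takes three, and no "BOM" is involved at any point.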

> [% PERL %]print "\x{ef}\x{bb}\x{bf}";[% END %]
> I end up getting these characters (in  hex) at the beginning of my  
> saved files though:
> <C3><AF><C2><BB><C2><BF>

That's because "\x{ef}\x{bb}\x{bf}" is a three-character string, and
when Perl writes that out in UTF-8 encoding, it turns into those six
bytes.  If you really want to do this, just tell Perl to write a
zero-width no-break space character (aka "BOM"):

  print "\x{feff}";
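
To see the difference for yourself, this small sketch encodes both strings with the core Encode module and dumps the resulting bytes in hex. The helper name utf8_hex is made up for this example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: the UTF-8 encoding of a character string,
# returned as space-separated hex bytes.
sub utf8_hex {
    my ($chars) = @_;
    my $octets = encode('UTF-8', $chars);
    return join ' ', map { sprintf('%02X', $_) } unpack 'C*', $octets;
}

# Three characters (U+00EF, U+00BB, U+00BF), each needing two bytes.
my $three_chars = utf8_hex("\x{ef}\x{bb}\x{bf}");

# One character (U+FEFF), needing three bytes.
my $one_char = utf8_hex("\x{feff}");

print "$three_chars\n";   # C3 AF C2 BB C2 BF
print "$one_char\n";      # EF BB BF
```

The first line of output matches the six bytes you saw; the second is the three-byte sequence you were after.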

-- 
Aaron Crane

