Using Template Toolkit and UTF-8
Aaron Crane
perl at aaroncrane.co.uk
Thu Jan 19 11:42:05 GMT 2006
Steve Sims writes:
> These saved files have been generated by TT as UTF-8 but those files
> do not contain a BOM
There wasn't really meant to be any such thing as a "UTF-8 BOM", and
there are situations in which it's harmful. (It's not clear that XML
documents are well-formed if their first three bytes are 0xef 0xbb 0xbf
and they contain an XML declaration, for example.) I'd be fairly
unhappy if things went round writing a "UTF-8 BOM" to my UTF-8 files
without my say-so.
If you know that your files are UTF-8, a simple option is to avoid a
"BOM", and just tell Perl when you read (or write) the file that it's
encoded as UTF-8:
    open my $fh, '<:encoding(UTF-8)', $filename
        or die "...";
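As a rough sketch of that approach, here is a round trip that writes a string through a UTF-8 layer and reads it back, with no "BOM" involved. The temporary file and the sample text are made up for illustration, and I've used the stricter :encoding(UTF-8) spelling of the layer:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical scratch file, removed automatically at exit.
my ($tmp_fh, $filename) = tempfile(UNLINK => 1);
close $tmp_fh;

# Six characters: "caf", e-acute (U+00E9), a space, and U+263A.
my $text = "caf\x{e9} \x{263a}";

# Write: the layer encodes characters to UTF-8 bytes on the way out.
open my $out, '>:encoding(UTF-8)', $filename
    or die "Cannot write $filename: $!";
print {$out} $text;
close $out or die "Cannot close $filename: $!";

# Read: the layer decodes UTF-8 bytes back to characters on the way in.
open my $in, '<:encoding(UTF-8)', $filename
    or die "Cannot read $filename: $!";
my $read_back = <$in>;
close $in;

# The file holds 9 bytes (3 + 2 + 1 + 3), but the string is 6 characters.
printf "%d bytes on disk, %d characters in Perl\n",
    -s $filename, length $read_back;
```

The point is that the encoding is declared once, at open time, and the rest of the program only ever deals in characters.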
> [% PERL %]print "\x{ef}\x{bb}\x{bf}";[% END %]
> I end up getting these characters (in hex) at the beginning of my
> saved files though:
> <C3><AF><C2><BB><C2><BF>
That's because "\x{ef}\x{bb}\x{bf}" is a three-character string (U+00EF,
U+00BB, U+00BF), and each of those characters takes two bytes in UTF-8,
so when Perl writes the string out in UTF-8 encoding, it turns into those
six bytes. If you really want to do this, just tell Perl to write a
zero-width no-break space character (U+FEFF, aka "BOM"):
    print "\x{feff}";
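To see the difference without going through a file, here is a small sketch that encodes both strings with the core Encode module and dumps the resulting bytes in hex. The helper name utf8_hex is my own invention for this example:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: UTF-8 encode a character string and return
# the resulting bytes as space-separated hex.
sub utf8_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

my $three_chars = "\x{ef}\x{bb}\x{bf}";  # three characters: U+00EF U+00BB U+00BF
my $bom_char    = "\x{feff}";            # one character: U+FEFF

print utf8_hex($three_chars), "\n";  # c3 af c2 bb c2 bf
print utf8_hex($bom_char),    "\n";  # ef bb bf
```

The first line of output matches the six bytes seen in the saved files; the second is the three-byte sequence that was actually wanted.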
--
Aaron Crane