Using Template Toolkit and UTF-8
Steve Sims
steve at karmamusicgroup.com
Thu Jan 19 12:58:15 GMT 2006
On 19 Jan 2006, at 11:42, Aaron Crane wrote:
> Steve Sims writes:
>> These saved files have been generated by TT as UTF-8 but those files
>> do not contain a BOM
>
> There wasn't really meant to be any such thing as a "UTF-8 BOM", and
> there are situations in which it's harmful. (It's not clear that XML
> documents are well-formed if their first three bytes are 0xef 0xbb 0xbf
> and they contain an XML declaration, for example.) I'd be fairly
> unhappy if things went round writing a "UTF-8 BOM" to my UTF-8 files
> without my say-so.
Understood and agreed. I wasn't suggesting that UTF-8 BOMs should
be unconditionally put onto output; I was just after a way of putting
one there myself, or of telling TT to output it.
> If you know that your files are UTF-8, a simple option is to avoid a
> "BOM", and just tell Perl when you read (or write) the file that it's
> encoded as UTF-8:
>
> open my $fh, '<:utf8', $filename
> or die "...";
Yup - that could indeed work. In my case though this would have been
difficult to use, since my INSERT statement is in a template that
includes a WRAPPER. I guess I might have been able to write my own
INSERT equivalent.
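
For reference, a minimal sketch of the layer-based approach quoted above, with the layer given as part of the mode argument to open. The helper name slurp_utf8 and the temporary file are purely illustrative assumptions, not anything from TT or from my templates; the helper also drops a leading U+FEFF if one happens to be present, which may or may not be wanted.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: read a whole file as UTF-8 and drop a leading
# U+FEFF ("BOM") if the file happens to start with one.
sub slurp_utf8 {
    my ($filename) = @_;
    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    $text = '' unless defined $text;     # guard against an empty file
    $text =~ s/\A\x{feff}//;             # strip the BOM character, if any
    return $text;
}

# Demonstration: write a temporary UTF-8 file that starts with a BOM.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh, ':encoding(UTF-8)';
print {$tmp_fh} "\x{feff}caf\x{e9}\n";
close $tmp_fh;

my $text = slurp_utf8($tmp_name);
printf "%d characters\n", length $text;   # prints "5 characters"
```

Because the file is decoded on the way in, length counts characters rather than bytes, and no BOM is needed for Perl to know the encoding.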
> That's because "\x{ef}\x{bb}\x{bf}" is a three-character string, and
> when Perl writes that out in UTF-8 encoding, it turns into those six
> bytes. If you really want to do this, just tell Perl to write a
> zero-width no-break space character (aka "BOM"):
>
> print "\x{feff}";
Now that makes perfect sense. A quick edit of my code later, and I
now have a valid BOM on the file I'm saving, and TT is correctly
interpreting it as being UTF-8.
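
A small sketch of the difference, assuming only the core Encode module: it encodes both strings to UTF-8 and prints the resulting bytes in hex. The helper name hex_bytes is made up for this illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Three separate characters, U+00EF U+00BB U+00BF.
# Each is above U+007F, so each takes two bytes in UTF-8.
my $three_chars = "\x{ef}\x{bb}\x{bf}";

# One character, U+FEFF (zero-width no-break space, used as the "BOM").
my $bom_char = "\x{feff}";

# Hypothetical helper: UTF-8 encode a string and show the bytes as hex.
sub hex_bytes {
    my ($string) = @_;
    my $bytes = encode('UTF-8', $string);
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

print hex_bytes($three_chars), "\n";   # c3 af c2 bb c2 bf
print hex_bytes($bom_char), "\n";      # ef bb bf
```

The first line shows the six bytes I was getting originally; the second shows the three bytes that make up a UTF-8 BOM.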
Thanks a lot for the help!
Steve