Using Template Toolkit and UTF-8

Steve Sims steve at karmamusicgroup.com
Thu Jan 19 12:58:15 GMT 2006


On 19 Jan 2006, at 11:42, Aaron Crane wrote:
> Steve Sims writes:
>> These saved files have been generated by TT as UTF-8 but those files
>> do not contain a BOM
>
> There wasn't really meant to be any such thing as a "UTF-8 BOM", and
> there are situations in which it's harmful.  (It's not clear that XML
> documents are well-formed if their first three bytes are 0xef 0xbb
> 0xbf and they contain an XML declaration, for example.)  I'd be fairly
> unhappy if things went round writing a "UTF-8 BOM" to my UTF-8 files
> without my say-so.

Understood and agreed.  I wasn't suggesting that UTF-8 BOMs should
be unconditionally put onto output; I was just after a way of putting
one there myself, or of telling TT to output it.

> If you know that your files are UTF-8, a simple option is to avoid a
> "BOM", and just tell Perl when you read (or write) the file that it's
> encoded as UTF-8:
>
>   open my $fh, '<:utf8', $filename
>     or die "...";

Yup - that could indeed work.  In my case though this would have been  
difficult to use, since my INSERT statement is in a template that  
includes a WRAPPER.  I guess I might have been able to write my own  
INSERT equivalent.
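
For anyone finding this in the archive, here is a minimal sketch of reading a file as UTF-8 with the encoding layer given as part of the open mode.  The helper name read_utf8_file and the temporary test file are my own illustration, not anything from TT, and I've used the stricter :encoding(UTF-8) spelling of the layer, which is a choice on my part rather than something from the quoted message.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: slurp a whole file, decoding its bytes as UTF-8.
sub read_utf8_file {
    my ($filename) = @_;
    open my $fh, '<:encoding(UTF-8)', $filename
        or die "Cannot open $filename: $!";
    local $/;                      # slurp mode: read everything at once
    my $text = <$fh>;
    close $fh;
    return $text;
}

# Write the raw UTF-8 bytes for "caf" plus e-acute (0xC3 0xA9) to a temp file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "caf\xc3\xa9";
close $tmp_fh;

my $text = read_utf8_file($tmp_name);
printf "%d characters\n", length $text;   # 4 characters, from 5 bytes on disk
```

The point is that length counts decoded characters here, so the two-byte sequence on disk comes back as a single character.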

> That's because "\x{ef}\x{bb}\x{bf}" is a three-character string, and
> when Perl writes that out in UTF-8 encoding, it turns into those six
> bytes.  If you really want to do this, just tell Perl to write a
> zero-width no-break space character (aka "BOM"):
>
>   print "\x{feff}";

Now that makes perfect sense.  A quick edit of my code later, and I  
now have a valid BOM on the file I'm saving, and TT is correctly  
interpreting it as being UTF-8.
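
A small sketch that shows the difference, assuming output goes through a UTF-8 encoding layer.  The helper utf8_bytes_hex is hypothetical and only exists to dump the bytes Perl would write; it uses an in-memory filehandle instead of a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes Perl writes for $string through
# a UTF-8 encoding layer, as space-separated lowercase hex.
sub utf8_bytes_hex {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>:encoding(UTF-8)', \$buffer
        or die "Cannot open in-memory file: $!";
    print {$fh} $string;
    close $fh;
    return join ' ', unpack '(H2)*', $buffer;
}

# One character, U+FEFF, encodes to the three "BOM" bytes.
print utf8_bytes_hex("\x{feff}"), "\n";             # ef bb bf

# Three characters, U+00EF U+00BB U+00BF, each encode to two bytes.
print utf8_bytes_hex("\x{ef}\x{bb}\x{bf}"), "\n";   # c3 af c2 bb c2 bf
```

So writing the three byte values as characters gets each of them encoded again, whereas writing the single character U+FEFF produces the intended three bytes.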

Thanks a lot for the help!

Steve


More information about the london.pm mailing list