Using Template Toolkit and UTF-8

Steve Sims steve at karmamusicgroup.com
Thu Jan 19 10:43:52 GMT 2006


Hi Andy,

Thanks for the reply.

On 19 Jan 2006, at 07:57, Andy Wardley wrote:
> Steve Sims wrote:
>> The problem I have is that the processed templates that I cache to
>> files aren't being treated as UTF-8 by Template Toolkit.  They don't
>> get created with a UTF-8 BOM.  As a result I get pages where the
>> stuff created on the fly by Template Toolkit is all nicely UTF-8, but
>> the cached bits are garbage. :-(
>
> I'm not sure what you mean when you talk about cached templates.
> If you mean templates compiled to disk by TT (e.g. by setting the
> COMPILE_DIR / COMPILE_EXT options) then they should work fine with
> UTF8 (and do, to the best of my knowledge).

I struggled to explain the problem and maybe wasn't entirely clear.   
Sorry about that.

The "cached" bits I'm talking about above aren't cached templates,  
they're output from TT - fully processed templates.  This output gets  
saved to disc so that I can use this stuff for web page generation.   
These saved files have been generated by TT as UTF-8 but those files  
do not contain a BOM - I can see this by loading the files into a  
text editor.

I do use COMPILE_DIR / COMPILE_EXT options, and these do indeed seem  
to work just fine with UTF-8.

My actual site page generation is also done through TT.  In my page  
template I use an INSERT to include the saved sections of the pages.   
Because the file that is getting INSERTed doesn't contain a BOM it  
seems that TT munges that file, treating the UTF-8 byte sequences as  
if they were latin-1 characters.
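
A minimal sketch of that kind of munging, outside TT altogether. It
assumes the saved file's bytes are taken as Latin-1 characters and then
encoded as UTF-8 on output; that is a guess at the mechanism, not
something confirmed from the TT source. The sample string and the
hex_of helper are hypothetical, there only for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: space-separated upper-case hex of a byte string.
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# "e acute" (U+00E9) as it sits in a UTF-8 file: two bytes.
my $file_bytes = "\xC3\xA9";

# Assumed wrong path: each byte is treated as one Latin-1 character,
# and the resulting two characters are encoded as UTF-8 on output.
my $munged = encode('UTF-8', decode('ISO-8859-1', $file_bytes));

# Intended path: the bytes are decoded as UTF-8 first, then encoded.
my $intact = encode('UTF-8', decode('UTF-8', $file_bytes));

print hex_of($munged), "\n";   # C3 83 C2 A9
print hex_of($intact), "\n";   # C3 A9
```

If the real cause is the same, the garbage in the page should show
exactly this pattern: every non-ASCII character doubled in length.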

If I manually add in a BOM to the saved file then the INSERT works  
properly.

The solution to my problem therefore seems to be to write out a BOM  
to my saved files.  When I try to do that (using a print statement in  
a PERL block of my template) the BOM gets mangled, ending up double  
length - maybe I'm not doing the print right.  This is the code I'm  
using:
[% PERL %]print "\x{ef}\x{bb}\x{bf}";[% END %]
I end up getting these characters (in hex) at the beginning of my
saved files though:
<C3><AF><C2><BB><C2><BF>
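
Those six bytes are what you get when the three characters U+00EF,
U+00BB and U+00BF are each encoded as UTF-8. The sketch below reproduces
that with the Encode module, on the assumption (not verified against TT)
that the output is encoded as UTF-8 somewhere between the print and the
file. Under the same assumption, printing the single character U+FEFF
would come out as the three-byte BOM. The hex_of helper is hypothetical.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: space-separated upper-case hex of a byte string.
sub hex_of { join ' ', map { sprintf '%02X', ord } split //, $_[0] }

# What the PERL block prints: three separate characters.
my $bom_as_three_chars = "\x{ef}\x{bb}\x{bf}";

# Assumption: a UTF-8 encoding step is applied on the way to the file,
# so each of the three characters becomes two bytes.
my $mangled = encode('UTF-8', $bom_as_three_chars);

# The BOM as one character, U+FEFF, passed through the same step.
my $bom = encode('UTF-8', "\x{FEFF}");

print hex_of($mangled), "\n";   # C3 AF C2 BB C2 BF
print hex_of($bom), "\n";       # EF BB BF
```

Whether print "\x{FEFF}" inside a PERL block actually behaves this way
in TT is untested here; the sketch only shows the byte arithmetic.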

> Or is this some additional caching step that you're implementing
> yourself?

Yup - see above.

> The TT mailing list might be a better place to ask.  The UTF8 issue
> has come up a few times.
>
> http://template-toolkit.org/pipermail/templates/2005-July/007532.html
> http://template-toolkit.org/pipermail/templates/2004-June/006270.html

Yeah - I'd found one of these discussions before, and have now read  
through them both.  I'm not sure either really sheds much light onto  
my problem.

I'll go subscribe to the list though and re-post there to see if  
anybody has any bright ideas.

Cheers,

Steve


More information about the london.pm mailing list