LWP output encoding
andy at hexten.net
Wed Nov 23 14:49:22 GMT 2005
I've Googled but can't figure this out. Can anyone tell me what the
current semantics of HTTP::Response->decoded_content are?
Specifics: I'm parsing a bunch of RSS feeds. I have two, both of
which claim to be encoded UTF-8. I'm generating a hash for the
contents of the feeds like this:
    use Digest::MD5 qw(md5_base64);
    my $content = $res->decoded_content;
    my $hash = md5_base64($content);
md5_base64() barfs on one of the feeds with
"Wide character in subroutine entry"
but not the other. Both feeds are fine if I retrieve the content
using $res->content instead of $res->decoded_content. The strange
thing is that the feed that succeeds is in Arabic, so I'm fairly
certain it contains some wide characters.
I assume what happens is that $res->content always returns an octet
stream, while $res->decoded_content tries to sniff the encoding and
returns a Unicode character string decoded according to the declared
charset.
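To make that assumption concrete, here is a minimal sketch that builds a response by hand instead of fetching a feed. The text/plain content type and the two-byte body are made up for illustration, and the behaviour shown is what I would expect from a reasonably recent HTTP::Message; older releases, or other content types such as RSS, may behave differently.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response: two UTF-8 octets encoding the single
# Arabic letter U+0645, declared as text/plain; charset=UTF-8.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/plain; charset=UTF-8' ],
    "\xD9\x85",
);

my $raw     = $res->content;          # octets exactly as received
my $decoded = $res->decoded_content;  # characters, decoded per the charset

printf "content:         length %d\n", length $raw;
printf "decoded_content: length %d, first code point U+%04X\n",
    length $decoded, ord $decoded;
```

If the two lengths differ, decoded_content has turned octets into characters, and that character string is the kind of value md5_base64() may refuse.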
I guess what I really need to know is what $res->decoded_content
returns in practical situations.
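For reference, the failure can be reproduced without LWP at all. The sketch below uses a hard-coded Arabic string as a stand-in for whatever decoded_content returned (an assumption, not the real feed data) and shows one way around the error: encode the characters back to UTF-8 octets before hashing.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);
use Encode qw(encode);

# Hypothetical stand-in for $res->decoded_content: a character
# string whose code points are all above 255.
my $content = "\x{0645}\x{0631}\x{062D}\x{0628}\x{0627}";

# MD5 is defined over bytes, so hashing the character string
# directly croaks; capture the error rather than dying.
my $hash = eval { md5_base64($content) };
my $err  = $@;
print "direct hash failed: $err" unless defined $hash;

# Encoding to UTF-8 octets first gives md5_base64() bytes to work on.
my $octets = encode('UTF-8', $content);
print md5_base64($octets), "\n";
```

Hashing $res->content directly is the other option, at the cost of the hash depending on the transfer encoding of the feed rather than on its characters.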
Andy Armstrong, hexten.net