LWP output encoding

Andy Armstrong andy at hexten.net
Wed Nov 23 14:49:22 GMT 2005

Googled, but can't figure it out. Can anyone update me on what the
current semantics of HTTP::Response->decoded_content are?

Specifics: I'm parsing a bunch of RSS feeds. I have two, both of
which claim to be encoded as UTF-8. I'm generating a hash for the
contents of the feeds like this:

my $content = $res->decoded_content;
my $hash    = md5_base64($content);

md5_base64() barfs on one of the feeds with

"Wide character in subroutine entry"

but not the other. Both feeds are fine if I retrieve the content
using $res->content instead of $res->decoded_content. The strange
thing is that the feed that succeeds is in Arabic, so I'm fairly
certain it contains some wide characters.
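
For reference, a minimal sketch of the failure, using only Digest::MD5
and Encode. The Arabic sample string and the variable names are made up
for illustration and are not taken from either feed. It shows that
md5_base64() croaks on a character string holding code points above
255, and that encoding back to UTF-8 octets first avoids it.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_base64);
use Encode qw(decode encode_utf8);

# Hypothetical feed body: raw UTF-8 octets as they would arrive over the wire.
# This is the Arabic word "salam": 4 characters stored in 8 bytes.
my $octets = "\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85";

# Decoding yields a character string with code points above 255.
my $chars = decode('UTF-8', $octets);

# md5_base64() operates on bytes, so the character string makes it croak.
my $ok    = eval { md5_base64($chars); 1 };
my $error = $ok ? '' : $@;

# Encoding back to UTF-8 octets before hashing avoids the error and
# gives the same digest as hashing the original octets.
my $hash_from_chars  = md5_base64(encode_utf8($chars));
my $hash_from_octets = md5_base64($octets);

print "croaked: $error" unless $ok;
print "hashes match\n" if $hash_from_chars eq $hash_from_octets;
```

So hashing encode_utf8($res->decoded_content), or simply $res->content,
should be safe either way, assuming the feed really is UTF-8.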

I assume what happens is that $res->content always returns an octet
stream while $res->decoded_content tries to sniff the encoding and
returns a decoded Unicode character string based on it.

I guess what I really need to know is what $res->decoded_content  
returns in practical situations.
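
One way to check this empirically is to compare the two accessors on a
response built by hand. This is only a sketch: the body and the
Content-Type header below are invented, and it assumes an
HTTP::Response recent enough to have decoded_content. The exact rules
for when the charset is decoded may differ between LWP versions.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical response built locally, so no network access is needed.
# The body is UTF-8 octets: 7 + 8 + 8 = 23 bytes, 19 characters.
my $body = "<title>\xD8\xB3\xD9\x84\xD8\xA7\xD9\x85</title>";
my $res  = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/xml; charset=UTF-8' ],
    $body,
);

my $raw     = $res->content;           # octets, exactly as stored
my $decoded = $res->decoded_content;   # characters, if the charset was decoded

# Differing lengths and the utf8 flag show whether decoding took place.
printf "content:         length %d, utf8 flag %s\n",
    length($raw), utf8::is_utf8($raw) ? 'on' : 'off';
printf "decoded_content: length %d, utf8 flag %s\n",
    length($decoded), utf8::is_utf8($decoded) ? 'on' : 'off';
```

Running the same two printf lines against each real feed response
should show whether decoded_content actually decoded the one that
hashes without complaint.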

Andy Armstrong, hexten.net
