Perl 5.16 vs Ruby 2.0 UTF-8 support

gvim gvimrc at gmail.com
Thu Aug 22 16:39:35 BST 2013


Can anyone who also uses Ruby enlighten me? For benchmarking purposes 
this Perl 5.16 script works fine parsing a large Maildir folder:

------------------------------------------------------------
use 5.016;
use autodie;

my $dir = 'my/mail/path';
chdir $dir;
opendir my $dh, '.';   # already inside $dir after chdir, so a relative $dir still works

while (readdir $dh) {
   next unless /^\d{4}/;
   open my $fh, '<', $_;
   say "\n\n************* Opening $_ *************";
   while (<$fh>) {
     chomp;
     say if /^\w{4}\s/;
   }
   close $fh;
}
closedir $dh;

-------------------------------------------------------------

However, the equivalent Ruby 2.0 script fails with a UTF-8 error after 
parsing 7 files:

---------------------------------------------------------
dir = 'my/maildir/path'
Dir.chdir(dir)

Dir.foreach('.') do |file|
   next unless file =~ /^\d{4}/
   print "\n\n************* Opening #{file} *************\n"
   fh = File.open(file)
   while fh.gets do
     print if $_ =~ /^\w{4}\b/
   end
   fh.close
end

-------------------------------------------------------------

The problematic mail file doesn't display any non-ASCII characters when 
opened in Vim. Here's the Ruby 2.0 error message:


************* Opening 1270516984.M407293P18051.mac,S=1601,W=1645:2,Sb *************
Paul
./1.rb:13:in `block in <main>': invalid byte sequence in UTF-8 (ArgumentError)
     from ./1.rb:8:in `foreach'
     from ./1.rb:8:in `<main>'
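
A minimal sketch of how the offending line might be tracked down, and how the match could be made to tolerate it. It assumes the mail file contains bytes that are not valid UTF-8 (for example an unlabelled Latin-1 body), which has not been confirmed for this file. The helper names invalid_utf8_lines and header_like_lines are hypothetical, not part of the original script. In Ruby, reading a line does not raise; the ArgumentError only appears when the regex is matched against a string whose bytes are invalid in its encoding.

```ruby
# Hypothetical helper: return [line_number, line] pairs for every line
# whose bytes are not valid UTF-8. Reading never raises; only the later
# regex match does, so valid_encoding? can be checked first.
def invalid_utf8_lines(path)
  bad = []
  File.open(path, 'r:UTF-8') do |fh|
    fh.each_line.with_index(1) do |line, n|
      bad << [n, line] unless line.valid_encoding?
    end
  end
  bad
end

# Hypothetical helper: open the file in binary mode so each line is an
# ASCII-8BIT string. An ASCII-only regex can be matched against such a
# string without an encoding error, whatever bytes the file contains.
def header_like_lines(path)
  matches = []
  File.open(path, 'rb') do |fh|
    fh.each_line do |line|
      matches << line if line =~ /^\w{4}\b/
    end
  end
  matches
end
```

Calling invalid_utf8_lines on the file named in the error should show which line, if any, carries the stray bytes. Reading in binary mode is closer to what the Perl script does, since it opens the file without an encoding layer and so matches against raw bytes.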


gvim

