Perl syntax highlighting

Randy J. Ray rjray at blackperl.com
Fri Oct 26 18:30:10 BST 2012


> I built a web-based tool around Perl::Critic for my employer, and for
> syntax-coloring I used PPI::HTML. I have yet to see it get confused by
> anything, and it has had to chew through some pretty atrocious Perl code...
>
> https://metacpan.org/module/PPI::HTML
>
> I don't remember all the details off-hand, and I can't share the code
> unfortunately. But tomorrow when I'm back in my office I might look it
> over and see if I can relate a few helpful hints.

I can get away with posting this small snippet:

use PPI;
use PPI::HTML;

my $document = PPI::Document->new($sourcefile)
    or die "Could not parse $sourcefile: " . PPI::Document->errstr;
my $ppi_html = PPI::HTML->new(line_numbers => 1);
my @lines = split /<br>\n/, $ppi_html->html($document);
# Now strip the line-numbers PPI provided, and force-close the <span> on
# each line. This seems wasteful, but it was the only way to get PPI::HTML
# to markup the code on a line-by-line basis, as opposed to leaving spans
# open across whole blocks of lines.
for my $lineno (0 .. $#lines) {
    # Move a leading </span> back onto the end of the previous line.
    if ($lineno > 0 and $lines[$lineno] =~ s{^</span>}{}) {
        $lines[$lineno - 1] .= '</span>';
    }
    # Drop the line-number span that PPI::HTML inserted.
    $lines[$lineno] =~ s{<span\s+class="line_number">\s*\d+:\s+</span>}{};
}

Basically, I needed individual lines as opposed to just one big chunk of
HTML (each line goes into a table row, with some other information in
other <td>'s of that same row). Calling PPI::HTML->new() with
"line_numbers => 1" prevents PPI::HTML from spanning mark-up across
lines. Without it, a 5-line comment would have the starting <span> on
the first line but the closing </span> on the 5th line, and I needed each
line to be marked up individually. "line_numbers" does this as a
side-effect of having to insert a <span> for each line number.

But I didn't want their line numbers :-). Plus, PPI::HTML closes whatever
<span> is currently open at the start of the next line instead of at the
end of the line it belongs to. So in the for-loop, I first excise the
leading "</span>", and if there was one to excise (there isn't on the
first line, of course) I append it to the previous line. Now each line
is truly self-contained. I then delete the leading span that matches
'<span\s+class="line_number">\s*\d+:\s+</span>', which drops their line
numbering.
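
To make the two substitutions easier to follow, here is a minimal sketch that
runs the same loop over a small hand-written string. The sample HTML is an
assumption: it is typed by hand to match the shape the regexes above expect,
not captured from a real PPI::HTML run. The sub name self_contained_lines is
hypothetical, and the "$lineno > 0" guard is just an extra safety check.

```perl
use strict;
use warnings;

# Hand-written sample (an assumption, not real PPI::HTML output): a comment
# whose closing </span> lands at the start of the following line.
my $html = join "<br>\n",
    '<span class="line_number">1: </span><span class="comment"># first',
    '</span><span class="line_number">2: </span><span class="keyword">my</span> $x;';

# Hypothetical helper wrapping the loop from the post.
sub self_contained_lines {
    my ($html) = @_;
    my @lines = split /<br>\n/, $html;
    for my $lineno (0 .. $#lines) {
        # Move a leading </span> back onto the previous line.
        if ($lineno > 0 and $lines[$lineno] =~ s{^</span>}{}) {
            $lines[$lineno - 1] .= '</span>';
        }
        # Remove the line-number span.
        $lines[$lineno] =~ s{<span\s+class="line_number">\s*\d+:\s+</span>}{};
    }
    return @lines;
}

my @lines = self_contained_lines($html);
print "$_\n" for @lines;
```

With that sample, the first line ends up with its own closing </span> and
neither line carries a line-number span any more.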

And I now have @lines, which are my marked-up, syntax-highlighted 
(highlit?) lines.
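
As a rough illustration of the "one table row per line" idea mentioned
earlier, something along these lines would work. This is only a sketch and
not the code from the actual tool: the sub name table_rows, the %note_for
hash, and the three-column layout are all hypothetical, and the note text is
assumed to be HTML-safe already.

```perl
use strict;
use warnings;

# Stand-ins for real data: @lines as left by the loop above, plus a
# hypothetical per-line note keyed by 1-based line number.
my @lines = (
    '<span class="comment"># first</span>',
    '<span class="keyword">my</span> $x;',
);
my %note_for = (2 => '1 violation');

# Hypothetical helper: one <tr> per source line, with the line number,
# the highlighted code, and any note for that line.
sub table_rows {
    my ($lines, $notes) = @_;
    my @rows;
    for my $i (0 .. $#{$lines}) {
        my $n    = $i + 1;
        my $note = exists $notes->{$n} ? $notes->{$n} : '';
        push @rows,
            qq{<tr><td>$n</td><td><pre>$lines->[$i]</pre></td><td>$note</td></tr>};
    }
    return @rows;
}

print "<table>\n";
print "$_\n" for table_rows(\@lines, \%note_for);
print "</table>\n";
```

Because every entry in @lines is self-contained, each <td> holds balanced
mark-up and the rows can be reordered or filtered without breaking the page.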

Of course, if you don't need individual lines, you can skip all of the 
extra work:

my $document = PPI::Document->new($sourcefile);
my $ppi_html = PPI::HTML->new();
my $html = $ppi_html->html($document);

And you can decide for yourself if you need line numbers or not...

Randy
-- 
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Randy J. Ray      Sunnyvale, CA      http://www.rjray.org 
rjray at blackperl.com

Silicon Valley Scale Modelers: http://www.svsm.org

