Perl syntax highlighting
Randy J. Ray
rjray at blackperl.com
Fri Oct 26 18:30:10 BST 2012
> I built a web-based tool around Perl::Critic for my employer, and for
> syntax-coloring I used PPI::HTML. I have yet to see it get confused by
> anything, and it has had to chew through some pretty atrocious Perl code...
>
> https://metacpan.org/module/PPI::HTML
>
> I don't remember all the details off-hand, and I can't share the code
> unfortunately. But tomorrow when I'm back in my office I might look it
> over and see if I can relate a few helpful hints.
I can get away with posting this small snippet:
    use PPI;
    use PPI::HTML;

    my $document = PPI::Document->new($sourcefile);
    my $ppi_html = PPI::HTML->new(line_numbers => 1);
    my @lines = split /<br>\n/, $ppi_html->html($document);

    # Now strip the line-numbers PPI provided, and force-close the <span> on
    # each line. This seems wasteful, but it was the only way to get PPI::HTML
    # to markup the code on a line-by-line basis, as opposed to leaving spans
    # open across whole blocks of lines.
    for my $lineno (0 .. $#lines) {
        if ($lines[$lineno] =~ s{^</span>}{}) {
            $lines[$lineno - 1] .= '</span>';
        }
        $lines[$lineno] =~ s{<span\s+class="line_number">\s*\d+:\s+</span>}{}x;
    }
Basically, I needed individual lines as opposed to just a big chunk of
HTML (each line goes into a table row, with some other information in
other <td>'s of that same row). Calling PPI::HTML->new() with
"line_numbers => 1" prevents PPI::HTML from spanning mark-up across
lines. Without it, a 5-line comment would have the starting <span> on
the first line, but the closing </span> on the 5th line. I needed each
line to be marked up individually. "line_numbers" does this as a
side-effect of having to insert a <span> for each line number.
But I didn't want their line numbers :-). Plus, they close whatever
<span> is currently open at the start of the next line instead of at the
end of the line where it was opened. So in the for-loop, I first excise the
leading "</span>", and if there was one to excise (there isn't on the
first line, of course) I append it to the previous line. Now each line
is truly self-contained. I then delete the leading span that matches
'<span\s+class="line_number">\s*\d+:\s+</span>', which drops their line
numbering.
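To make the two substitutions concrete, here is a minimal sketch that runs the same loop over three hand-written lines. The sample input is an assumption: it is typed to mimic the shape described above (a leading "</span>" carried over from the previous line, then a line_number span), not captured from a real PPI::HTML run, and the class names other than "line_number" are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical input shaped like the output described above: a two-line
# comment whose <span> is only closed at the start of the following line.
my @lines = (
    '<span class="line_number">1: </span><span class="comment"># first',
    '</span><span class="line_number">2: </span><span class="comment"># second',
    '</span><span class="line_number">3: </span><span class="word">print</span>',
);

for my $lineno (0 .. $#lines) {
    # Move a leading </span> back onto the line that opened the span.
    if ($lines[$lineno] =~ s{^</span>}{}) {
        $lines[$lineno - 1] .= '</span>';
    }
    # Drop the line-number span itself.
    $lines[$lineno] =~ s{<span\s+class="line_number">\s*\d+:\s+</span>}{}x;
}

print "$_\n" for @lines;
```

After the loop, every element of @lines opens and closes its own spans, and no line_number span is left.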
And I now have @lines, which are my marked-up, syntax-highlighted
(highlit?) lines.
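As a rough illustration of the "each line goes into a table row" idea, the sketch below wraps each highlighted line in a <tr>. This is not the code from the tool mentioned above; the helper name lines_to_rows, the CSS class names, and the choice of a line-number cell as the "other information" are all assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: one table row per highlighted line, with a
# line-number cell next to the marked-up code.
sub lines_to_rows {
    my @lines = @_;
    my @rows;
    for my $i (0 .. $#lines) {
        push @rows, sprintf
            '<tr><td class="num">%d</td><td class="code"><pre>%s</pre></td></tr>',
            $i + 1, $lines[$i];
    }
    return @rows;
}

# Stand-in for the @lines array produced by the loop earlier in this mail.
my @rows = lines_to_rows(
    '<span class="comment"># first</span>',
    '<span class="word">print</span>',
);

print "<table>\n", join("\n", @rows), "\n</table>\n";
```

Because each line already closes its own spans, a row can be emitted, reordered, or dropped without leaving unbalanced markup in the table.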
Of course, if you don't need individual lines, you can skip all of the
extra work:
    my $document = PPI::Document->new($sourcefile);
    my $ppi_html = PPI::HTML->new();
    my $html = $ppi_html->html($document);
And you can decide for yourself if you need line numbers or not...
Randy
--
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Randy J. Ray Sunnyvale, CA http://www.rjray.org
rjray at blackperl.com
Silicon Valley Scale Modelers: http://www.svsm.org