ActiveState::Scineplex - Perl extension to access the Scineplex code lexer.
use ActiveState::Scineplex qw(Annotate);
$color_info = Annotate($code, $lang, %options);
Scineplex is a C library for heuristic parsing of source code in various languages. Scineplex is based on the Scintilla sources. The ActiveState::Scineplex module provides a Perl interface to this library.
Currently this module implements an interface consisting of one function, Annotate, which returns a scineplex-driven colorization for one or more lines of source code. It either returns a string giving the colorization or throws an exception.
$color_info = Annotate($code, $lang, %options);
The $code argument is one or more lines of source code to be analyzed, passed as a single string. The lines may be separated by any newline sequence.
The $lang argument can be one of 'perl', 'python', 'ruby', 'vbscript', or 'xslt'. The default is 'perl'.
Additional %options can be passed as key/value pairs. The following options are supported (defaults in parentheses):
outputFormat => 'html' | 'json' | 'line' | 'classic' ('line')
parsingStartState => number (0)
DumpSource => 0 | 1 (0)
DumpEndState => 0 | 1 (0)
DumpFoldLevels => 0 | 1 (0)
StopAfterDataSectionLine1 => 0 | 1 (0)
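Because Annotate throws an exception on failure, callers may want to trap it. The following is a minimal sketch, not part of the module: the helper name annotate_safely is hypothetical, and the fully qualified call assumes that Annotate is defined in the ActiveState::Scineplex package itself. The helper also reports an error instead of dying when the module is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (result, undef) on success, or
# (undef, error message) if the module is missing or Annotate throws.
sub annotate_safely {
    my ($code, $lang, %options) = @_;
    my $result = eval {
        require ActiveState::Scineplex;
        # Assumption: Annotate lives in this package and can be called
        # by its fully qualified name.
        ActiveState::Scineplex::Annotate($code, $lang, %options);
    };
    return ($result, undef) if defined $result;
    chomp(my $error = $@ || 'Annotate returned no result');
    return (undef, $error);
}

my ($color_info, $error) =
    annotate_safely('$abc = 3;', 'perl', outputFormat => 'line');
print defined $color_info
    ? "styles: $color_info\n"
    : "annotation failed: $error\n";
```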
The outputFormat option is the most important one. In classic mode, Annotate echoes back each character at the start of a line, followed by separating white-space and its style value:
$res = Annotate('$abc = 3;', 'perl', outputFormat => 'classic');
print $res;
$ 12
a 12
b 12
c 12
chr(32) 0
= 10
chr(32) 0
3 4
; 10
chr(10) 0
Symbolic names for the numeric style values can be looked up in the exportable %SCE_TOKEN hash. For example, $SCE_TOKEN{perl}{12} is the string "SCE_PL_SCALAR".
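As a rough illustration of consuming classic output, the sketch below parses the example output into character/style pairs and looks up symbolic names. It runs without the module: the output is pasted in as a literal, and %SCE_TOKEN is a stand-in containing only the one entry quoted above. It assumes that each record is a character (or a chr(N) escape) followed by white-space and a decimal style value.

```perl
use strict;
use warnings;

# Stand-in for the module's %SCE_TOKEN hash; only the documented entry.
my %SCE_TOKEN = (perl => { 12 => 'SCE_PL_SCALAR' });

# Classic-mode output for '$abc = 3;', copied from the example above.
my $res = join "\n", '$ 12', 'a 12', 'b 12', 'c 12', 'chr(32) 0',
    '= 10', 'chr(32) 0', '3 4', '; 10', 'chr(10) 0';

my @tokens;
for my $line (split /\n/, $res) {
    my ($char, $style) = $line =~ /^(\S+)\s+(\d+)$/ or next;
    # Turn chr(N) escapes back into the character they stand for.
    $char = chr $1 if $char =~ /^chr\((\d+)\)$/;
    push @tokens, [$char, $style];
}

for my $token (@tokens) {
    my ($char, $style) = @$token;
    my $shown = $char =~ /^\s$/ ? sprintf('chr(%d)', ord $char) : $char;
    my $name  = $SCE_TOKEN{perl}{$style} // "style $style";
    printf "%-8s %s\n", $shown, $name;
}
```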
Setting outputFormat to line gives a terser output, in which each numeric style is represented by the character whose ASCII code is the style value plus the ASCII code of the character '0' (so style 12 becomes '<' and style 10 becomes ':'):
$res = Annotate('$abc = 3;', 'perl', outputFormat => 'line');
print $res;
<<<<0:04:
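The line encoding can be reversed by subtracting the ASCII code of '0' from each character. A small sketch (the function name decode_styles is illustrative, not part of the module):

```perl
use strict;
use warnings;

# Convert a line-format style string into a list of numeric style values.
sub decode_styles {
    my ($style_string) = @_;
    return map { ord($_) - ord('0') } split //, $style_string;
}

my @styles = decode_styles('<<<<0:04:');
print join(' ', @styles), "\n";   # prints "12 12 12 12 0 10 0 4 10"
```

The decoded values match the style column of the classic output for the same input.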
Setting outputFormat to html returns an HTML-encoded string containing the original code wrapped in span tags with generic class names like "variable", "operator", etc. This kind of output is designed to be wrapped in pre tags and styled with a CSS file that contains rules like
pre span.comments { color: #696969; font-style: italic; }
Default text is not placed in a span tag.
Setting outputFormat to json returns a JSON array of arrays. Each inner array contains a generic style label together with the position of the span: [$tag, $line, $col, $len]. The returned JSON array is also valid Perl code and can be converted to a Perl array using Perl's built-in eval function.
Example:
$res = Annotate('$abc = 3;', 'perl', outputFormat => 'json');
print $res;
$array = eval $res;
[ ["variable",1,0,4], ["operator",1,5,1], ["number",1,7,1], ["operator",1,8,1] ]
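Where evaluating the returned string as Perl is undesirable, a JSON parser can be used instead. The sketch below uses the core JSON::PP module on the example output (pasted in as a literal) and recovers the text of each span. Judging from the example, it assumes that $line counts from 1, that $col counts from 0, and that a span does not cross a line boundary.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

my $code = '$abc = 3;';
# JSON-mode output for $code, copied from the example above.
my $res = '[ ["variable",1,0,4], ["operator",1,5,1], '
        . '["number",1,7,1], ["operator",1,8,1] ]';

my $spans = decode_json($res);
my @lines = split /\r\n|\r|\n/, $code;

my @tokens;
for my $span (@$spans) {
    my ($tag, $line, $col, $len) = @$span;
    # Assumption: 1-based line numbers and 0-based columns.
    push @tokens, [$tag, substr($lines[$line - 1], $col, $len)];
}

printf "%-10s %s\n", @$_ for @tokens;
```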
The parsingStartState setting should be used only when you know that the code starts with a given style, such as lines 3-5 of a multi-line string.
The DumpSource flag is used only with line output. It is intended mostly for human consumption, and produces output like the following:
$res = Annotate('$abc = 3;', 'perl', DumpSource=>1);
print $res;
$abc = 3;
<<<<0:04:
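With the source echoed next to its styles, the two can be combined into runs of identically styled text. This is a sketch only; it assumes the DumpSource output consists of the source line followed by its style line, each on a line of its own. Adjacent tokens that share a style are merged into one run, so runs are not always individual tokens.

```perl
use strict;
use warnings;

# DumpSource output for '$abc = 3;', assumed to be source line then styles.
my $res = "\$abc = 3;\n<<<<0:04:\n";
my ($source, $styles) = split /\n/, $res;

my @runs;
for my $i (0 .. length($source) - 1) {
    my $char  = substr($source, $i, 1);
    my $style = ord(substr($styles, $i, 1)) - ord('0');
    if (@runs && $runs[-1]{style} == $style) {
        $runs[-1]{text} .= $char;    # extend the current run
    }
    else {
        push @runs, { style => $style, text => $char };
    }
}

printf "%2d '%s'\n", $_->{style}, $_->{text} for @runs;
```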
The DumpEndState flag is used only in line mode, and gives the styles for whichever characters constitute the line-end sequence:
$res = Annotate(qq($abc = 3;\r\n), 'perl', DumpSource=>1, DumpEndState=>1);
print $res;
$abc = 3;
<<<<0:04:00
The DumpFoldLevels flag is used only in line mode, and gives the fold levels as a 4-hex-digit sequence in a leading column:
$res = Annotate(qq(if(1) {\n$abc = 3;\n}\n), 'perl', DumpSource=>1, DumpFoldLevels=>1);
print $res;
2400 if(1) {
     55:4:0:
0401 $abc = 3
     <<<<0:04
0401 }
     :
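The fold-level values can be unpacked into a nesting depth and flags. The sketch below assumes that Scineplex reports fold levels using Scintilla's own encoding (base level 0x400, header flag 0x2000, white-space flag 0x1000); the text above does not state this explicitly, and the function name decode_fold_level is illustrative.

```perl
use strict;
use warnings;

# Scintilla's fold-level constants, assumed to apply to Scineplex unchanged.
use constant {
    SC_FOLDLEVELBASE       => 0x400,
    SC_FOLDLEVELNUMBERMASK => 0x0FFF,
    SC_FOLDLEVELWHITEFLAG  => 0x1000,
    SC_FOLDLEVELHEADERFLAG => 0x2000,
};

# Split a 4-hex-digit fold level into nesting depth and flags.
sub decode_fold_level {
    my ($hex) = @_;
    my $value = hex $hex;
    my %info = (
        depth     => ($value & SC_FOLDLEVELNUMBERMASK) - SC_FOLDLEVELBASE,
        is_header => ($value & SC_FOLDLEVELHEADERFLAG) ? 1 : 0,
        is_blank  => ($value & SC_FOLDLEVELWHITEFLAG)  ? 1 : 0,
    );
    return \%info;
}

for my $hex (qw(2400 0401 0401)) {
    my $info = decode_fold_level($hex);
    printf "%s depth=%d header=%d\n", $hex, $info->{depth}, $info->{is_header};
}
```

Under that assumption, 2400 marks the "if(1) {" line as a fold header at depth 0, and 0401 places the two following lines at depth 1.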
The StopAfterDataSectionLine1 flag is used only for Perl code in line mode.
Information on Scintilla is available at http://www.scintilla.org.
Copyright (C) 2005 by ActiveState Software Inc.