Scalar functions return integers, strings, or new spans over the document text. They can also be combined with predicate functions to produce complex predicates.
The CombineSpans function takes two spans as input and returns the shortest span that completely covers both input spans:
CombineSpans(<span1>, <span2>)
CombineSpans is sensitive to the order of its input spans. If span2 comes before span1, the result of CombineSpans is undefined. For example, CombineSpans([5, 10], [50, 60]) will return the span [5, 60], and CombineSpans([50, 60], [5, 10]) will cause an error.
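The ordering rule can be illustrated with a small Python model. This is only a sketch and not the AQL implementation: spans are represented as (begin, end) tuples, the name combine_spans is hypothetical, "comes before" is assumed to mean "begins before", and the undefined case is modeled by raising an exception.

```python
from typing import Tuple

# Hypothetical stand-in for an AQL span: (begin, end) character offsets.
Span = Tuple[int, int]


def combine_spans(span1: Span, span2: Span) -> Span:
    """Return the shortest span that covers both inputs.

    Assumption: span2 "comes before" span1 when it begins before it.
    The reference text calls that case undefined; this sketch raises.
    """
    if span2[0] < span1[0]:
        raise ValueError("span2 begins before span1")
    return (span1[0], max(span1[1], span2[1]))


print(combine_spans((5, 10), (50, 60)))  # (5, 60)
```

Swapping the arguments, as in combine_spans((50, 60), (5, 10)), raises ValueError in this model.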
The GetBegin function takes a single span argument and returns the begin offset of the input span. For example, GetBegin([5, 10]) would return 5. Likewise, the GetEnd function returns the end offset of its input span.
The LeftContext function takes a span and a count as input:
LeftContext(<input span>, <nchars>)
The function call LeftContext(<input span>, <nchars>) returns a new span containing the nchars characters of the document immediately to the left of <input span>. If the input span starts fewer than <nchars> characters from the beginning of the document, then LeftContext() will return a span that starts at the beginning of the document and continues until the beginning of the input span. For example, LeftContext([20, 30], 10) would return the span [10, 20], and LeftContext([5, 10], 10) would return [0, 5]. If the input starts on the first character of the document, LeftContext() will return a zero-length span. Similarly, the RightContext function returns the text to the right of its input span.
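The clamping behavior can be modeled in a few lines of Python. This is a sketch under stated assumptions rather than the actual implementation: spans are (begin, end) tuples, the function names are hypothetical, and right_context is assumed to clamp at the end of the document by symmetry with LeftContext (the text above does not spell this out), so it takes the document length as an extra argument.

```python
from typing import Tuple

# Hypothetical stand-in for an AQL span: (begin, end) character offsets.
Span = Tuple[int, int]


def left_context(span: Span, nchars: int) -> Span:
    """Span of up to nchars characters immediately left of span."""
    begin, _ = span
    # Clamp at offset 0 when the span starts near the document start.
    return (max(0, begin - nchars), begin)


def right_context(span: Span, nchars: int, doc_len: int) -> Span:
    """Span of up to nchars characters immediately right of span.

    Assumption: clamps at doc_len, mirroring left_context.
    """
    _, end = span
    return (end, min(doc_len, end + nchars))


print(left_context((20, 30), 10))  # (10, 20)
print(left_context((5, 10), 10))   # (0, 5)
print(left_context((0, 4), 10))    # (0, 0), a zero-length span
```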
LeftContextTok and RightContextTok are versions of LeftContext and RightContext that take distances in terms of tokens:
LeftContextTok(<input span>, <num tokens>)
RightContextTok(<input span>, <num tokens>)
Currently, the tokenization used for these functions is the same basic whitespace tokenization used in the section called “Token Constraints” for regular expression extractions, as well as in dictionary extractions.
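A rough Python model of the token-based variants follows. It is a sketch only: tokens are assumed to be maximal runs of non-whitespace characters, the function names are hypothetical, and the returned span is assumed to run from the outermost selected token up to the edge of the input span (so it includes any whitespace in between). The real functions may differ on these points.

```python
import re
from typing import List, Tuple

# Hypothetical stand-in for an AQL span: (begin, end) character offsets.
Span = Tuple[int, int]


def tokenize(text: str) -> List[Span]:
    """Whitespace tokenization: one span per run of non-whitespace."""
    return [m.span() for m in re.finditer(r"\S+", text)]


def left_context_tok(text: str, span: Span, ntok: int) -> Span:
    """Span covering up to ntok tokens immediately left of span."""
    begin, _ = span
    before = [t for t in tokenize(text) if t[1] <= begin]
    chosen = before[-ntok:] if ntok > 0 else []
    # Zero-length span when there are no tokens to the left.
    return (chosen[0][0], begin) if chosen else (begin, begin)


def right_context_tok(text: str, span: Span, ntok: int) -> Span:
    """Span covering up to ntok tokens immediately right of span."""
    _, end = span
    after = [t for t in tokenize(text) if t[0] >= end]
    chosen = after[:ntok]
    return (end, chosen[-1][1]) if chosen else (end, end)


text = "the quick brown fox jumps"
fox = (16, 19)  # offsets of "fox"
print(left_context_tok(text, fox, 2))   # (4, 16)
print(right_context_tok(text, fox, 1))  # (19, 25)
```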
The SpanBetween function takes two spans as input and returns the span that exactly covers the text between the two spans:
SpanBetween(<span1>, <span2>)
If there is no text between the two spans, then SpanBetween will return an empty span starting at the end of <span1>.
Like CombineSpans, SpanBetween is sensitive to the order of its inputs. So SpanBetween([5, 10], [50, 60]) returns the span [10, 50], while SpanBetween([50, 60], [5, 10]) returns the span [60, 60].
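The two SpanBetween examples above can be reproduced with a short Python model. As before, this is a sketch and not the AQL implementation: spans are (begin, end) tuples, the name span_between is hypothetical, and overlapping inputs are assumed to yield an empty span at the end of span1, which the text does not explicitly cover.

```python
from typing import Tuple

# Hypothetical stand-in for an AQL span: (begin, end) character offsets.
Span = Tuple[int, int]


def span_between(span1: Span, span2: Span) -> Span:
    """Span covering the text between the end of span1 and the start of span2.

    When span2 does not start after span1 ends, the result is an
    empty span anchored at the end of span1.
    """
    begin = span1[1]
    end = max(begin, span2[0])
    return (begin, end)


print(span_between((5, 10), (50, 60)))  # (10, 50)
print(span_between((50, 60), (5, 10)))  # (60, 60)
```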