Scalar Functions

Scalar functions return integers, strings, or new spans over the document text. They can also be combined with predicate functions to produce complex predicates.

CombineSpans

The CombineSpans function takes two spans as input and returns the shortest span that completely covers both input spans:

CombineSpans(<span1>, <span2>)

CombineSpans is sensitive to the order of its input spans. If span2 comes before span1, the result of CombineSpans is undefined. For example,

CombineSpans([5, 10], [50, 60])

will return the span [5,60], and

CombineSpans([50, 60], [5, 10])

will cause an error.
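The behavior described above can be modeled outside AQL with a short Python sketch. This is an illustration only, not the engine's implementation: the Span type and the combine_spans helper are hypothetical names, and the sketch assumes that "span2 comes before span1" means span2 begins at a smaller offset, and that the failure surfaces as an exception.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span: a pair of character offsets."""
    begin: int
    end: int


def combine_spans(span1: Span, span2: Span) -> Span:
    """Return the shortest span that covers both inputs.

    Assumption: the inputs are out of order when span2 begins before span1,
    and that case is reported by raising ValueError.
    """
    if span2.begin < span1.begin:
        raise ValueError("span2 must not come before span1")
    # span1 begins first, so the result begins there and runs to the later end.
    return Span(span1.begin, max(span1.end, span2.end))


print(combine_spans(Span(5, 10), Span(50, 60)))
```

Under these assumptions the first example in this section yields the span [5, 60], and swapping the arguments raises the error.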

GetBegin and GetEnd

The GetBegin function takes a single span argument and returns the begin offset of the input span. For example,

GetBegin([5, 10])

would return 5. Likewise, the GetEnd function returns the end offset of its input span, so GetEnd([5, 10]) would return 10.
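To make the offsets concrete, the following Python sketch models a span as a pair of character offsets and uses them to slice a sample string. The Span type, the helper names, and the sample text are hypothetical. The sketch also assumes that the end offset points just past the last covered character, which matches the zero-length and context examples elsewhere in this section but is not stated explicitly.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span."""
    begin: int  # offset of the first covered character
    end: int    # offset just past the last covered character (assumed)


def get_begin(span: Span) -> int:
    """Model of GetBegin: the begin offset of the span."""
    return span.begin


def get_end(span: Span) -> int:
    """Model of GetEnd: the end offset of the span."""
    return span.end


text = "0123456789abcdef"  # sample document text, chosen so offsets are visible
span = Span(5, 10)
covered = text[get_begin(span):get_end(span)]
print(get_begin(span), get_end(span), covered)
```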

LeftContext and RightContext

The LeftContext function takes a span and a count as input:

LeftContext(<input span>, <nchars>)

LeftContext returns a new span containing the <nchars> characters of the document immediately to the left of <input span>. If the input span starts fewer than <nchars> characters from the beginning of the document, LeftContext returns a span that starts at the beginning of the document and ends at the beginning of the input span. For example, LeftContext([20, 30], 10) would return the span [10, 20], and LeftContext([5, 10], 10) would return [0, 5]. If the input span starts on the first character of the document, LeftContext returns a zero-length span. Similarly, the RightContext function returns the text to the right of its input span.
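The clamping rule for LeftContext can be sketched in Python as follows. The Span type and both helper functions are hypothetical. The right_context model additionally assumes that RightContext is clamped at the end of the document in the same way LeftContext is clamped at the beginning, and it takes the document length as an explicit parameter only because this sketch has no document object.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span: a pair of character offsets."""
    begin: int
    end: int


def left_context(span: Span, nchars: int) -> Span:
    """Model of LeftContext: up to nchars characters left of the span."""
    # Never reach back past offset 0, the beginning of the document.
    return Span(max(0, span.begin - nchars), span.begin)


def right_context(span: Span, nchars: int, doc_length: int) -> Span:
    """Model of RightContext, assuming symmetric clamping at the document end."""
    return Span(span.end, min(doc_length, span.end + nchars))


print(left_context(Span(20, 30), 10))
print(left_context(Span(5, 10), 10))
print(left_context(Span(0, 4), 10))
```

The three calls correspond to the cases described above: a full-width context, a context clipped at the start of the document, and a zero-length context.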

LeftContextTok and RightContextTok

LeftContextTok and RightContextTok are versions of LeftContext and RightContext that take distances in terms of tokens:

LeftContextTok(<input span>, <num tokens>)
RightContextTok(<input span>, <num tokens>)

Currently, these functions use the same basic whitespace tokenization that is described in the section called “Token Constraints” for regular expression extractions and that is also used in dictionary extractions.
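As a rough illustration of token-based context, the Python sketch below treats every run of non-whitespace characters as a token. All names are hypothetical, and several details are assumptions rather than documented behavior: only tokens that lie entirely outside the input span are counted, the result touches the boundary of the input span, and a zero-length span is returned when no tokens are available.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span: a pair of character offsets."""
    begin: int
    end: int


def token_spans(text: str) -> List[Span]:
    """Basic whitespace tokenization: each run of non-whitespace is a token."""
    return [Span(m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def left_context_tok(text: str, span: Span, num_tokens: int) -> Span:
    """Model of LeftContextTok: up to num_tokens tokens left of the span."""
    before = [t for t in token_spans(text) if t.end <= span.begin]
    chosen = before[-num_tokens:] if num_tokens > 0 else []
    if not chosen:
        return Span(span.begin, span.begin)
    # Assumed: the context runs from the first chosen token to the input span.
    return Span(chosen[0].begin, span.begin)


def right_context_tok(text: str, span: Span, num_tokens: int) -> Span:
    """Model of RightContextTok: up to num_tokens tokens right of the span."""
    after = [t for t in token_spans(text) if t.begin >= span.end]
    chosen = after[:num_tokens] if num_tokens > 0 else []
    if not chosen:
        return Span(span.end, span.end)
    return Span(span.end, chosen[-1].end)


text = "the quick brown fox jumps"
brown = Span(10, 15)
print(left_context_tok(text, brown, 1), right_context_tok(text, brown, 1))
```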

SpanBetween

The SpanBetween function takes two spans as input and returns the span that exactly covers the text between the two spans:

SpanBetween(<span1>, <span2>)

If there is no text between the two spans, then SpanBetween will return an empty span starting at the end of <span1>.

Like CombineSpans, SpanBetween is sensitive to the order of its inputs. For example,

SpanBetween([5, 10], [50, 60])

returns the span [10, 50], while

SpanBetween([50, 60], [5, 10])

returns the empty span [60, 60], because no text lies after the end of <span1> and before the beginning of <span2>.
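The two cases above can be reproduced with a small Python model. The Span type and the span_between helper are hypothetical names, and the sketch is an illustration of the described behavior rather than the engine's code. It assumes that overlapping or reversed inputs are treated the same way as adjacent ones, that is, as having no text between them.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span: a pair of character offsets."""
    begin: int
    end: int


def span_between(span1: Span, span2: Span) -> Span:
    """Model of SpanBetween: the text after span1 and before span2."""
    begin = span1.end
    # If span2 does not begin after span1 ends, there is no text between them,
    # so the result collapses to an empty span at the end of span1.
    end = max(begin, span2.begin)
    return Span(begin, end)


print(span_between(Span(5, 10), Span(50, 60)))
print(span_between(Span(50, 60), Span(5, 10)))
```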