AQL has a collection of built-in functions for use in extraction rules. These functions fall into three categories: predicates, scalar functions, and table functions. Predicates are functions that return a Boolean value and can be used in the where clause of a select statement (see the section called “The where Clause”). Scalar functions return a value of one of AQL's non-Boolean types, such as Span or Text; a scalar function can be used in the select list or as the input to a predicate. Table functions return a set of tuples and can only be used inside a from clause (see the section called “The from List”).
The sections that follow list the current set of built-in functions in AQL, broken down by function type. Within each section, functions are listed in alphabetical order.
Predicate functions are the basic building blocks from which where clauses are built. The current version of AQL implements a number of these functions, and more are planned for the future. Functions are listed in alphabetical order.
The And function takes a variable number of Boolean inputs and returns the logical AND of their results.
The current version of the AQL optimizer does not attempt to optimize the order in which the arguments of And() are evaluated. As a result, a query in the form
select ... from ... where And(predicate1, predicate2);
will often run considerably slower than the same query in the form
select ... from ... where predicate1 and predicate2;
When possible, use the SQL-style and keyword instead of this function.
The Contains function takes two spans as arguments:
Contains(<span1>, <span2>)
This function returns TRUE if span1 completely contains span2; that is, if span2 starts after the beginning of span1 and ends before the end of span1.
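The containment test can be modeled outside AQL to make the rule concrete. The following Python sketch is illustrative only and is not how AQL implements Contains. The Span class, its begin and end fields (with end as an exclusive offset), and the contains helper are all hypothetical names. The text does not say how equal boundaries are treated, so the sketch assumes that a span sharing a boundary with the outer span still counts as contained.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span: character offsets into a document."""
    begin: int  # offset of the first character
    end: int    # offset one past the last character (assumed convention)


def contains(span1: Span, span2: Span) -> bool:
    """Model of Contains(span1, span2).

    Assumption: equal boundaries count as contained; the reference text
    does not state this either way.
    """
    return span1.begin <= span2.begin and span2.end <= span1.end


outer = Span(0, 20)
inner = Span(5, 10)
overlapping = Span(15, 25)  # starts inside outer but runs past its end

print(contains(outer, inner))
print(contains(outer, overlapping))
print(contains(inner, outer))
```

Note that the relation is not symmetric: swapping the arguments asks whether the smaller span contains the larger one.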
The ContainsDict function takes a dictionary (see the section called “Dictionaries”) and a span as arguments:
ContainsDict('<dictionary>', <span>)
ContainsDict returns TRUE if the span contains one or more matches of the dictionary.
The ContainsRegex function looks for matches of a regular expression in the text of a span:
ContainsRegex(/<regular expression>/, <span>)
The function returns TRUE if the span's text, taken as a separate Java string, contains one or more matches of the regular expression.
The Equals function takes two arguments of arbitrary type:
Equals(<arg1>, <arg2>)
The function returns TRUE if both arguments are equal. Spans are considered equal if they have the same start, end, and source text. Two strings are considered equal if they contain the same sequence of characters.
At some point in the future, the Equals function will be replaced with the more standard SQL "=" syntax.
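One consequence of the span equality rule is that two spans covering identical text are still unequal if their offsets differ. The Python sketch below models this under stated assumptions: the Span class and its source, begin, and end fields are hypothetical, and tuple equality stands in for the comparison that Equals performs.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span."""
    source: str  # the text the span is defined over
    begin: int   # offset of the first character
    end: int     # offset one past the last character (assumed convention)

    def covered_text(self) -> str:
        return self.source[self.begin:self.end]


doc = "John met John"
first_john = Span(doc, 0, 4)
second_john = Span(doc, 9, 13)
first_john_again = Span(doc, 0, 4)

# Both spans cover the string "John" ...
print(first_john.covered_text() == second_john.covered_text())
# ... but they differ in start and end, so they are not equal as spans.
print(first_john == second_john)
# Same start, end, and source text: equal.
print(first_john == first_john_again)
```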
The Follows function takes two span arguments and two integer arguments:
Follows(<span1>, <span2>, <minchar>, <maxchar>)
The function returns TRUE if the number of characters between the end of span1 and the beginning of span2 is between minchar and maxchar, inclusive.
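The distance check can be sketched in a few lines of Python. This is a model of the rule as described, not AQL's implementation; the Span class and the follows helper are hypothetical, and span ends are assumed to be exclusive offsets, so the gap is simply the begin of the second span minus the end of the first.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span."""
    begin: int  # offset of the first character
    end: int    # offset one past the last character (assumed convention)


def follows(span1: Span, span2: Span, minchar: int, maxchar: int) -> bool:
    """Model of Follows: span2 begins minchar..maxchar characters after span1 ends."""
    gap = span2.begin - span1.end
    return minchar <= gap <= maxchar


# In the text "Dr. Smith", "Dr." covers [0, 3) and "Smith" covers [4, 9).
title = Span(0, 3)
name = Span(4, 9)

print(follows(title, name, 0, 1))   # gap of 1 character is within [0, 1]
print(follows(title, name, 2, 5))   # gap of 1 character is below the minimum
print(follows(name, title, 0, 5))   # wrong order: the gap is negative
```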
The FollowsTok function is a version of Follows whose distance arguments are in terms of tokens instead of characters:
FollowsTok(<span1>, <span2>, <mintok>, <maxtok>)
FollowsTok returns TRUE if the number of tokens between the end of span1 and the beginning of span2 is between mintok and maxtok, inclusive.
Currently, the tokenization used for FollowsTok is the same basic whitespace tokenization used in the section called “Token Constraints” for regular expression extractions, as well as in dictionary extractions.
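A token-based distance can be modeled by counting whitespace-separated tokens in the text between the two spans. The Python sketch below is a rough approximation only: it assumes both spans begin and end on token boundaries, uses str.split() as a stand-in for AQL's whitespace tokenizer, and introduces the hypothetical names Span and follows_tok.

```python
from typing import NamedTuple


class Span(NamedTuple):
    """Hypothetical stand-in for an AQL span."""
    begin: int  # offset of the first character
    end: int    # offset one past the last character (assumed convention)


def follows_tok(text: str, span1: Span, span2: Span, mintok: int, maxtok: int) -> bool:
    """Model of FollowsTok using plain whitespace splitting.

    Assumption: both spans are aligned to token boundaries.
    """
    if span2.begin < span1.end:
        return False  # span2 does not come after span1
    tokens_between = text[span1.end:span2.begin].split()
    return mintok <= len(tokens_between) <= maxtok


text = "John said that Mary left"
john = Span(0, 4)    # "John"
mary = Span(15, 19)  # "Mary"

# Two tokens ("said", "that") lie between the spans.
print(follows_tok(text, john, mary, 0, 2))
print(follows_tok(text, john, mary, 0, 1))
print(follows_tok(text, mary, john, 0, 5))
```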
MatchesRegex has a syntax similar to that of ContainsRegex:
MatchesRegex(/<regular expression>/, <span>)
Unlike ContainsRegex, the MatchesRegex function returns TRUE only if the span's entire text, taken as a separate Java string, matches the regular expression.
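The difference between the two regex predicates is the difference between searching for a match anywhere in a string and requiring the whole string to match. The Python sketch below illustrates that distinction with re.search and re.fullmatch. It uses Python's regex engine as a stand-in for Java's, so dialect details may differ, and the helper names contains_regex and matches_regex are hypothetical.

```python
import re


def contains_regex(pattern: str, span_text: str) -> bool:
    """Model of ContainsRegex: at least one match somewhere in the span's text."""
    return re.search(pattern, span_text) is not None


def matches_regex(pattern: str, span_text: str) -> bool:
    """Model of MatchesRegex: the span's entire text must match."""
    return re.fullmatch(pattern, span_text) is not None


phone = r"\d{3}-\d{4}"
document = "call 555-1234"

# A span covering the whole document contains a match but is not itself one.
print(contains_regex(phone, document))
print(matches_regex(phone, document))

# The span's text is treated as a separate string, so a span covering only
# the number matches in full, and anchors such as ^ refer to the span's
# start rather than the document's start.
number_text = document[5:13]
print(matches_regex(phone, number_text))
print(contains_regex(r"^\d", number_text))
```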
The OnWordBoundaries function takes a single span argument. It returns TRUE if the span in question starts and ends either at the beginning of the document or on a boundary between word and non-word characters.
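The boundary rule can be modeled by comparing the characters on either side of each span endpoint. The Python sketch below is an approximation under several assumptions: word characters are taken to be letters, digits, and underscore; the end of the document is treated as a boundary in the same way as the beginning, which the text does not state; and the helper names are hypothetical.

```python
def is_word_char(ch: str) -> bool:
    # Assumption: letters, digits, and underscore count as word characters.
    return ch.isalnum() or ch == "_"


def on_boundary(text: str, pos: int) -> bool:
    """True if pos sits at a document edge or between a word and a non-word character."""
    if pos == 0 or pos == len(text):
        # Treating the end of the document as a boundary is an assumption.
        return True
    return is_word_char(text[pos - 1]) != is_word_char(text[pos])


def on_word_boundaries(text: str, begin: int, end: int) -> bool:
    """Model of OnWordBoundaries for the span [begin, end)."""
    return on_boundary(text, begin) and on_boundary(text, end)


text = "the cat sat"

print(on_word_boundaries(text, 4, 7))  # "cat": both ends on boundaries
print(on_word_boundaries(text, 5, 7))  # "at": starts in the middle of a word
print(on_word_boundaries(text, 0, 3))  # "the": starts at the document beginning
```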
The Or function takes a variable number of Boolean arguments and returns TRUE if any of them evaluates to TRUE. This function will eventually be replaced by SQL-style infix OR syntax.