Chapter 6. Built-In Functions

Table of Contents

Predicate Functions
Scalar Functions
Table Functions

AQL has a collection of built-in functions for use in extraction rules. These functions fall into three categories: predicates, scalar functions, and table functions. Predicates return a Boolean value and can be used in the where clause of a select statement (see the section called “The where Clause”). Scalar functions return a value of one of AQL's non-Boolean types, such as Span or Text; a scalar function can be used in the select list or as the input to a predicate. Table functions return a set of tuples and can be used only inside a from clause (see the section called “The from List”).

The sections that follow list the current set of built-in functions in AQL, broken down by function type. Within each section, functions are listed in alphabetical order.

Predicate Functions

Predicate functions are the basic building blocks of where clauses. The current version of AQL implements the predicates listed below, and more are planned for future versions.

And

The And function takes a variable number of Boolean arguments and returns their logical AND.

Order of Evaluation

The current version of the AQL optimizer does not attempt to optimize the order in which the arguments of And() are evaluated. As a result, a query in the form

select ...
from ...
where And(predicate1, predicate2);

will often run considerably slower than the same query in the form

select ...
from ...
where predicate1 and predicate2;

When possible, use the SQL-style and keyword instead of this function.

Contains

The Contains function takes two spans as arguments:

Contains(<span1>, <span2>)

This function returns TRUE if span1 completely contains span2; that is, if span2 starts after the beginning of span1 and ends before the end of span1.
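To make the containment test concrete, here is a minimal Python sketch of the semantics, not the AQL engine's implementation. It assumes a hypothetical `Span` record holding the document text plus a begin offset (inclusive) and an end offset (exclusive), requires both spans to come from the same document, and reads "after" and "before" as "at or after" and "at or before", so that a span contains itself. Those boundary choices are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int  # inclusive character offset
    end: int    # exclusive character offset


def contains(span1: Span, span2: Span) -> bool:
    """Sketch of Contains(<span1>, <span2>).

    Assumption: boundaries are non-strict, so equal spans contain each other.
    """
    if span1.doc_text != span2.doc_text:
        return False  # spans over different documents never contain each other
    return span1.begin <= span2.begin and span2.end <= span1.end


doc = "Call John Smith today"
full_name = Span(doc, 5, 15)   # "John Smith"
first_name = Span(doc, 5, 9)   # "John"
print(contains(full_name, first_name))  # True
print(contains(first_name, full_name))  # False
```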

ContainsDict

The ContainsDict function takes a dictionary (see the section called “Dictionaries”) and a span as arguments:

ContainsDict('<dictionary>', <span>)

ContainsDict returns TRUE if the span contains one or more matches of the dictionary.
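A rough Python model of ContainsDict follows. It is a sketch under several assumptions: the named dictionary is replaced by a plain list of entries, the span by its covered text, matching uses the basic whitespace tokenization that this chapter mentions for dictionary extraction, and an entry must match whole tokens exactly (case-sensitive). How the real engine handles case is not covered in this section.

```python
def contains_dict(entries: list[str], span_text: str) -> bool:
    """Sketch of ContainsDict('<dictionary>', <span>).

    Assumptions: `entries` stands in for the named dictionary, tokens are
    whitespace-delimited, and comparison is exact and case-sensitive.
    """
    tokens = span_text.split()
    for entry in entries:
        entry_tokens = entry.split()
        width = len(entry_tokens)
        if width == 0:
            continue  # ignore blank entries
        # Slide a window of `width` tokens across the span's tokens.
        for start in range(len(tokens) - width + 1):
            if tokens[start:start + width] == entry_tokens:
                return True
    return False


names = ["John", "Mary Ann"]
print(contains_dict(names, "ask Mary Ann today"))  # True
print(contains_dict(names, "Johnny called"))       # False
```

Because matching is token-based in this sketch, "John" does not match inside "Johnny", and a multi-word entry such as "Mary Ann" must appear as consecutive tokens.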

ContainsRegex

The ContainsRegex function looks for matches of a regular expression in the text of a span:

ContainsRegex(/<regular expression>/, <span>)

The function returns TRUE if the span's text, taken as a separate Java string, contains one or more matches of the regular expression.
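Because the function examines the span's text as a standalone string and asks whether it contains a match, its behavior resembles an unanchored search such as Java's `Matcher.find()`. The Python sketch below uses `re.search` as a stand-in; passing the span's text as a plain string is a simplification, and Python's regex dialect differs from Java's in some details.

```python
import re


def contains_regex(pattern: str, span_text: str) -> bool:
    """Sketch of ContainsRegex(/<regex>/, <span>) over the span's text.

    re.search looks for a match anywhere in the string, which corresponds
    to "contains one or more matches".
    """
    return re.search(pattern, span_text) is not None


phone = r"\d{3}-\d{4}"
print(contains_regex(phone, "call 555-1234 now"))  # True
print(contains_regex(phone, "call me now"))        # False
```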

Equals

The Equals function takes two arguments of arbitrary type:

Equals(<arg1>, <arg2>)

The function returns TRUE if both arguments are equal. Spans are considered equal if they have the same start, end, and source text. Two strings are considered equal if they contain the same sequence of characters.
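The two equality rules can be sketched in Python as follows. The `Span` record (document text, inclusive begin offset, exclusive end offset) is a hypothetical model. Note that two spans covering identical words at different positions are not equal, because their offsets differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int
    end: int


def equals(arg1: object, arg2: object) -> bool:
    """Sketch of Equals(<arg1>, <arg2>)."""
    if isinstance(arg1, Span) and isinstance(arg2, Span):
        # Spans: same start, same end, same source text.
        return (arg1.begin == arg2.begin
                and arg1.end == arg2.end
                and arg1.doc_text == arg2.doc_text)
    # Strings (and, in this sketch, any other values): ordinary equality,
    # which for str means the same sequence of characters.
    return arg1 == arg2


doc = "John met John"
print(equals(Span(doc, 0, 4), Span(doc, 9, 13)))  # False: same text, different offsets
print(equals("John", "John"))                     # True
```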

Note

At some point in the future, the Equals function will be replaced with the more standard SQL "=" syntax.

Follows

The Follows function takes two span arguments and two integer arguments:

Follows(<span1>, <span2>, <minchar>, <maxchar>)

The function returns TRUE if the number of characters between the end of span1 and the beginning of span2 is between minchar and maxchar, inclusive.
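Under a hypothetical `Span` model (inclusive begin, exclusive end, both spans from the same document), the number of characters between the two spans is simply `span2.begin - span1.end`. The sketch below illustrates the rule as stated; it is not the engine's code. A negative gap, where span2 starts before span1 ends, fails any non-negative range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int
    end: int


def follows(span1: Span, span2: Span, minchar: int, maxchar: int) -> bool:
    """Sketch of Follows(<span1>, <span2>, <minchar>, <maxchar>)."""
    if span1.doc_text != span2.doc_text:
        return False
    gap = span2.begin - span1.end  # characters strictly between the spans
    return minchar <= gap <= maxchar  # inclusive on both ends


doc = "John Smith"
first, last = Span(doc, 0, 4), Span(doc, 5, 10)  # "John", "Smith"
print(follows(first, last, 0, 1))  # True: one space between them
print(follows(last, first, 0, 5))  # False: order matters
```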

FollowsTok

The FollowsTok function is a version of Follows whose distance arguments are in terms of tokens instead of characters:

FollowsTok(<span1>, <span2>, <mintok>, <maxtok>)

FollowsTok returns TRUE if the number of tokens between the end of span1 and the beginning of span2 is between mintok and maxtok, inclusive.

Currently, FollowsTok uses the same basic whitespace tokenization as the token constraints of regular expression extraction (see the section called “Token Constraints”) and as dictionary extraction.
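The token-based variant can be sketched the same way, by counting whitespace-delimited tokens in the text between the two spans. The `Span` record (inclusive begin, exclusive end) is a hypothetical model, and Python's `str.split()` is assumed to approximate the basic whitespace tokenization described above, so punctuation stays attached to the neighboring word.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int
    end: int


def follows_tok(span1: Span, span2: Span, mintok: int, maxtok: int) -> bool:
    """Sketch of FollowsTok(<span1>, <span2>, <mintok>, <maxtok>)."""
    if span1.doc_text != span2.doc_text or span2.begin < span1.end:
        return False  # different documents, or span2 starts before span1 ends
    between = span1.doc_text[span1.end:span2.begin]
    token_count = len(between.split())  # whitespace tokenization (assumed)
    return mintok <= token_count <= maxtok


doc = "John A. Smith"
first, last = Span(doc, 0, 4), Span(doc, 8, 13)  # "John", "Smith"
print(follows_tok(first, last, 0, 0))  # False: one token ("A.") in between
print(follows_tok(first, last, 1, 2))  # True
```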

MatchesRegex

MatchesRegex uses the same syntax as ContainsRegex:

MatchesRegex(/<regular expression>/, <span>)

Unlike ContainsRegex, the MatchesRegex function returns TRUE only if the span's entire text, taken as a separate Java string, matches the regular expression.
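The difference between the two regex predicates mirrors the difference between an anchored and an unanchored match. In the Python sketch below, `re.fullmatch` stands in for MatchesRegex (comparable to Java's `Matcher.matches()`), while `re.search` plays the role of ContainsRegex. Python's regex dialect is used only for illustration and differs from Java's in places.

```python
import re


def matches_regex(pattern: str, span_text: str) -> bool:
    """Sketch of MatchesRegex(/<regex>/, <span>): the whole text must match."""
    return re.fullmatch(pattern, span_text) is not None


phone = r"\d{3}-\d{4}"
text = "call 555-1234 now"
print(re.search(phone, text) is not None)  # True: ContainsRegex-style search
print(matches_regex(phone, text))          # False: extra text around the number
print(matches_regex(phone, "555-1234"))    # True
```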

Not

The Not function takes a single Boolean argument and returns its complement.

OnWordBoundaries

The OnWordBoundaries function takes a single span argument. It returns TRUE if both the start and the end of the span fall either at an edge of the document or on a boundary between word and non-word characters.
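A sketch of the boundary test in Python follows. It rests on assumptions that are also marked in the code: a hypothetical `Span` record (document text, inclusive begin, exclusive end), word characters taken to be letters, digits, and underscore, and the end of the document treated like its beginning. The text above does not say how the engine defines word characters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int
    end: int


def is_word_char(ch: str) -> bool:
    # Assumption: letters, digits, and underscore count as word characters.
    return ch.isalnum() or ch == "_"


def at_boundary(text: str, pos: int) -> bool:
    # Either edge of the document counts as a boundary (assumption for the end).
    if pos == 0 or pos == len(text):
        return True
    # Otherwise the characters on either side must differ in "wordness".
    return is_word_char(text[pos - 1]) != is_word_char(text[pos])


def on_word_boundaries(span: Span) -> bool:
    """Sketch of OnWordBoundaries(<span>)."""
    return at_boundary(span.doc_text, span.begin) and at_boundary(span.doc_text, span.end)


doc = "Johnny met John."
print(on_word_boundaries(Span(doc, 0, 4)))    # False: "John" ends inside "Johnny"
print(on_word_boundaries(Span(doc, 11, 15)))  # True: "John" before "."
```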

Or

The Or function takes a variable number of Boolean arguments and returns TRUE if any of them evaluates to TRUE. This function will eventually be replaced by SQL-style infix OR syntax.
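For reference, the Boolean semantics of And, Or, and Not can be modeled in Python as below. This captures only the truth values; it says nothing about evaluation order or the optimizer behavior described under And. Requiring at least one argument for And and Or is an assumption of the sketch, since the text does not say what an empty argument list means.

```python
def And(first: bool, *rest: bool) -> bool:
    """Sketch of And(...): TRUE only if every argument is TRUE."""
    return all((first, *rest))


def Or(first: bool, *rest: bool) -> bool:
    """Sketch of Or(...): TRUE if any argument is TRUE."""
    return any((first, *rest))


def Not(arg: bool) -> bool:
    """Sketch of Not(...): the complement of its single argument."""
    return not arg


print(And(True, True, False))  # False
print(Or(False, False, True))  # True
print(Not(False))              # True
```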

Overlaps

The Overlaps function takes two span arguments:

Overlaps(<span1>, <span2>)

The function returns TRUE if the two input spans overlap in the document text.
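Under a hypothetical half-open `Span` model (inclusive begin, exclusive end), two spans overlap when each one begins before the other ends. The sketch assumes that spans which merely touch, where one ends exactly where the other begins, do not overlap; the text above does not settle that edge case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical span model: characters [begin, end) of doc_text."""
    doc_text: str
    begin: int
    end: int


def overlaps(span1: Span, span2: Span) -> bool:
    """Sketch of Overlaps(<span1>, <span2>)."""
    if span1.doc_text != span2.doc_text:
        return False
    # Half-open intervals share at least one character iff each starts
    # before the other ends.
    return span1.begin < span2.end and span2.begin < span1.end


doc = "John Smith"
print(overlaps(Span(doc, 0, 6), Span(doc, 5, 10)))  # True: both cover "S"
print(overlaps(Span(doc, 0, 5), Span(doc, 5, 10)))  # False: adjacent only
```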