When the Development Environment starts up, you should see the following AQL code in the main editing area:
create view PhoneNum as
extract regex /[0-9]{3}-[0-9]{4}/
    on D.text as number
from Document D;

output view PhoneNum;
These AQL statements create a simple phone number annotator. Let's take a look at the AQL line-by-line.
The first line of the AQL is:
create view PhoneNum as ...
This line creates a new AQL view called PhoneNum. A view in AQL is a logical collection of tuples that can be sent to the output of the AQL rule set or used as the input to another view. System Text will only invoke a given view if it is necessary to do so in order to generate a required output.
The next few lines of AQL define the PhoneNum view:
extract regex /[0-9]{3}-[0-9]{4}/
    on D.text as number
from Document D;
This view runs a regular expression over the text field of each document. The regular expression matches phone numbers like "555-1212": strings of 3 digits, followed by a hyphen, followed by 4 digits. For each match of the regular expression, the view will contain a tuple with a single output column, number, of type Span.
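To make the extraction concrete, here is a minimal Python sketch of what the view computes. It is an illustration only, not how System Text evaluates AQL. The function name extract_phone_numbers and the use of (begin, end) offset pairs as a stand-in for Span values are assumptions made for this example.

```python
import re

# Same pattern as the AQL regex: 3 digits, a hyphen, then 4 digits.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{4}")


def extract_phone_numbers(text):
    """Return one tuple (as a dict) per match, with a single 'number' column.

    The column holds (begin, end) character offsets, a stand-in for a Span.
    The end offset is exclusive here, following Python's slicing convention.
    """
    return [{"number": (m.start(), m.end())} for m in PHONE_PATTERN.finditer(text)]


sample = "Call 555-1212 or 466-9196 today."
for row in extract_phone_numbers(sample):
    begin, end = row["number"]
    # Slice the original text to recover the matched characters.
    print(begin, end, sample[begin:end])
```

Because the pattern has no boundary checks, it can also match inside a longer run of digits; a production annotator would likely need a stricter expression.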
Views are a logical construct; System Text will not
evaluate the body of a view unless it is necessary to produce an output.
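One way to picture this on-demand behavior is as a reachability check over view dependencies, starting from the views marked for output. The sketch below is a hypothetical illustration in Python; the function views_to_evaluate, the dependency map, and the Unused view are invented for the example and do not describe System Text's actual planner.

```python
def views_to_evaluate(dependencies, outputs):
    """Return the set of views reachable from the requested outputs.

    dependencies maps each view name to the list of views it reads from.
    Views that no output depends on never enter the result.
    """
    needed = set()
    stack = list(outputs)
    while stack:
        view = stack.pop()
        if view in needed:
            continue
        needed.add(view)
        stack.extend(dependencies.get(view, []))
    return needed


# Hypothetical rule set: 'Unused' is defined but feeds no output.
dependencies = {
    "PhoneNum": ["Document"],
    "Unused": ["Document"],
}
print(sorted(views_to_evaluate(dependencies, ["PhoneNum"])))
```

In this sketch, only PhoneNum and its input Document are selected; Unused would be skipped because nothing in the output depends on it.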
The final line of AQL in our example declares that the contents of the PhoneNum view should be sent to the system's output:
output view PhoneNum;
To run the AQL rules, click on the "Execute" button at the bottom of the Development Environment window:
You should see a page that looks like this:
This page summarizes the results of running the AQL annotator over a collection of sample documents. In this case, the documents are drawn from the Enron email corpus (see http://www.cs.cmu.edu/~enron/).
The information displayed for each document consists of two parts. The first part contains a table of all the output tuples for the selected output type. In this case, the output tuples contain only one field, number. The values in this field are of type Span, which is one of System Text's built-in types. The Span type represents a contiguous range of characters in a text field of the document. For example, the first Span in the screenshot above is Document.text[1665-1673]: '466-9196', meaning "characters 1665-1673 of the text field of Document". This range of characters corresponds to the string '466-9196' in the original document.
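The following Python sketch models a Span as a pair of character offsets into a text field and reproduces the display format from the example. The Span class, its method names, and the synthetic document text are hypothetical. The sketch also assumes the end offset is exclusive, which is consistent with the 8-character string '466-9196' occupying the range 1665-1673 but is not confirmed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a Span: a character range over a text field."""

    begin: int  # offset of the first character in the range
    end: int    # offset just past the last character (assumed exclusive)

    def covered_text(self, text):
        """Return the characters of the text that this span covers."""
        return text[self.begin:self.end]

    def describe(self, text, field="Document.text"):
        """Format the span in the style used by the results page."""
        return f"{field}[{self.begin}-{self.end}]: '{self.covered_text(text)}'"


# Synthetic document: padding so that the phone number starts at offset 1665.
text = "x" * 1665 + "466-9196" + " rest of message"
span = Span(1665, 1673)
print(span.describe(text))
```

The span stores only offsets, so the matched string is always recovered from the original document text rather than copied into the tuple.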
The second part of the output screen shows the locations of annotations within the document. By default, only the regions of text that are close to an annotation are shown. If you want to see annotations marked over the entire text of the document, you can click on the "Full Text" tab above the list of tuples.