Running an Example Annotator

When the Development Environment starts up, you should see the following AQL code in the main editing area:

create view PhoneNum as
extract 
    regex /[0-9]{3}-[0-9]{4}/
        on D.text as number
from Document D;

output view PhoneNum;

These AQL statements create a simple phone number annotator. Let's walk through the AQL line by line.

The first line of the AQL is:

create view PhoneNum as ...

This line creates a new AQL view called PhoneNum. A view in AQL is a logical collection of tuples that can be sent to the output of the AQL rule set or used as the input to another view. System Text evaluates a view only when doing so is necessary to generate a required output.

The next few lines of AQL define the PhoneNum view:

extract 
    regex /[0-9]{3}-[0-9]{4}/
        on D.text as number
from Document D;

This view runs a regular expression over the text field of each document. The regular expression matches phone numbers like "555-1212": three digits, followed by a hyphen, followed by four digits. For each match of the regular expression, the view contains a tuple with a single output column, number, of type Span.
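
To make the matching behavior concrete, here is a minimal Python sketch that emulates what the PhoneNum view computes. It is not AQL and not how System Text is implemented; the function name phone_num_view and the use of (begin, end) offset pairs to stand in for Span values are assumptions made for illustration.

```python
import re

# Same pattern as the AQL regex: three digits, a hyphen, four digits.
PHONE_PATTERN = re.compile(r"[0-9]{3}-[0-9]{4}")


def phone_num_view(document_text):
    """Return one tuple per regex match, with a single 'number' column.

    Each 'number' value is a (begin, end) pair of character offsets,
    used here as a stand-in for an AQL Span (an assumption of this sketch).
    """
    return [
        {"number": (match.start(), match.end())}
        for match in PHONE_PATTERN.finditer(document_text)
    ]


text = "Call 555-1212 or 466-9196 today."
tuples = phone_num_view(text)
for row in tuples:
    begin, end = row["number"]
    # Print the offsets together with the text they cover.
    print(begin, end, text[begin:end])
```

Note that the pattern has no boundary conditions, so it also matches inside longer digit runs: in "1234-56789" it matches "234-5678".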

Because views are a logical construct, defining PhoneNum does not by itself produce any results; System Text evaluates the body of a view only when an output requires it. The final line of AQL in our example declares that the contents of the PhoneNum view should be sent to the system's output:

output view PhoneNum;
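
The on-demand evaluation described above can be sketched in Python as follows. This only illustrates the behavior (views that no output depends on are never computed); it makes no claim about System Text's internals. The run function, the registry layout, and the extra view named Unused are all hypothetical.

```python
import re


def run(views, outputs, document_text):
    """Evaluate only the views needed to produce the requested outputs.

    `views` maps a view name to (input view names, compute function).
    Returns the output tuples and the list of views actually evaluated.
    """
    results = {}
    evaluated = []  # evaluation order, recorded for illustration

    def evaluate(name):
        if name in results:  # each view is computed at most once
            return results[name]
        inputs, compute = views[name]
        args = [evaluate(dep) for dep in inputs]
        results[name] = compute(document_text, *args)
        evaluated.append(name)
        return results[name]

    for name in outputs:
        evaluate(name)
    return {name: results[name] for name in outputs}, evaluated


# Hypothetical registry: PhoneNum is declared as an output, Unused is not.
views = {
    "PhoneNum": ([], lambda text: [m.span() for m in re.finditer(r"[0-9]{3}-[0-9]{4}", text)]),
    "Unused": ([], lambda text: [m.span() for m in re.finditer(r"[A-Z]+", text)]),
}

output, evaluated = run(views, ["PhoneNum"], "Call 555-1212")
print(output)
print(evaluated)
```

In this sketch the Unused view is defined but never computed, because no requested output depends on it.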

To run the AQL rules, click on the "Execute" button at the bottom of the Development Environment window:

You should see a page that looks like this:

This page summarizes the results of running the AQL annotator over a collection of sample documents. In this case, the documents are drawn from the Enron email corpus (see http://www.cs.cmu.edu/~enron/).

The information displayed for each document consists of two parts. The first part contains a table of all the output tuples for the selected output type. In this case, the output tuples contain only one field, number. The values in this field are of type Span, which is one of System Text's built-in types. The Span type represents a contiguous range of characters in a text field of the document. For example, the first Span in the screenshot above is Document.text[1665-1673]: '466-9196', meaning "Characters 1665-1673 of the text field of Document". This range of characters corresponds to the string '466-9196' in the original document.
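
As a rough illustration of the Span concept, the Python sketch below models a span as a field name plus two character offsets. The class, its attribute names, and the render method are hypothetical and are not System Text's API. Because '466-9196' is 8 characters long and 1673 - 1665 = 8, the sketch assumes the end offset is exclusive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    field: str  # name of the text field, e.g. "Document.text"
    begin: int  # offset of the first character in the span
    end: int    # offset one past the last character (assumed exclusive)

    def covered_text(self, text):
        """Return the characters of `text` that this span covers."""
        return text[self.begin:self.end]

    def render(self, text):
        """Format the span in the style shown in the results table."""
        return f"{self.field}[{self.begin}-{self.end}]: '{self.covered_text(text)}'"


# Stand-in document with the phone number placed at offset 1665.
document_text = " " * 1665 + "466-9196" + " more text"
span = Span("Document.text", 1665, 1673)
print(span.render(document_text))
```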

The second part of the output screen shows the locations of annotations within the document. By default, only the regions of text that are close to an annotation are shown. If you want to see annotations marked over the entire text of the document, you can click on the "Full Text" tab above the list of tuples.