it.unipi.di.tokenizer
Class FieldTermTokenizer

java.lang.Object
  extended by it.unipi.di.tokenizer.TermTokenizer
      extended by it.unipi.di.tokenizer.FieldTermTokenizer
All Implemented Interfaces:
Tokenizer

public class FieldTermTokenizer
extends TermTokenizer

This class extends TermTokenizer with support for field segmentation based on a separator character. It behaves like TermTokenizer, except that each occurrence of the separator character is treated as a single term of its own.

The default separator character is ' ' (blank space).
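
The behaviour described above can be illustrated with a small standalone sketch. It does not use this library: FieldSplitSketch and its tokenize method are hypothetical names, and the rule that a term is a maximal run of letters or digits is an assumption, since the exact term rules of TermTokenizer are not documented on this page. The sketch only shows the separator surviving as a one-character term while other delimiters are discarded.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, NOT the library class: it only mirrors the documented idea
// that the separator character is kept as a term of its own.
class FieldSplitSketch {
    // Mirrors the documented default separator (blank space).
    static final char DEFAULT_SEPARATOR = ' ';

    // Assumption: a term is a maximal run of letters or digits.
    static List<String> tokenize(String text, char sep) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != sep && Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            // Any other character ends the term being built.
            if (current.length() > 0) {
                terms.add(current.toString());
                current.setLength(0);
            }
            // The separator becomes a one-character term; other delimiters are dropped.
            if (c == sep) {
                terms.add(String.valueOf(c));
            }
        }
        if (current.length() > 0) {
            terms.add(current.toString());
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [title, data, mining, |, year, 2005]
        System.out.println(tokenize("title: data mining|year: 2005", '|'));
    }
}
```

With the default separator, the blank space itself would appear among the terms, so "a b" yields the three terms "a", " " and "b" in this sketch.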

Author:
Claudio Corsi, Paolo Ferragina, Alessandro Barilari

Field Summary
static char DEFAULT_SEPARATOR
           
 
Constructor Summary
FieldTermTokenizer(String file)
          Creates a FieldTermTokenizer for the specified source file with the default field separator.
FieldTermTokenizer(String file, char sep)
          Creates a FieldTermTokenizer for the specified source file with the selected field separator.
 
Method Summary
 char getSeparator()
          Returns the field separator character.
 String toString()
           
 
Methods inherited from class it.unipi.di.tokenizer.TermTokenizer
close, next, reset
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_SEPARATOR

public static final char DEFAULT_SEPARATOR
See Also:
Constant Field Values
Constructor Detail

FieldTermTokenizer

public FieldTermTokenizer(String file)
                   throws IOException
Creates a FieldTermTokenizer for the specified source file with the default field separator (DEFAULT_SEPARATOR).

Parameters:
file - the source file to tokenize
Throws:
IOException

FieldTermTokenizer

public FieldTermTokenizer(String file,
                          char sep)
                   throws IOException
Creates a FieldTermTokenizer for the specified source file with the selected field separator.

Parameters:
file - the source file to tokenize
sep - the separator to use as a field separator
Throws:
IOException
Method Detail

getSeparator

public char getSeparator()
Returns the field separator character.

Returns:
the separator of this Tokenizer

toString

public String toString()
Overrides:
toString in class TermTokenizer