it.unipi.di.tokenizer
Class FieldTermTokenizer
java.lang.Object
it.unipi.di.tokenizer.TermTokenizer
it.unipi.di.tokenizer.FieldTermTokenizer
- All Implemented Interfaces:
- Tokenizer
public class FieldTermTokenizer
- extends TermTokenizer
This class extends TermTokenizer with support for field segmentation based on a
separator character. It works in the same way as TermTokenizer, except that the
separator character is treated as a term on its own.
The default separator character is ' ' (blank space).
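The segmentation rule described above can be sketched as follows. This is a minimal, hypothetical illustration that works on an in-memory String rather than a source file. The class name FieldSegmentationSketch and its tokenize method are illustrative and are not part of this API. The sketch also assumes that a term is a maximal run of letters or digits; the real term definition is whatever TermTokenizer uses.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, self-contained sketch of field segmentation; not the library's code.
final class FieldSegmentationSketch {

    // Mirrors the documented default separator: ' ' (blank space).
    static final char DEFAULT_SEPARATOR = ' ';

    // Splits text into terms. ASSUMPTION: a term is a maximal run of letters or
    // digits; every other character except the separator only delimits terms.
    static List<String> tokenize(String text, char sep) {
        List<String> terms = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == sep) {
                flush(current, terms);          // close the term before the separator
                terms.add(String.valueOf(sep)); // the separator itself is a single term
            } else if (Character.isLetterOrDigit(c)) {
                current.append(c);              // extend the current term
            } else {
                flush(current, terms);          // other characters are dropped
            }
        }
        flush(current, terms);                  // emit any trailing term
        return terms;
    }

    // Same as above, using the default blank-space separator.
    static List<String> tokenize(String text) {
        return tokenize(text, DEFAULT_SEPARATOR);
    }

    // Moves a non-empty pending term into the output list and resets the buffer.
    private static void flush(StringBuilder current, List<String> terms) {
        if (current.length() > 0) {
            terms.add(current.toString());
            current.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("title|Pisa, Italy|2004", '|'));
    }
}
```

Running main prints [title, |, Pisa, Italy, |, 2004]. The two '|' characters survive as terms of their own, so a consumer of the term stream can tell where one field ends and the next begins. With the default blank-space separator, tokenize("foo bar") yields foo, " ", bar.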
- Author:
- Claudio Corsi, Paolo Ferragina, Alessandro Barilari
Constructor Summary
FieldTermTokenizer(String file)
Creates a FieldTermTokenizer for the specified source file, using the default field separator.
FieldTermTokenizer(String file,
char sep)
Creates a FieldTermTokenizer for the specified source file, using the given field separator.
DEFAULT_SEPARATOR
public static final char DEFAULT_SEPARATOR
- The default field separator, ' ' (blank space).
- See Also:
- Constant Field Values
FieldTermTokenizer
public FieldTermTokenizer(String file)
throws IOException
- Creates a FieldTermTokenizer for the specified source file, using the default field separator.
- Parameters:
file
- the source file to tokenize
- Throws:
IOException
FieldTermTokenizer
public FieldTermTokenizer(String file,
char sep)
throws IOException
- Creates a FieldTermTokenizer for the specified source file, using the given field separator.
- Parameters:
file
- the source file to tokenize
sep
- the character to use as the field separator
- Throws:
IOException
getSeparator
public char getSeparator()
- Returns the field separator character.
- Returns:
- the separator of this Tokenizer
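The relationship between the two constructors and getSeparator can be sketched as follows. FieldTokenizerConfigSketch is a hypothetical stand-in that omits all file handling. It assumes, consistently with the constructor descriptions, that the one-argument constructor simply delegates to the two-argument one with DEFAULT_SEPARATOR.

```java
// Hypothetical sketch of the two documented constructor overloads; not the library's code.
final class FieldTokenizerConfigSketch {

    // Mirrors the documented default separator: ' ' (blank space).
    static final char DEFAULT_SEPARATOR = ' ';

    // Kept only to mirror the documented "file" parameter; never opened here.
    private final String file;
    private final char separator;

    // One-argument form: falls back to the default separator.
    FieldTokenizerConfigSketch(String file) {
        this(file, DEFAULT_SEPARATOR);
    }

    // Two-argument form: records the caller's separator. ASSUMPTION: the real
    // class also opens the file here, which is why its constructors throw IOException.
    FieldTokenizerConfigSketch(String file, char sep) {
        this.file = file;
        this.separator = sep;
    }

    // Returns the separator chosen at construction time.
    char getSeparator() {
        return separator;
    }
}
```

With this shape, new FieldTokenizerConfigSketch("records.txt").getSeparator() returns ' ', while new FieldTokenizerConfigSketch("records.tsv", '\t').getSeparator() returns the tab character.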
toString
public String toString()
- Overrides:
toString in class TermTokenizer