it.unipi.di.util
Class TermTokenizer
java.lang.Object
it.unipi.di.util.TermTokenizer
- All Implemented Interfaces:
- Tokenizer
- Direct Known Subclasses:
- FixedTokenizer, URLTokenizer
public class TermTokenizer
- extends Object
- implements Tokenizer
Splits a text into a list of tokens; a '\n' is added.
A token is a maximal sequence of either alphanumeric
characters or other symbols. Whitespace is considered part of
the preceding token.
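The splitting rule described above can be illustrated with a minimal, self-contained sketch. This is not the TermTokenizer source: the class TokenRuleSketch and its tokenize method are hypothetical names. It assumes that "alphanumeric" means Character.isLetterOrDigit, that leading whitespace with no preceding token becomes a token of its own, and it does not model the added '\n'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the documented rule; not the library's implementation.
public class TokenRuleSketch {

    // Character classes assumed by this sketch: 0 = whitespace, 1 = alphanumeric, 2 = other symbol.
    private static int kind(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetterOrDigit(c)) return 1;
        return 2;
    }

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        int n = text.length();
        while (i < n) {
            int start = i;
            int k = kind(text.charAt(i));
            // Consume a maximal run of characters of the same class.
            while (i < n && kind(text.charAt(i)) == k) i++;
            // Whitespace that follows a non-whitespace run is attached to that token.
            if (k != 0) {
                while (i < n && kind(text.charAt(i)) == 0) i++;
            }
            tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Brackets make the attached whitespace visible.
        for (String t : tokenize("Hello, world!  42")) {
            System.out.println("[" + t + "]");
        }
    }
}
```

Under these assumptions, concatenating the tokens in order reproduces the input text, because whitespace is never dropped, only attached to the preceding token.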
- Author:
- Claudio Corsi, Paolo Ferragina
TermTokenizer
public TermTokenizer(String file)
throws IOException
- Creates a new TermTokenizer over the given file.
- Parameters:
file
- the file to split into tokens
- Throws:
IOException
toString
public String toString()
- Overrides:
toString
in class Object
next
public String next()
throws IOException
- Specified by:
next
in interface Tokenizer
- Throws:
IOException
reset
public void reset()
throws IOException
- Specified by:
reset
in interface Tokenizer
- Throws:
IOException
close
public void close()
throws IOException
- Specified by:
close
in interface Tokenizer
- Throws:
IOException
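A typical call sequence for next, reset and close might look like the sketch below. It is self-contained, so it declares a stand-in Tokenizer interface that only mirrors the three signatures documented here, plus a hypothetical in-memory ListTokenizer. This page does not say how next() signals the end of input; the sketch assumes it returns null, which should be checked against the actual library before relying on it.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical usage sketch; names other than next/reset/close are illustrative only.
public class TokenizerUsageSketch {

    // Stand-in mirroring the documented methods of it.unipi.di.util.Tokenizer.
    interface Tokenizer {
        String next() throws IOException;
        void reset() throws IOException;
        void close() throws IOException;
    }

    // In-memory implementation, present only so the sketch runs on its own.
    static class ListTokenizer implements Tokenizer {
        private final List<String> tokens;
        private Iterator<String> it;

        ListTokenizer(List<String> tokens) {
            this.tokens = tokens;
            this.it = tokens.iterator();
        }

        // ASSUMPTION: end of input is reported as null.
        public String next() { return it.hasNext() ? it.next() : null; }

        // Restarts iteration from the first token.
        public void reset() { it = tokens.iterator(); }

        // Nothing to release for an in-memory list.
        public void close() { }
    }

    // Reads tokens until the assumed end-of-input marker and counts them.
    static int countTokens(Tokenizer t) throws IOException {
        int count = 0;
        while (t.next() != null) count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        Tokenizer t = new ListTokenizer(Arrays.asList("Hello", ", ", "world"));
        try {
            int first = countTokens(t);
            t.reset();                    // rewind for a second pass
            int second = countTokens(t);
            System.out.println(first + " tokens, then " + second + " after reset");
        } finally {
            t.close();                    // release resources even if reading fails
        }
    }
}
```

Since all three methods declare IOException, placing close() in a finally block keeps the underlying file from staying open when a read fails partway through.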