it.unipi.di.util
Class TermTokenizer

java.lang.Object
  extended by it.unipi.di.util.TermTokenizer
All Implemented Interfaces:
Tokenizer
Direct Known Subclasses:
FixedTokenizer, URLTokenizer

public class TermTokenizer
extends Object
implements Tokenizer

Splits a text into a list of tokens; a '\n' is added. A token is a maximal sequence of alphanumeric characters or a maximal sequence of other symbols. Whitespace is considered part of the preceding token.
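
The splitting rule described above can be illustrated with a small standalone sketch. This is not the TermTokenizer implementation: the class TokenRuleSketch and its tokenize method are hypothetical names introduced here, the sketch works on an in-memory string rather than a file, and it reads "other symbols" as "a run of non-alphanumeric, non-whitespace characters", which is an assumption. The added '\n' mentioned above is not modeled, and the handling of leading whitespace (which has no preceding token) is also an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenRuleSketch {

    // Hypothetical helper, not part of it.unipi.di.util.
    // Splits text into maximal runs of alphanumeric characters or maximal runs
    // of other non-whitespace symbols; trailing whitespace is attached to the
    // token that precedes it.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int n = text.length();
        int i = 0;
        while (i < n) {
            int start = i;
            char first = text.charAt(i);
            if (Character.isWhitespace(first)) {
                // ASSUMPTION: whitespace at the very start of the text has no
                // preceding token, so it is emitted as a token of its own.
                while (i < n && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
            } else {
                boolean alnum = Character.isLetterOrDigit(first);
                // Consume the maximal run of characters of the same kind.
                while (i < n
                        && !Character.isWhitespace(text.charAt(i))
                        && Character.isLetterOrDigit(text.charAt(i)) == alnum) {
                    i++;
                }
                // Attach any following whitespace to this token.
                while (i < n && Character.isWhitespace(text.charAt(i))) {
                    i++;
                }
            }
            tokens.add(text.substring(start, i));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Print each token between brackets so attached whitespace is visible.
        for (String token : tokenize("Hello, world!  Bye")) {
            System.out.println("[" + token + "]");
        }
    }
}
```

Under these assumptions, concatenating the tokens in order reproduces the input text, since no character is dropped.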

Author:
Claudio Corsi, Paolo Ferragina

Constructor Summary
TermTokenizer(String file)
          Creates a new TermTokenizer over the given file.
 
Method Summary
 void close()
           
 String next()
           
 void reset()
           
 String toString()
           
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

TermTokenizer

public TermTokenizer(String file)
              throws IOException
Creates a new TermTokenizer over the given file.

Parameters:
file - the file to split into tokens
Throws:
IOException
Method Detail

toString

public String toString()
Overrides:
toString in class Object

next

public String next()
            throws IOException
Specified by:
next in interface Tokenizer
Throws:
IOException

reset

public void reset()
           throws IOException
Specified by:
reset in interface Tokenizer
Throws:
IOException

close

public void close()
           throws IOException
Specified by:
close in interface Tokenizer
Throws:
IOException
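
The three Tokenizer methods listed above (next, reset, close) suggest a read, rewind, release life cycle. The sketch below shows one way a caller might drive it. It does not use the real classes: the nested Tokenizer interface only mirrors the three signatures shown on this page, and ArrayTokenizer is a hypothetical in-memory stand-in so that the example runs on its own. Two behaviours are assumptions that this page does not document: that next() returns null when the input is exhausted, and that reset() rewinds to the first token.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TokenizerLifecycleSketch {

    // Mirrors the three methods listed on this page; the real
    // it.unipi.di.util.Tokenizer interface may declare more.
    interface Tokenizer {
        String next() throws IOException;
        void reset() throws IOException;
        void close() throws IOException;
    }

    // Hypothetical in-memory stand-in, used only to make the sketch runnable.
    static class ArrayTokenizer implements Tokenizer {
        private final String[] tokens;
        private int pos = 0;

        ArrayTokenizer(String... tokens) {
            this.tokens = tokens;
        }

        // ASSUMPTION: null signals the end of the input.
        public String next() {
            return pos < tokens.length ? tokens[pos++] : null;
        }

        // ASSUMPTION: reset() rewinds to the first token.
        public void reset() {
            pos = 0;
        }

        public void close() {
            // Nothing to release for an in-memory array.
        }
    }

    // Reads tokens until next() reports the end of the input.
    static List<String> drain(Tokenizer tokenizer) throws IOException {
        List<String> out = new ArrayList<>();
        String token;
        while ((token = tokenizer.next()) != null) {
            out.add(token);
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        Tokenizer tokenizer = new ArrayTokenizer("to ", "be ", "or ", "not");
        try {
            List<String> firstPass = drain(tokenizer);
            tokenizer.reset();                 // rewind for a second pass
            List<String> secondPass = drain(tokenizer);
            System.out.println(firstPass.size() + " tokens, passes equal: "
                    + firstPass.equals(secondPass));
        } finally {
            tokenizer.close();                 // always release the tokenizer
        }
    }
}
```

Closing in a finally block keeps the underlying file from leaking if next() or reset() throws an IOException partway through a pass.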