it.unipi.di.util
Class FixedTokenizer

java.lang.Object
  extended by it.unipi.di.util.TermTokenizer
      extended by it.unipi.di.util.FixedTokenizer
All Implemented Interfaces:
Tokenizer

public class FixedTokenizer
extends TermTokenizer

Splits a text into a list of fixed-size tokens. The token "\n" is also added.
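
The splitting described above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: the class FixedSplitSketch and its split method are hypothetical names, and the handling of "\n" (each newline emitted as its own token, with chunks never spanning a newline) is only one assumed reading of "Token "\n" is added".

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; NOT part of it.unipi.di.util.
class FixedSplitSketch {

    // Splits text into chunks of at most `length` chars.
    // ASSUMPTION: every '\n' becomes its own token and ends the current chunk.
    static List<String> split(String text, int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                // Flush the partial chunk, then emit the newline token.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                tokens.add("\n");
            } else {
                current.append(c);
                if (current.length() == length) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            }
        }
        // A trailing chunk may be shorter than `length`.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : split("abcdefghij\nxy", 4)) {
            // Print the newline token in escaped form so it stays visible.
            System.out.println(token.equals("\n") ? "\\n" : token);
        }
    }
}
```

Under these assumptions, the input "abcdefghij\nxy" with length 4 yields the tokens abcd, efgh, ij, "\n", xy. The shorter tokens ij and xy show why the length is best read as a maximum.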

Author:
Claudio Corsi, Paolo Ferragina

Field Summary
static int DEFAULT_LENGTH
           
 
Constructor Summary
FixedTokenizer(String file)
          Creates a new FixedTokenizer over the given file, using the default token length of 4.
FixedTokenizer(String file, int length)
          Creates a new FixedTokenizer over the given file, using a custom token length.
 
Method Summary
 String toString()
           
 
Methods inherited from class it.unipi.di.util.TermTokenizer
close, next, reset
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

DEFAULT_LENGTH

public static final int DEFAULT_LENGTH
See Also:
Constant Field Values

Constructor Detail

FixedTokenizer

public FixedTokenizer(String file)
               throws IOException
Creates a new FixedTokenizer over the given file, using the default token length of 4.

Parameters:
file - the file to split into tokens
Throws:
IOException

FixedTokenizer

public FixedTokenizer(String file,
                      int length)
               throws IOException
Creates a new FixedTokenizer over the given file, using a custom token length.

Parameters:
file - the file to split into tokens
length - the (maximum) length of each token, in characters
Throws:
IOException
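
As a rough illustration of how the two constructors might relate, the sketch below reads a file in chunks of at most `length` characters, with the one-argument constructor delegating to the two-argument one. Everything in it is an assumption: FileChunkSketch and nextChunk are hypothetical names, the real signature of the inherited next method is not shown on this page, and this sketch gives "\n" no special treatment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical stand-in; NOT the it.unipi.di.util.FixedTokenizer source.
class FileChunkSketch implements AutoCloseable {
    // Mirrors the documented default length of 4.
    static final int DEFAULT_LENGTH = 4;

    private final BufferedReader reader;
    private final int length;

    // Delegates to the two-argument constructor with the default length.
    FileChunkSketch(String file) throws IOException {
        this(file, DEFAULT_LENGTH);
    }

    // Opening the file is what can raise IOException here.
    FileChunkSketch(String file, int length) throws IOException {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive");
        }
        this.reader = Files.newBufferedReader(Paths.get(file), StandardCharsets.UTF_8);
        this.length = length;
    }

    // Returns the next chunk of at most `length` chars, or null at end of file.
    String nextChunk() throws IOException {
        char[] buf = new char[length];
        int filled = 0;
        while (filled < length) {
            int n = reader.read(buf, filled, length - filled);
            if (n < 0) {
                break; // end of stream
            }
            filled += n;
        }
        return filled == 0 ? null : new String(buf, 0, filled);
    }

    @Override
    public void close() throws IOException {
        reader.close();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunks", ".txt");
        Files.write(tmp, "abcdefghij".getBytes(StandardCharsets.UTF_8));
        try (FileChunkSketch t = new FileChunkSketch(tmp.toString())) {
            for (String c = t.nextChunk(); c != null; c = t.nextChunk()) {
                System.out.println(c);
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

With a ten-character file and the default length, the final chunk holds only two characters, which matches the "(maximum)" wording of the length parameter.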

Method Detail

toString

public String toString()
Overrides:
toString in class TermTokenizer