it.unipi.di.util
Class URLTokenizer

java.lang.Object
  extended by it.unipi.di.util.TermTokenizer
      extended by it.unipi.di.util.URLTokenizer
All Implemented Interfaces:
Tokenizer

public class URLTokenizer
extends TermTokenizer

A Tokenizer for a list of URLs. Each URL is split into the following tokens: protocol, host, port, the directories of the path, and the query part. A '\n' token is also added.
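The decomposition described above can be illustrated with a minimal sketch built on the standard java.net.URI class. This is not the URLTokenizer implementation: the class name UrlSplitSketch, the method split, and the example URL are hypothetical, and placing the '\n' token at the end of each URL is an assumption, since the description does not say where it is added.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a hypothetical helper, not part of it.unipi.di.util.
final class UrlSplitSketch {

    // Splits one URL into protocol, host, port, path segments, and query.
    // Appending "\n" last is an assumption about where the '\n' token goes.
    static List<String> split(String url) {
        URI uri = URI.create(url);
        List<String> tokens = new ArrayList<>();
        if (uri.getScheme() != null) {
            tokens.add(uri.getScheme());
        }
        if (uri.getHost() != null) {
            tokens.add(uri.getHost());
        }
        if (uri.getPort() != -1) { // -1 means no explicit port in the URL
            tokens.add(Integer.toString(uri.getPort()));
        }
        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) { // skip the empty piece before the leading '/'
                    tokens.add(segment);
                }
            }
        }
        if (uri.getQuery() != null) {
            tokens.add(uri.getQuery());
        }
        tokens.add("\n");
        return tokens;
    }

    public static void main(String[] args) {
        String url = "http://www.example.com:8080/docs/api/index.html?lang=en";
        for (String token : split(url)) {
            // Print the newline token in escaped form so it stays visible.
            System.out.println(token.equals("\n") ? "\\n" : token);
        }
    }
}
```

Under these assumptions the example URL yields the tokens http, www.example.com, 8080, docs, api, index.html, lang=en, and '\n'. The actual class may differ in how it treats missing ports, trailing file names, or the position of the '\n' token.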

Author:
Claudio Corsi, Paolo Ferragina

Constructor Summary
URLTokenizer(String file)
          Creates a new URLTokenizer over the given file.
 
Method Summary
 
Methods inherited from class it.unipi.di.util.TermTokenizer
close, next, reset, toString
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

URLTokenizer

public URLTokenizer(String file)
             throws IOException
Creates a new URLTokenizer over the given file.

Parameters:
file - the file containing the list of URLs to split into tokens (URLs are separated by '\n')
Throws:
IOException - if an I/O error occurs