it.unipi.di.tokenizer
Class URLTokenizer
java.lang.Object
it.unipi.di.tokenizer.TermTokenizer
it.unipi.di.tokenizer.URLTokenizer
- All Implemented Interfaces:
- Tokenizer
public class URLTokenizer extends TermTokenizer
A Tokenizer for a list of URLs.
Each URL is tokenized into its protocol, host, port, the directories of
its path, and its query part. A '\n' token is also added.
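The decomposition described above can be illustrated with a minimal sketch built on the standard java.net.URI class. This is not the URLTokenizer implementation: the class UrlSplitSketch and its tokenize method are hypothetical names introduced only for illustration. The sketch also assumes that the last path segment (the file name) counts as a path token and that the '\n' token closes each URL, neither of which is stated by this page.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; it is NOT part of it.unipi.di.tokenizer.
class UrlSplitSketch {

    // Splits one URL into protocol, host, port, path segments and query,
    // then appends a "\n" token (assumed here to mark the end of the URL).
    static List<String> tokenize(String url) {
        URI uri = URI.create(url); // throws IllegalArgumentException on malformed input
        List<String> tokens = new ArrayList<>();

        if (uri.getScheme() != null) {
            tokens.add(uri.getScheme());                 // protocol
        }
        if (uri.getHost() != null) {
            tokens.add(uri.getHost());                   // host
        }
        if (uri.getPort() != -1) {
            tokens.add(Integer.toString(uri.getPort())); // port, only if explicit
        }

        String path = uri.getPath();
        if (path != null) {
            for (String segment : path.split("/")) {
                if (!segment.isEmpty()) {
                    tokens.add(segment);                 // one token per path segment
                }
            }
        }

        if (uri.getQuery() != null) {
            tokens.add(uri.getQuery());                  // query part, kept whole
        }

        tokens.add("\n");                                // assumed end-of-URL token
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("http://www.di.unipi.it:8080/a/b/index.html?x=1"));
    }
}
```

Under these assumptions, the URL in main yields the tokens http, www.di.unipi.it, 8080, a, b, index.html, x=1 and '\n'. The actual class may split the query part further or treat the file name differently.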
- Author:
- Claudio Corsi, Paolo Ferragina
Constructor Summary
URLTokenizer(String file)
Create a new URLTokenizer over the given file.
URLTokenizer
public URLTokenizer(String file)
throws IOException
- Create a new URLTokenizer over the given file.
- Parameters:
file
- the file containing the list of URLs (separated by '\n') to split into tokens
- Throws:
IOException
split
protected String[] split(String line)