it.unipi.di.textdb
Class ZipCursor

java.lang.Object
  extended by it.unipi.di.textdb.TextDB
      extended by it.unipi.di.textdb.ZipCursor

public class ZipCursor
extends TextDB

A forward-only cursor over a Zip-compressed file. The implementation is based on the JCraft library, a 100% pure Java implementation of ZLib (see the JCraft home page).

The GZip format is also supported; in that case, compression and access are handled by the standard java.util.zip package.

NOTE: The JCraft library does not support the GZip format.
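As an illustration of the GZip path, the following minimal sketch reads records front to back using only java.util.zip. It is not the ZipCursor implementation: the class name GzipLineScan, its helper methods, and the assumption that each record occupies one line are all hypothetical.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, not part of it.unipi.di.textdb.
public class GzipLineScan {

    /** Helper for the demo: compresses text into an in-memory GZip stream. */
    public static byte[] gzip(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /**
     * Reads records front to back. GZIPInputStream does not support
     * mark/reset, so access is naturally forward-only.
     * Assumption: one record per line.
     */
    public static List<String> readRecords(byte[] compressed) {
        List<String> records = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(compressed)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                records.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return records;
    }

    public static void main(String[] args) {
        // prints [alpha, beta, gamma]
        System.out.println(readRecords(gzip("alpha\nbeta\ngamma\n")));
    }
}
```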

Author:
Claudio Corsi, Paolo Ferragina

Field Summary
protected  int currentPos
           
protected  it.unimi.dsi.mg4j.util.MutableString currentRec
           
protected  it.unimi.dsi.mg4j.io.FastBufferedReader cursor
           
protected  boolean isOpen
           
protected  boolean isZip
           
 
Fields inherited from class it.unipi.di.textdb.TextDB
DEFAULT_FIELD_SEPARATOR, fieldSeparator, filename
 
Constructor Summary
ZipCursor(String filename)
          Creates a ZipCursor over a Zip- or GZip-compressed file.
 
Method Summary
 TextDB build(String outfile, PrintStream log)
          Creates a TextDB by compressing the input file with Zip.
static TextDB build(String filename, String outfile, boolean useGZip)
          Creates a TextDB by compressing the input file with Zip or, optionally, with GZip.
 void close()
          Closes the TextDB and releases all of its resources.
 String get(int record)
          Returns the String at position record in this compressed TextDB.
 int getCurrentPos()
          Returns the current record position.
 String getCurrentValue()
          Returns the content of the current record.
 String[] getRange(int i, int j)
          Returns the records having positions from i to j in the TextDB.
 void getRange(int i, int j, int field, BufferedWriter out)
          Writes to the given BufferedWriter the specified field for the records in the range [i,j].
 String[] getSequential(int[] records)
          Given a sorted array of record positions, this method returns the corresponding records.
 void getSequential(int[] records, int field, BufferedWriter out)
          Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records.
 String[] getSequential(int[] records, int pos, int length)
          Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions.
static void main(String[] args)
           
 String next()
          Moves the cursor to the next record and returns it.
 String next(int pos)
          Moves the cursor to the specified record and returns it.
 void open()
          Opens the TextDB.
 int size()
          Returns the number of records contained in this TextDB.
protected  void skip(int pos)
          Forwards this cursor to the record at position pos, which must be positive and greater than or equal to the previous one.
 
Methods inherited from class it.unipi.di.textdb.TextDB
build, fromTDBFile, get, getField, getFieldValues, getName, getRange, getRecordFields, getSequential, setFieldSeparator
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

cursor

protected it.unimi.dsi.mg4j.io.FastBufferedReader cursor

currentPos

protected int currentPos

currentRec

protected it.unimi.dsi.mg4j.util.MutableString currentRec

isOpen

protected boolean isOpen

isZip

protected boolean isZip
Constructor Detail

ZipCursor

public ZipCursor(String filename)
Creates a ZipCursor over a Zip- or GZip-compressed file.

Parameters:
filename - the TDB file containing the compressed file
Method Detail

open

public void open()
          throws IOException
Description copied from class: TextDB
Opens the TextDB.
This method has to be called before any other operation on the TextDB.

Overrides:
open in class TextDB
Throws:
IOException

next

public String next()
            throws IOException
Moves the cursor to the next record and returns it. A null value is returned when the cursor has passed the last record.

Returns:
the next String
Throws:
IOException

next

public String next(int pos)
            throws IOException
Moves the cursor to the specified record and returns it. The new record position must be greater than or equal to the current one. A null value is returned when the cursor has passed the last record.

Parameters:
pos - the position of the record to move this cursor to
Returns:
the String at the new position, or null if the cursor has passed the last record
Throws:
IOException

skip

protected void skip(int pos)
             throws IOException
Forwards this cursor to the record at position pos, which must be positive and greater than or equal to the previous one.

Parameters:
pos - the new (positive) position for this cursor
Throws:
IOException
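
The forward-only contract of next(), next(int) and skip(int) can be sketched as follows. This is a minimal illustration over an arbitrary Iterator of records, not the actual ZipCursor code. The class name ForwardCursorSketch, the initial position of -1, and the choice to throw IllegalArgumentException on a backward move are assumptions; the real class may behave differently in those cases.

```java
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical sketch, not part of it.unipi.di.textdb.
public class ForwardCursorSketch {
    private final Iterator<String> source; // any forward-only source of records
    private int currentPos = -1;           // assumption: -1 means "nothing read yet"
    private String currentRec;

    public ForwardCursorSketch(Iterator<String> source) {
        this.source = source;
    }

    /** Moves to the next record; returns null once the source is exhausted. */
    public String next() {
        if (!source.hasNext()) {
            return null;
        }
        currentRec = source.next();
        currentPos++;
        return currentRec;
    }

    /** Moves forward to record pos; pos must not precede the current position. */
    public String next(int pos) {
        if (pos < 0 || pos < currentPos) {
            throw new IllegalArgumentException(
                    "cannot move backwards: " + pos + " < " + currentPos);
        }
        if (pos == currentPos) {
            return currentRec;          // already positioned on this record
        }
        while (currentPos < pos - 1) {  // discard the records in between
            if (next() == null) {
                return null;            // ran past the last record
            }
        }
        return next();
    }

    public int getCurrentPos() {
        return currentPos;
    }

    public static void main(String[] args) {
        ForwardCursorSketch cursor =
                new ForwardCursorSketch(Arrays.asList("a", "b", "c", "d").iterator());
        System.out.println(cursor.next());   // prints a
        System.out.println(cursor.next(2));  // prints c
        System.out.println(cursor.next(9));  // prints null
    }
}
```

Because skipped records are consumed and discarded, callers that need several positions should request them in ascending order.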

getCurrentPos

public int getCurrentPos()
Returns the current record position.

Returns:
the current position of this cursor.

getCurrentValue

public String getCurrentValue()
Returns the content of the current record.

Returns:
the content of the current record

close

public void close()
           throws IOException
Description copied from class: TextDB
Closes the TextDB and releases all of its resources.

Overrides:
close in class TextDB
Throws:
IOException

get

public String get(int record)
           throws IOException
Returns the String at position record in this compressed TextDB. Because this TextDB is a forward-only cursor, records can only be retrieved in non-decreasing order of position: the requested position must be greater than or equal to the previous one.

Specified by:
get in class TextDB
Parameters:
record - a position in the range [0, N-1]
Returns:
the requested record
Throws:
IOException

size

public int size()
Description copied from class: TextDB
Returns the number of records contained in this TextDB. If N is the returned value, the records of this database are numbered from 0 to N-1.

Specified by:
size in class TextDB
Returns:
the size of this TextDB as the number of the contained records

getRange

public String[] getRange(int i,
                         int j)
                  throws IOException
Description copied from class: TextDB
Returns the records having positions from i to j in the TextDB.

Specified by:
getRange in class TextDB
Parameters:
i - the starting position of the records to retrieve (inclusive)
j - the ending position of the records to retrieve (inclusive)
Returns:
the records in the defined range
Throws:
IOException

getRange

public void getRange(int i,
                     int j,
                     int field,
                     BufferedWriter out)
              throws IOException
Description copied from class: TextDB
Writes to the given BufferedWriter the specified field for the records in the range [i,j]. If a record does not contain the field, an empty line is written.

Specified by:
getRange in class TextDB
Parameters:
i - the starting position of the records to be fetched (included)
j - the ending position of the records to be fetched (included)
field - the position (counting from 0) of the field to return for all the records in range, or -1 to retrieve the entire record
out - the output BufferedWriter
Throws:
IOException
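
The field-selection rule described above (field counted from 0, -1 for the entire record, an empty line when the field is missing) can be sketched as below. The class FieldSelectSketch and its methods are hypothetical, and the tab separator is an assumption: the value of DEFAULT_FIELD_SEPARATOR is not stated on this page.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch, not part of it.unipi.di.textdb.
public class FieldSelectSketch {
    /** Assumed separator; the real DEFAULT_FIELD_SEPARATOR may differ. */
    static final String SEPARATOR = "\t";

    /** Returns the requested field, the whole record for -1, or "" if absent. */
    public static String selectField(String record, int field) {
        if (field == -1) {
            return record;
        }
        // limit -1 keeps trailing empty fields
        String[] fields = record.split(Pattern.quote(SEPARATOR), -1);
        return (field >= 0 && field < fields.length) ? fields[field] : "";
    }

    /** Writes the selected field of records i..j (both inclusive), one per line. */
    public static void writeRange(List<String> records, int i, int j, int field,
                                  BufferedWriter out) throws IOException {
        for (int k = i; k <= j; k++) {
            out.write(selectField(records.get(k), field));
            out.newLine();
        }
    }

    public static void main(String[] args) throws IOException {
        List<String> records = Arrays.asList("id0\tanna", "id1", "id2\tcarl");
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            writeRange(records, 0, 2, 1, out); // the second record yields an empty line
        }
        System.out.print(sink);
    }
}
```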

getSequential

public String[] getSequential(int[] records)
                       throws IOException
Description copied from class: TextDB
Given a sorted array of record positions, this method returns the corresponding records.

If some of the requested records are not available, the behavior is unspecified and depends on the underlying implementation.

Overrides:
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
Returns:
the records having these positions (order is preserved)
Throws:
IOException

getSequential

public String[] getSequential(int[] records,
                              int pos,
                              int length)
                       throws IOException
Description copied from class: TextDB
Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions.

The fetched positions are the ones in the range records[pos] (included) to records[pos+length] (excluded).

Specified by:
getSequential in class TextDB
Parameters:
records - array with a sorted subrange of records positions
pos - the starting position of the subrange
length - the length of the subrange
Returns:
the records having these positions (order is preserved)
Throws:
IOException
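
To make the half-open subrange concrete: with pos = 1 and length = 3, the positions fetched are records[1], records[2] and records[3]. The sketch below only demonstrates that index arithmetic; the class SubrangeSketch and its methods are hypothetical and fetch no records.

```java
import java.util.Arrays;

// Hypothetical sketch, not part of it.unipi.di.textdb.
public class SubrangeSketch {

    /** Returns records[pos] (included) through records[pos + length] (excluded). */
    public static int[] subrange(int[] records, int pos, int length) {
        return Arrays.copyOfRange(records, pos, pos + length);
    }

    /** Checks that only the subrange is sorted; the rest of the array may be in any order. */
    public static boolean isSubrangeSorted(int[] records, int pos, int length) {
        for (int k = pos + 1; k < pos + length; k++) {
            if (records[k - 1] > records[k]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] records = {9, 2, 4, 7, 1};
        // prints [2, 4, 7]
        System.out.println(Arrays.toString(subrange(records, 1, 3)));
        // prints true
        System.out.println(isSubrangeSorted(records, 1, 3));
    }
}
```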

getSequential

public void getSequential(int[] records,
                          int field,
                          BufferedWriter out)
                   throws IOException
Description copied from class: TextDB
Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records. If a record doesn't contain the requested field, the behavior of the method depends on its implementation (implementing classes are encouraged to write an empty line in this case).
To dump all fields of the specified records, pass -1 as the field position.

The retrieved records are not kept in memory; they are written immediately to the provided BufferedWriter.

NOTE: implementations can use the method TextDB.getField(String, int) provided by this abstract class that selects a field of a record through a sequential access to the record itself. The use of a more efficient implementation of this function is encouraged.

Specified by:
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
field - the position of the field to extract, or -1 to dump all fields
out - the output BufferedWriter
Throws:
IOException

build

public static TextDB build(String filename,
                           String outfile,
                           boolean useGZip)
                    throws IOException
Creates a TextDB by compressing the input file with Zip or, optionally, with GZip. The built database is accessed sequentially, in the manner of zcat.

Parameters:
filename - the file to compress
outfile - the output file name
useGZip - if true compress with gzip instead of zip
Returns:
the TextDB to access the built database
Throws:
IOException
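
The effect of a flag like useGZip can be sketched with java.util.zip alone. Note the substitution: ZipCursor is documented as using JCraft for the Zip (ZLib) path, whereas this sketch uses DeflaterOutputStream and InflaterInputStream from the standard library as stand-ins. The class CompressSketch and its methods are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical sketch, not part of it.unipi.di.textdb.
public class CompressSketch {

    /** Compresses data as GZip when useGZip is true, otherwise as a ZLib stream. */
    public static byte[] compress(byte[] data, boolean useGZip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream out = useGZip
                ? new GZIPOutputStream(bytes)
                : new DeflaterOutputStream(bytes)) {
            out.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses compress(); the caller must pass the same useGZip flag. */
    public static byte[] decompress(byte[] data, boolean useGZip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (InputStream in = useGZip
                ? new GZIPInputStream(new ByteArrayInputStream(data))
                : new InflaterInputStream(new ByteArrayInputStream(data))) {
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] raw = "rec0\nrec1\n".getBytes(StandardCharsets.UTF_8);
        byte[] gz = compress(raw, true);
        // GZip streams start with the magic bytes 1f 8b
        System.out.printf("%02x %02x%n", gz[0] & 0xff, gz[1] & 0xff);
        System.out.print(new String(decompress(gz, true), StandardCharsets.UTF_8));
    }
}
```

The two formats are not interchangeable, which is why a reader has to know which one was used when the database was built.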

build

public TextDB build(String outfile,
                    PrintStream log)
             throws IOException
Creates a TextDB by compressing the input file with Zip. The built database is accessed sequentially, in the manner of zcat.

Specified by:
build in class TextDB
Parameters:
log - this parameter is not used
outfile - the output file name
Returns:
A TextDB instance to access the built database.
Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception