it.unipi.di.textdb
Class GZipFileCursor

java.lang.Object
  extended by it.unipi.di.textdb.TextDB
      extended by it.unipi.di.textdb.GZipFileCursor

public class GZipFileCursor
extends TextDB

A forward-only cursor over a GZip-compressed file, implemented using the standard interface to ZLib provided by the package java.util.zip.

The Zip file format is also supported; in that case, compression and access are handled by the JCraft library, a 100% pure Java implementation of ZLib (see the JCraft home page).

NOTE: The JCraft library, conversely, does not support the GZip format.
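
To illustrate the forward-only access pattern described above, here is a minimal sketch built directly on java.util.zip. It is not the GZipFileCursor implementation: the class name GzipLineCursorSketch, the one-record-per-line layout, and the UTF-8 encoding are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch, NOT the library class: a forward-only cursor over a
// gzip file, assuming one record per line and UTF-8 text.
final class GzipLineCursorSketch implements AutoCloseable {
    private final BufferedReader reader;
    private int currentPos = -1;   // -1 means no record has been read yet
    private String currentValue;   // stays null until the first next()

    GzipLineCursorSketch(Path gzFile) throws IOException {
        reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8));
    }

    // Reads the next record; returns null once the stream is exhausted.
    String next() throws IOException {
        String line = reader.readLine();
        if (line != null) {
            currentPos++;
            currentValue = line;
        }
        return line;
    }

    int getCurrentPos() { return currentPos; }

    String getCurrentValue() { return currentValue; }

    @Override
    public void close() throws IOException { reader.close(); }

    // Demo helper: writes the given lines to a temporary gzip file.
    static Path writeGzip(String... lines) throws IOException {
        Path tmp = Files.createTempFile("cursor-sketch", ".gz");
        try (Writer w = new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(tmp)),
                StandardCharsets.UTF_8)) {
            for (String line : lines) {
                w.write(line);
                w.write('\n');
            }
        }
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        Path gz = writeGzip("alpha", "beta", "gamma");
        try (GzipLineCursorSketch cursor = new GzipLineCursorSketch(gz)) {
            for (String r = cursor.next(); r != null; r = cursor.next()) {
                System.out.println(cursor.getCurrentPos() + ": " + r);
            }
        } finally {
            Files.deleteIfExists(gz);
        }
    }
}
```

Because a gzip stream can only be decompressed front to back, the sketch keeps no index and never rewinds; that is the property that makes such a cursor forward-only.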

Author:
Claudio Corsi, Paolo Ferragina

Field Summary
 
Fields inherited from class it.unipi.di.textdb.TextDB
DEFAULT_FIELD_SEPARATOR
 
Constructor Summary
GZipFileCursor(String filename)
          Creates an instance of a GZipFileCursor over a zip or gzip compressed file.
 
Method Summary
 TextDB build(PrintStream log)
          Creates a TextDB by compressing the input file with GZip.
static TextDB build(String filename, boolean useZip)
          Creates a TextDB by compressing the input file with GZip (optionally Zip).
 void close()
          Closes the TextDB and releases all of its resources.
 String get(int record)
          Returns the String at position record in this compressed TextDB.
 int getCurrentPos()
          Returns the current record position.
 String getCurrentValue()
          Returns the content of the current record.
 String[] getRange(int i, int j)
          Returns the records having positions from i to j in the TextDB.
 String[] getSequential(int[] records)
          Given a sorted array of record positions, this method returns the corresponding records.
 String[] getSequential(int[] records, int pos, int length)
          Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions.
 void getSequential(int[] records, int field, PrintStream out)
          Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records.
static void main(String[] args)
           
 String next()
          Moves the cursor to the next record and returns it.
 String next(int pos)
          Moves the cursor to the specified record and returns it.
 void open()
          Opens the TextDB.
 int size()
          It always returns zero.
 
Methods inherited from class it.unipi.di.textdb.TextDB
build, fromTDBFile, get, getFieldValues, getName, getRange, getRecordFields, getSequential, setFieldSeparator
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

GZipFileCursor

public GZipFileCursor(String filename)
Creates an instance of a GZipFileCursor over a zip or gzip compressed file.

Parameters:
filename - the TDB file containing the compressed file
Method Detail

open

public void open()
          throws IOException
Description copied from class: TextDB
Opens the TextDB.
This method has to be called before any other operation on the TextDB.

Overrides:
open in class TextDB
Throws:
IOException

next

public String next()
            throws IOException
Moves the cursor to the next record and returns it. Returns null once the cursor has moved past the last record.

Returns:
the next String
Throws:
IOException

next

public String next(int pos)
            throws IOException
Moves the cursor to the specified record and returns it. The new record position must be greater than or equal to the current one. Returns null once the cursor has moved past the last record.
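
The forward-only constraint on pos can be sketched as follows. This is an illustration over a plain BufferedReader, not the library code: the class ForwardSeekSketch is hypothetical, and throwing IllegalArgumentException on a backward move is an assumption, since the page does not say how the real class reacts.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

// Hypothetical sketch of forward-only positioning; one record per line is assumed.
final class ForwardSeekSketch {
    private final BufferedReader reader;
    private int currentPos = -1;   // -1 means no record has been read yet
    private String currentValue;

    ForwardSeekSketch(BufferedReader reader) {
        this.reader = reader;
    }

    // Skips ahead to record `pos` and returns it, or null if the input
    // ends before that record is reached.
    String next(int pos) throws IOException {
        if (pos < 0 || pos < currentPos) {
            // Assumption: the documented class may behave differently here.
            throw new IllegalArgumentException(
                    "position " + pos + " is behind the cursor at " + currentPos);
        }
        while (currentPos < pos) {
            String line = reader.readLine();
            if (line == null) {
                return null;       // ran out of records
            }
            currentPos++;
            currentValue = line;
        }
        return currentValue;       // pos == currentPos: same record again
    }

    int getCurrentPos() { return currentPos; }

    public static void main(String[] args) throws IOException {
        ForwardSeekSketch cursor = new ForwardSeekSketch(
                new BufferedReader(new StringReader("a\nb\nc\nd\n")));
        System.out.println(cursor.next(2));   // skips records 0 and 1
        System.out.println(cursor.next(3));
        System.out.println(cursor.next(10));  // past the end
    }
}
```

Requesting the current position again costs nothing, while any earlier position would require reopening and re-decompressing the stream from the start.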

Parameters:
pos - the position of the record to move this cursor to
Returns:
the String at the new position, or null if the cursor has moved past the last record
Throws:
IOException

getCurrentPos

public int getCurrentPos()
Returns the current record position.

Returns:
the current position of this cursor.

getCurrentValue

public String getCurrentValue()
Returns the content of the current record.

Returns:
the content of the current record

close

public void close()
           throws IOException
Description copied from class: TextDB
Closes the TextDB and releases all of its resources.

Overrides:
close in class TextDB
Throws:
IOException

get

public String get(int record)
           throws IOException
Returns the String at position record in this compressed TextDB. Because this TextDB is a forward-only cursor, records can only be retrieved in non-decreasing order of position. In other words, the requested position must be greater than or equal to the previous one.

Specified by:
get in class TextDB
Parameters:
record - a position in the range [0, N-1]
Returns:
the requested record
Throws:
IOException

size

public int size()
It always returns zero.

Specified by:
size in class TextDB
Returns:
always zero in this implementation (the TextDB contract defines the size as the number of contained records)

getRange

public String[] getRange(int i,
                         int j)
                  throws IOException
Description copied from class: TextDB
Returns the records having positions from i to j in the TextDB.

Specified by:
getRange in class TextDB
Parameters:
i - the starting position of the records to retrieve (inclusive)
j - the ending position of the records to retrieve (inclusive)
Returns:
the records in the defined range
Throws:
IOException

getSequential

public String[] getSequential(int[] records)
                       throws IOException
Description copied from class: TextDB
Given a sorted array of record positions, this method returns the corresponding records.

If some of the requested records are not available, the behavior is unspecified and depends on the underlying implementation.
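
A sorted position array is what lets a forward-only source answer the whole request in a single pass. The sketch below shows the idea over a generic Iterator; the class SequentialFetchSketch is hypothetical, and returning null for an unavailable record is an assumption, since the behavior in that case is left unspecified.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch: fetch sorted positions from a forward-only source in one pass.
final class SequentialFetchSketch {

    static String[] fetch(Iterator<String> source, int[] sortedPositions) {
        String[] out = new String[sortedPositions.length];
        int current = -1;      // position of the last record consumed
        String value = null;   // content of that record
        for (int k = 0; k < sortedPositions.length; k++) {
            int wanted = sortedPositions[k];
            if (wanted < 0 || wanted < current) {
                throw new IllegalArgumentException(
                        "positions must be non-negative and sorted");
            }
            // Consume records until the wanted one is reached or the source ends.
            while (current < wanted && source.hasNext()) {
                value = source.next();
                current++;
            }
            // Assumption: an unavailable record is reported as null.
            out[k] = (current == wanted) ? value : null;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        String[] got = fetch(records.iterator(), new int[] {0, 2, 2, 4});
        System.out.println(Arrays.toString(got));
    }
}
```

Each record of the source is read at most once, so the cost is bounded by the largest requested position rather than by the number of requests.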

Overrides:
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
Returns:
the records having these positions (order is preserved)
Throws:
IOException

getSequential

public String[] getSequential(int[] records,
                              int pos,
                              int length)
                       throws IOException
Description copied from class: TextDB
Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions.

The fetched positions are the ones in the range records[pos] (included) to records[pos+length] (excluded).
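
The half-open subrange can be made concrete with a small sketch. The helper names below are hypothetical and only illustrate which entries of records are consulted; only that subrange needs to be sorted, while the rest of the array may be in any order.

```java
import java.util.Arrays;

// Hypothetical helpers illustrating the (pos, length) subrange convention.
final class SubrangeSketch {

    // Returns records[pos], ..., records[pos + length - 1].
    static int[] selectedPositions(int[] records, int pos, int length) {
        if (pos < 0 || length < 0 || pos + length > records.length) {
            throw new IndexOutOfBoundsException("subrange exceeds the array");
        }
        return Arrays.copyOfRange(records, pos, pos + length);
    }

    // Checks that the subrange alone is in non-decreasing order.
    static boolean isSortedSubrange(int[] records, int pos, int length) {
        for (int k = pos + 1; k < pos + length; k++) {
            if (records[k] < records[k - 1]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] records = {9, 2, 5, 7, 1};
        // Entries at indexes 1, 2 and 3 are selected; index 4 is excluded.
        System.out.println(Arrays.toString(selectedPositions(records, 1, 3)));
        System.out.println(isSortedSubrange(records, 1, 3));
    }
}
```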

Specified by:
getSequential in class TextDB
Parameters:
records - array with a sorted subrange of record positions
pos - the starting position of the subrange
length - the length of the subrange
Returns:
the records having these positions (order is preserved)
Throws:
IOException

getSequential

public void getSequential(int[] records,
                          int field,
                          PrintStream out)
                   throws IOException
Description copied from class: TextDB
Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records. If a record doesn't contain the requested field, the behavior of the method depends on its implementation (implementing classes are encouraged to print an empty line in this case, i.e. an empty string).
To dump all fields of the specified records, pass -1 as the field position.

The retrieved records are not kept in memory but immediately dumped on the provided PrintStream without wasting further memory.

NOTE: implementations can use the method TextDB.getField(String, int) provided by this abstract class that selects a field of a record through a sequential access to the record itself. The use of a more efficient implementation of this function is encouraged.
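
The sequential field selection mentioned in the note can be sketched as below. This is not the body of TextDB.getField: the class FieldDumpSketch is hypothetical, and the tab separator is an assumption, because the value of DEFAULT_FIELD_SEPARATOR is not given on this page.

```java
import java.io.PrintStream;
import java.util.Arrays;

// Hypothetical sketch of field extraction by a sequential scan of the record.
final class FieldDumpSketch {

    // Returns field number `field` (0-based), the whole record for -1,
    // or an empty string when the record has too few fields.
    static String field(String record, int field, char separator) {
        if (field < -1) {
            throw new IllegalArgumentException("field must be >= -1");
        }
        if (field == -1) {
            return record;
        }
        int start = 0;
        for (int k = 0; k < field; k++) {
            int idx = record.indexOf(separator, start);
            if (idx < 0) {
                return "";         // missing field: empty string, as encouraged above
            }
            start = idx + 1;
        }
        int end = record.indexOf(separator, start);
        return end < 0 ? record.substring(start) : record.substring(start, end);
    }

    // Prints one line per record without collecting the results in memory.
    static void dump(Iterable<String> records, int field, char separator, PrintStream out) {
        for (String record : records) {
            out.println(field(record, field, separator));
        }
    }

    public static void main(String[] args) {
        // Assumption: tab-separated fields.
        dump(Arrays.asList("id1\tAlice\tPisa", "id2\tBob"), 2, '\t', System.out);
    }
}
```

Writing each field straight to the PrintStream mirrors the memory behavior described above: nothing is retained after a record has been printed.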

Specified by:
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
field - the position of the field to extract, or -1 to dump all fields
out - the output PrintStream
Throws:
IOException

build

public static TextDB build(String filename,
                           boolean useZip)
                    throws IOException
Creates a TextDB by compressing the input file with GZip (optionally Zip). The resulting database is accessed sequentially, in the manner of zcat.
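
For orientation, the GZip half of this step can be approximated with java.util.zip alone. The sketch below is not the body of build: the class GzipBuildSketch and its method names are hypothetical, and the Zip variant through JCraft is not shown.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: compress a file with gzip, then read it back sequentially.
final class GzipBuildSketch {

    // Streams `input` through a GZIPOutputStream into `output`.
    static void compress(Path input, Path output) throws IOException {
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(output))) {
            Files.copy(input, out);
        }
    }

    // Decompresses the whole file front to back, much like zcat does.
    static String decompress(Path gzFile) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(Files.newInputStream(gzFile))) {
            byte[] chunk = new byte[8192];
            for (int n = in.read(chunk); n != -1; n = in.read(chunk)) {
                buf.write(chunk, 0, n);
            }
        }
        return new String(buf.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path plain = Files.createTempFile("build-sketch", ".txt");
        Path gz = Files.createTempFile("build-sketch", ".gz");
        try {
            Files.write(plain, "r0\nr1\nr2\n".getBytes(StandardCharsets.UTF_8));
            compress(plain, gz);
            System.out.print(decompress(gz));
        } finally {
            Files.deleteIfExists(plain);
            Files.deleteIfExists(gz);
        }
    }
}
```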

Parameters:
filename - the file to compress
useZip - if true, compress with zip (--best) instead of gzip
Returns:
the TextDB to access the built database
Throws:
IOException

build

public TextDB build(PrintStream log)
             throws IOException
Creates a TextDB by compressing the input file with GZip. The resulting database is accessed sequentially, in the manner of zcat.

Specified by:
build in class TextDB
Parameters:
log - this parameter is not used
Returns:
A TextDB instance to access the built database.
Throws:
IOException

main

public static void main(String[] args)
                 throws Exception
Throws:
Exception