java.lang.Object
  it.unipi.di.textdb.TextDB
    it.unipi.di.textdb.BucketedGZip
public class BucketedGZip
This is a TextDB that combines a bucketing scheme with the GZip data compression technique.
A bucket is defined as a fixed number of contiguous records. Each bucket is compressed with
GZip (so buckets have variable length) and is accessed via a pointer, also called a jumper,
kept in a file on disk.
At query time, the bucket containing the requested record is located through its jumper,
loaded into memory, and decompressed sequentially until the requested record is reached.
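The bucketing scheme described above can be sketched in a few lines. This is only an in-memory illustration under stated assumptions, not the library's implementation: the class name `BucketSketch` is hypothetical, records are assumed to contain no newline characters, and the jumpers are kept in an `int[]` rather than in a file on disk.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical in-memory model of bucketing plus GZip; not the library's code. */
final class BucketSketch {
    private final byte[] data;     // all compressed buckets, concatenated
    private final int[] jumpers;   // jumpers[b] = byte offset of bucket b; last entry = data.length
    private final int bucketSize;  // number of records per bucket (the last bucket may be shorter)
    private final int size;        // total number of records

    BucketSketch(List<String> records, int bucketSize) throws IOException {
        this.bucketSize = bucketSize;
        this.size = records.size();
        int buckets = (size + bucketSize - 1) / bucketSize;
        this.jumpers = new int[buckets + 1];
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        for (int b = 0; b < buckets; b++) {
            jumpers[b] = all.size();
            ByteArrayOutputStream one = new ByteArrayOutputStream();
            // Closing the writer finishes the GZip stream for this bucket.
            try (Writer w = new OutputStreamWriter(new GZIPOutputStream(one), StandardCharsets.UTF_8)) {
                int end = Math.min(size, (b + 1) * bucketSize);
                for (int r = b * bucketSize; r < end; r++) {
                    w.write(records.get(r));
                    w.write('\n');
                }
            }
            byte[] compressed = one.toByteArray();
            all.write(compressed, 0, compressed.length);
        }
        jumpers[buckets] = all.size();
        this.data = all.toByteArray();
    }

    /** Locates the bucket through its jumper, then decompresses it up to the requested record. */
    String get(int record) throws IOException {
        if (record < 0 || record >= size) {
            throw new IndexOutOfBoundsException("record " + record);
        }
        int b = record / bucketSize;
        int start = jumpers[b];
        int length = jumpers[b + 1] - start;
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(new ByteArrayInputStream(data, start, length)),
                StandardCharsets.UTF_8))) {
            for (int skip = record % bucketSize; skip > 0; skip--) {
                in.readLine();  // discard the records that precede the requested one
            }
            return in.readLine();
        }
    }

    int size() {
        return size;
    }

    public static void main(String[] args) throws IOException {
        BucketSketch db = new BucketSketch(List.of("alpha", "beta", "gamma", "delta", "epsilon"), 2);
        System.out.println(db.get(3));
    }
}
```

In this model a smaller bucket size means less data to decompress per lookup, but more jumpers and typically a worse compression ratio, since each bucket is compressed independently.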
If the input file is sorted, the search operations defined in SearchableDB will
work over this TextDB. This support must be requested at build time, using
the proper parameter (see build(String, int, boolean, PrintStream))
or the --search-support command line option. Remember that the library performs
no check to verify that the input file is actually sorted. The user has to
guarantee it.
See Also: ExternalSort
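Since the library does not verify the ordering itself, a caller may want to check the input before requesting search support. The following is a minimal sketch under assumptions that the page does not confirm: one record per line, and an ordering equal to `String.compareTo`. The class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical pre-build check; the ordering used by the library is an assumption here. */
final class SortedCheck {
    /**
     * Returns the 0-based index of the first record that is smaller than its
     * predecessor, or -1 if the records are in non-decreasing order.
     */
    static int firstUnsorted(BufferedReader in) throws IOException {
        String previous = null;
        String line;
        int index = 0;
        while ((line = in.readLine()) != null) {
            if (previous != null && line.compareTo(previous) < 0) {
                return index;
            }
            previous = line;
            index++;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new StringReader("apple\nbanana\navocado\n"));
        System.out.println(firstUnsorted(in));
    }
}
```

A single streaming pass like this needs only two records in memory at a time, so it is cheap compared with the build itself.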
Field Summary |
---|
Fields inherited from class it.unipi.di.textdb.TextDB |
---|
DEFAULT_FIELD_SEPARATOR |
Constructor Summary | |
---|---|
BucketedGZip(String filename) | Create a new BucketedGZip object loading the needed data structures from the provided file. |
Method Summary | |
---|---|
TextDB | build(PrintStream log): Builds the TextDB over the textual file identified by the filename string used in the constructor (see TextDB.TextDB(String)). |
static TextDB | build(String inputFile, int bucketSize, boolean searchSupport, PrintStream log): Build a BucketedGZip over an input file. |
void | close(): Closes the TextDB and releases all of its resources. |
String | get(int record): Returns the record for a given position in the range [0, N-1], where N is the number of records present in the TextDB. |
String[] | getRange(int i, int j): Returns the records having positions from i to j in the TextDB. |
String[] | getSequential(int[] records): Given a sorted array of record positions, this method returns all of them. |
String[] | getSequential(int[] records, int pos, int length): Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions. |
void | getSequential(int[] records, int field, PrintStream out): Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records. |
static void | main(String[] args) |
void | open(): Opens the TextDB. |
Range | prefix(String p): Returns the range [i, j) of consecutive records in the TextDB that are prefixed by string p. |
int | rank(String s): Returns the position in this TextDB of the input string. |
int | size(): Returns the number of records contained in this TextDB. |
Methods inherited from class it.unipi.di.textdb.TextDB |
---|
build, fromTDBFile, get, getFieldValues, getName, getRange, getRecordFields, getSequential, setFieldSeparator |
Methods inherited from class java.lang.Object |
---|
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
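public BucketedGZip(String filename)
Create a new BucketedGZip object loading the needed data structures from the provided file.
Parameters:
filename - the file containing the content and the data structures to load, stored in TDB format
Method Detail |
---|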
public void close() throws IOException
Description copied from class: TextDB
Closes the TextDB and releases all of its resources.
close in class TextDB
Throws:
IOException
public String get(int record) throws IOException
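Description copied from class: TextDB
Returns the record for a given position in the range [0, N-1], where N is the number of records present in the TextDB.
get in class TextDB
Parameters:
record - a position in the range [0, N-1]
Throws:
IOException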
public int size()
Description copied from class: TextDB
Returns the number of records contained in this TextDB.
size in class TextDB
public String[] getRange(int i, int j) throws IOException
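Description copied from class: TextDB
Returns the records having positions from i to j in the TextDB.
getRange in class TextDB
Parameters:
i - the starting position of the records to retrieve (inclusive)
j - the ending position of the records to retrieve (inclusive)
Throws:
IOException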
public String[] getSequential(int[] records) throws IOException
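Description copied from class: TextDB
Given a sorted array of record positions, this method returns all of them.
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
Throws:
IOException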
public String[] getSequential(int[] records, int pos, int length) throws IOException
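Description copied from class: TextDB
Given an array of record positions containing a sorted subrange defined by the parameters pos and length,
this method returns the records for such positions.
The subrange goes from records[pos] (included) to records[pos+length] (excluded).
getSequential in class TextDB
Parameters:
records - an array containing a sorted subrange of record positions
pos - the starting position of the subrange
length - the length of the subrange
Throws:
IOException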
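The subrange convention, and the reason sorted positions matter for a bucketed store, can be illustrated with a small model. This is a sketch under assumptions, not the library's code: the class `SequentialSketch` is hypothetical, a plain `String[]` stands in for the compressed file, and "loading a bucket" is simulated by copying a slice and counting how often that happens.

```java
import java.util.Arrays;

/** Hypothetical model: shows that sorted positions let each bucket be loaded only once. */
final class SequentialSketch {
    private final String[] store;  // stands in for the compressed file
    private final int bucketSize;  // number of records per bucket
    int loads = 0;                 // how many times a bucket was "decompressed"

    SequentialSketch(String[] store, int bucketSize) {
        this.store = store;
        this.bucketSize = bucketSize;
    }

    private String[] loadBucket(int b) {
        loads++;
        int from = b * bucketSize;
        return Arrays.copyOfRange(store, from, Math.min(store.length, from + bucketSize));
    }

    /** Returns the records at records[pos] (included) through records[pos+length] (excluded). */
    String[] getSequential(int[] records, int pos, int length) {
        String[] out = new String[length];
        int current = -1;
        String[] bucket = null;
        for (int k = 0; k < length; k++) {
            int r = records[pos + k];
            int b = r / bucketSize;
            if (b != current) {  // with sorted positions, each bucket is entered at most once
                bucket = loadBucket(b);
                current = b;
            }
            out[k] = bucket[r % bucketSize];
        }
        return out;
    }

    public static void main(String[] args) {
        SequentialSketch db = new SequentialSketch(new String[] {"r0", "r1", "r2", "r3", "r4", "r5"}, 2);
        // Only the subrange {0, 1, 4, 5} must be sorted; the entries outside it are never read.
        String[] result = db.getSequential(new int[] {5, 0, 1, 4, 5, 2}, 1, 4);
        System.out.println(Arrays.toString(result) + " with " + db.loads + " bucket loads");
    }
}
```

If the positions in the subrange were not sorted, the same bucket could be loaded repeatedly, which is the cost the sortedness requirement avoids in this model.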
public void getSequential(int[] records, int field, PrintStream out) throws IOException
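Description copied from class: TextDB
Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records.
It relies on the function TextDB.getField(String, int) provided by
this abstract class, which selects a field of a record through a sequential access
to the record itself. The use of a more efficient implementation of this function
is encouraged.
getSequential in class TextDB
Parameters:
records - a sorted array of record positions
field - the position of the field to extract, or -1 to dump all fields
out - the output PrintStream
Throws:
IOException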
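Selecting a field through a sequential scan of the record can be sketched as follows. This is not the source of `TextDB.getField(String, int)`; the class name, the explicit separator parameter, and the choice to return `null` for a missing field are assumptions made for illustration. The value of `DEFAULT_FIELD_SEPARATOR` is not given on this page, so the example passes a tab character explicitly.

```java
/** Hypothetical field selector; behaviour for missing fields is an assumption. */
final class FieldSketch {
    /**
     * Scans the record left to right and returns the field at the given
     * 0-based position, the whole record if field is negative, or null if
     * the record has fewer fields than requested.
     */
    static String getField(String record, int field, char separator) {
        if (field < 0) {
            return record;  // mirrors "-1 to dump all fields"
        }
        int start = 0;
        for (int f = 0; f < field; f++) {
            int next = record.indexOf(separator, start);
            if (next < 0) {
                return null;  // the record ends before the requested field
            }
            start = next + 1;
        }
        int end = record.indexOf(separator, start);
        return end < 0 ? record.substring(start) : record.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(getField("id42\tRossi\tPisa", 1, '\t'));
    }
}
```

The scan touches every character before the requested field, so extracting late fields from long records costs time proportional to the record length.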
public void open() throws IOException
Description copied from class: TextDB
Opens the TextDB.
open in class TextDB
Throws:
IOException
public Range prefix(String p) throws IOException
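Description copied from interface: SearchableDB
Returns the range [i, j) of consecutive records in the TextDB that are prefixed by string p.
Specified by:
prefix in interface SearchableDB
Parameters:
p - the prefix to search
Throws:
IOException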
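The half-open range [i, j) returned by a prefix search can be illustrated with two binary searches over a sorted array. This is a sketch, not the library's algorithm: the class `PrefixSketch` is hypothetical, the records are assumed to be sorted by `String.compareTo`, and an `int[]` of length two stands in for the `Range` type, whose API is not shown on this page.

```java
import java.util.Arrays;

/** Hypothetical prefix search over an in-memory sorted array. */
final class PrefixSketch {
    /** Returns {i, j} such that exactly sorted[i..j-1] start with p; i == j if none do. */
    static int[] prefix(String[] sorted, String p) {
        // First binary search: first index whose record is >= p.
        int lo = 0;
        int hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].compareTo(p) < 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        int i = lo;
        // Second binary search: first index at or after i whose record does not start with p.
        // In sorted order the records prefixed by p are contiguous, so the test is monotone.
        hi = sorted.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (sorted[mid].startsWith(p)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return new int[] {i, lo};
    }

    public static void main(String[] args) {
        String[] records = {"apple", "apply", "banana", "band", "bandit", "cat"};
        System.out.println(Arrays.toString(prefix(records, "ban")));
    }
}
```

The contiguity argument only holds for sorted input, which is why search support depends on the caller supplying a sorted file.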
public int rank(String s) throws IOException
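Description copied from interface: SearchableDB
Returns the position in this TextDB of the input string.
Specified by:
rank in interface SearchableDB
Parameters:
s - the string to be searched
Throws:
IOException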
public static TextDB build(String inputFile, int bucketSize, boolean searchSupport, PrintStream log) throws IOException
Build a BucketedGZip over an input file.
Parameters:
inputFile - the file to compress
bucketSize - the maximum size of each bucket, in number of records
searchSupport - if true, build the data structures needed to support the search methods defined in the interface SearchableDB
log - a PrintStream to which log messages are sent. If null, the messages are suppressed
Throws:
IOException
public TextDB build(PrintStream log) throws IOException
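Description copied from class: TextDB
Builds the TextDB over the textual file identified by the filename string used in the constructor (see TextDB.TextDB(String)).
This method runs a build process with default values for all input parameters.
Log messages are sent to the given PrintStream, or suppressed
if the passed reference is null.
build in class TextDB
Parameters:
log - a PrintStream for log messages. A null value will suppress any output message
Throws:
IOException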
public static void main(String[] args) throws Exception
Throws:
Exception