java.lang.Object
it.unipi.di.textdb.TextDB
public abstract class TextDB
Consider a text file consisting of records (i.e. lines separated
by newline characters). Each record is composed of a variable number of fields (i.e.
strings separated by '\t'). Each field may in turn be composed of multiple
values separated by a char sequence specified by the user. The number of values and the number
of fields may differ among records, thus generalizing the classical relational approach.
A TextDB stores this input file on disk in compressed
form and offers efficient access to individual records and fields.
A record or a field is identified by its ordinal
position, counted from 0 starting at the beginning of the file (for records) or
at the beginning of the record (for fields).
Implementations of this class provide different compression techniques and
different access methods, thus offering time/space trade-offs.
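The record, field, and value model described above can be illustrated with plain string splitting. This is only a minimal sketch and not how TextDB stores or accesses data; the class name `RecordModelSketch`, the sample record, and the `;` value separator are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the TextDB library.
class RecordModelSketch {

    /** Splits a record into its fields (separated by '\t'), keeping empty trailing fields. */
    static String[] fields(String record) {
        return record.split("\t", -1);
    }

    /** Splits a field into its values, treating the separator as a literal char sequence. */
    static String[] values(String field, String sep) {
        return field.split(Pattern.quote(sep), -1);
    }

    public static void main(String[] args) {
        // One record with three fields; the second field is multi-valued.
        String record = "42\tred;green;blue\tPisa";
        String[] f = fields(record);
        System.out.println("fields: " + f.length);
        for (String v : values(f[1], ";")) {
            System.out.println("value: " + v);
        }
    }
}
```

Positions are counted from 0, so `f[1]` is the second field of the record.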
build(String, PrintStream)
is the method that must be
provided by each implementation.
This method builds the data structures using default values for every
custom parameter. To build the TextDB with customized parameters,
define a static "build" method that accepts those parameters and
performs all the work needed to produce the permanent data structures on disk.
This is only a suggestion for developing a better implementation; it is not enforced,
although following a suggested standard is good practice.
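The suggested pattern (a default build that every implementation provides, plus a static "build" accepting custom parameters) might look like the sketch below. Everything here is hypothetical: `BaseDbSketch` is a stand-in for the abstract base class, and `BlockDbSketch` with its `blockSize` parameter is an invented implementation. No data structures are actually written to disk.

```java
import java.io.PrintStream;

/** Hypothetical stand-in for the abstract base class; only the parts needed here. */
abstract class BaseDbSketch {
    protected final String filename;

    BaseDbSketch(String filename) {
        this.filename = filename;
    }

    /** Default-parameter build that every implementation must provide. */
    abstract BaseDbSketch build(String outfile, PrintStream log);
}

/** Hypothetical implementation with one tunable parameter, blockSize. */
class BlockDbSketch extends BaseDbSketch {
    static final int DEFAULT_BLOCK_SIZE = 64;
    final int blockSize;

    BlockDbSketch(String filename, int blockSize) {
        super(filename);
        this.blockSize = blockSize;
    }

    /** Default build: delegates to the static build using default values. */
    @Override
    BaseDbSketch build(String outfile, PrintStream log) {
        return build(filename, outfile, DEFAULT_BLOCK_SIZE, log);
    }

    /** Static build with custom parameters; a real one would write the structures to outfile. */
    static BlockDbSketch build(String infile, String outfile, int blockSize, PrintStream log) {
        if (log != null) {
            log.println("building " + outfile + " from " + infile + " with blockSize=" + blockSize);
        }
        return new BlockDbSketch(infile, blockSize);
    }
}
```

Keeping the customized entry point static lets callers choose parameters without first constructing an instance, while the instance method remains a thin default wrapper.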
fromTDBFile(String)
loads a TDB file and returns the correct instance of the stored TextDB, without requiring any prior
knowledge about it.
Field Summary | |
---|---|
static String | DEFAULT_FIELD_SEPARATOR: The default separator '\t' for fields. |
protected it.unimi.dsi.mg4j.util.MutableString | fieldSeparator: The char sequence used to separate the fields within a record. |
protected String | filename |
Constructor Summary | |
---|---|
TextDB(String filename) | Creates a new TextDB from an input textual file. |
Method Summary | |
---|---|
TextDB | build(String outfile): Builds a TextDB over the textual file identified by the filename string used in the constructor (see TextDB(String)). |
abstract TextDB | build(String outfile, PrintStream log): Builds the TextDB over the textual file identified by the filename string used in the constructor (see TextDB(String)). |
void | close(): Closes the TextDB and releases all of its resources. |
static TextDB | fromTDBFile(String tdbFile): Returns a TextDB from a TDB file. |
abstract String | get(int record): Returns the record at a given position in the range [0, N-1], where N is the number of records present in the TextDB. |
String | get(int record, int field): Returns the field of a record, given their ordinal positions, or null if one is not present. |
protected String | getField(String record, int field): Splits the input record into fields, using the separator specified with setFieldSeparator(String), and returns the field at the specified position, or null if that position is out of bounds. |
String[] | getFieldValues(String field, String sep): Returns the values of a multi-valued field, where values are separated by a user-defined separator. |
String | getName(): Returns the name of this TextDB. |
abstract String[] | getRange(int i, int j): Returns the records having positions from i to j in the TextDB. |
String[] | getRange(int i, int j, int field): Returns the specified field for the records in the range [i,j]. |
abstract void | getRange(int i, int j, int field, BufferedWriter out): Prints to the passed BufferedWriter the specified field for the records in the range [i,j]. |
String[] | getRecordFields(String record): Returns all fields forming the input record. |
String[] | getSequential(int[] records): Given a sorted array of record positions, this method returns the corresponding records. |
String[] | getSequential(int[] records, int field): Given a sorted array of record positions and the position of a field, this method returns that field of those records. |
abstract void | getSequential(int[] records, int field, BufferedWriter out): Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records. |
abstract String[] | getSequential(int[] records, int pos, int length): Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions. |
void | open(): Opens the TextDB. |
void | setFieldSeparator(String sep): Sets the sequence of chars used to separate fields. |
abstract int | size(): Returns the number of records contained in this TextDB. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
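protected String filename
public static final String DEFAULT_FIELD_SEPARATOR
The default separator '\t' for fields.
protected it.unimi.dsi.mg4j.util.MutableString fieldSeparator
The char sequence used to separate the fields within a record.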
Constructor Detail |
---|
public TextDB(String filename)
Creates a new TextDB from an input textual file.
[...] Directory object and to respect the TDB format.
Parameters:
filename - the file containing this TextDB
Method Detail |
---|
public static TextDB fromTDBFile(String tdbFile) throws IOException
Returns a TextDB from a TDB file.
Parameters:
tdbFile - the TDB file
Throws:
IOException - if the input file is not a valid TDB file or an I/O error occurs
public String getName()
Returns the name of this TextDB.
public void setFieldSeparator(String sep)
Sets the sequence of chars used to separate fields.
Parameters:
sep - the new separator for fields
protected String getField(String record, int field)
Splits the input record into fields, using the separator specified with setFieldSeparator(String), and returns the field at the specified position, or null if that position is out of bounds.
Parameters:
record - the record to parse
field - the ordinal position of the field to return, or -1 to return the entire record
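The contract of getField (field at an ordinal position, the whole record for -1, null when the position is out of bounds) can be sketched with a left-to-right scan of the record. This is an illustrative assumption about one way to meet that contract, not the library's code; the class `GetFieldSketch` and the explicit `sep` argument are hypothetical (the real method uses the separator set with setFieldSeparator(String)).

```java
// Hypothetical helper, not part of the TextDB library.
class GetFieldSketch {

    /**
     * Returns the field at the given ordinal position, the whole record when
     * field is -1, or null when the position is out of bounds.
     */
    static String getField(String record, int field, String sep) {
        if (field == -1) {
            return record;              // -1 selects the entire record
        }
        if (field < 0) {
            return null;                // any other negative position is invalid
        }
        int start = 0;
        // Skip the separators that precede the requested field.
        for (int k = 0; k < field; k++) {
            int next = record.indexOf(sep, start);
            if (next < 0) {
                return null;            // the record has fewer than field + 1 fields
            }
            start = next + sep.length();
        }
        int end = record.indexOf(sep, start);
        return end < 0 ? record.substring(start) : record.substring(start, end);
    }
}
```

Scanning with `indexOf` stops as soon as the requested field is found, so fields near the start of a long record are extracted without splitting the rest of it.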
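public String[] getFieldValues(String field, String sep)
Returns the values of a multi-valued field, where values are separated by a user-defined separator.
Parameters:
field - the field content
sep - the separator used to separate the values composing this field
public String[] getRecordFields(String record)
Returns all fields forming the input record.
Parameters:
record - the record to parse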
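public abstract String get(int record) throws IOException
Returns the record at a given position in the range [0, N-1], where N is the number of records present in the TextDB.
Parameters:
record - a position in the range [0, N-1]
Throws:
IOException
public String get(int record, int field) throws IOException
Returns the field of a record, given their ordinal positions, or null if one is not present.
Parameters:
record - the position of a record
field - the position of the field to be retrieved
Throws:
IOException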
public abstract String[] getRange(int i, int j) throws IOException
Returns the records having positions from i to j in the TextDB.
Parameters:
i - the starting position of the records to retrieve (inclusive)
j - the ending position of the records to retrieve (inclusive)
Throws:
IOException
public String[] getRange(int i, int j, int field) throws IOException
Returns the specified field for the records in the range [i,j].
Parameters:
i - the starting position of the records to be fetched (inclusive)
j - the ending position of the records to be fetched (inclusive)
field - the position of the field to return for all those records
Throws:
IOException
public abstract void getRange(int i, int j, int field, BufferedWriter out) throws IOException
Prints to the passed BufferedWriter the specified field for the records in the range [i,j].
If the field is not present in a record, an empty line is written instead.
Parameters:
i - the starting position of the records to be fetched (inclusive)
j - the ending position of the records to be fetched (inclusive)
field - the position (counting from 0) of the field to return for all the records in range, or -1 to retrieve the entire record
out - the output BufferedWriter
Throws:
IOException
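The behaviour described above (one output line per record in [i,j], an empty line when the field is missing, and -1 for the entire record) can be sketched over an in-memory list of records. The class `RangeDumpSketch`, the `List<String>` standing in for the stored records, the fixed '\t' separator, and the '\n' line terminator are all assumptions for illustration; a real implementation would read from its compressed on-disk structures.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical helper, not part of the TextDB library.
class RangeDumpSketch {

    /** Writes the chosen field of records i..j (both inclusive), one per line. */
    static void dumpRange(List<String> records, int i, int j, int field, BufferedWriter out)
            throws IOException {
        for (int r = i; r <= j; r++) {
            String record = records.get(r);
            String value;
            if (field == -1) {
                value = record;                       // -1 selects the entire record
            } else {
                String[] parts = record.split("\t", -1);
                // A missing field produces an empty line.
                value = (field >= 0 && field < parts.length) ? parts[field] : "";
            }
            out.write(value);
            out.write('\n');
        }
        out.flush();
    }

    /** Convenience wrapper that collects the output in a String. */
    static String dumpToString(List<String> records, int i, int j, int field) {
        StringWriter sink = new StringWriter();
        try (BufferedWriter out = new BufferedWriter(sink)) {
            dumpRange(records, i, j, field, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toString();
    }
}
```

Streaming to a writer avoids materializing a `String[]` for large ranges, which is presumably why this overload exists alongside the array-returning ones.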
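public String[] getSequential(int[] records) throws IOException
Given a sorted array of record positions, this method returns the corresponding records.
Parameters:
records - a sorted array of record positions
Throws:
IOException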
public abstract String[] getSequential(int[] records, int pos, int length) throws IOException
Given an array of record positions containing a sorted subrange defined by the parameters pos and length, this method returns the records for such positions.
The subrange goes from records[pos] (included) to records[pos+length] (excluded).
Parameters:
records - an array with a sorted subrange of record positions
pos - the starting position of the subrange
length - the length of the subrange
Throws:
IOException
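Because the subrange records[pos] to records[pos+length] (excluded) is sorted, the requested records can be collected in a single forward pass. The sketch below shows that idea over plain text lines; the class `SequentialFetchSketch` and the use of an uncompressed String as the data source are assumptions for illustration, not the access method of any actual TextDB implementation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper, not part of the TextDB library.
class SequentialFetchSketch {

    /**
     * Returns the lines of text whose positions are records[pos] (included)
     * through records[pos + length] (excluded). That subrange must be sorted.
     */
    static String[] fetch(String text, int[] records, int pos, int length) {
        String[] result = new String[length];
        try (BufferedReader in = new BufferedReader(new StringReader(text))) {
            int current = 0;        // position of the next line to be read
            String line = null;     // last line read
            for (int k = 0; k < length; k++) {
                int wanted = records[pos + k];
                // Move forward only; sorted input means we never need to go back.
                while (current <= wanted) {
                    line = in.readLine();   // null once the input is exhausted
                    current++;
                }
                result[k] = line;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }
}
```

Entries of `records` outside the subrange are never inspected, so they do not need to be sorted.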
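public String[] getSequential(int[] records, int field) throws IOException
Given a sorted array of record positions and the position of a field, this method returns that field of those records.
Parameters:
records - a sorted array of record positions
field - the field to select from each of these records
Throws:
IOException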
public abstract void getSequential(int[] records, int field, BufferedWriter out) throws IOException
Given a sorted array of record positions and the position of a field, this method retrieves the specified field from those records.
getField(String, int), provided by this abstract class, selects a field of a record through sequential access
to the record itself. The use of a more efficient implementation of this function
is encouraged.
Parameters:
records - a sorted array of record positions
field - the position of the field to extract, or -1 to dump all fields
out - the output BufferedWriter
Throws:
IOException
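public void open() throws IOException
Opens the TextDB.
Throws:
IOException
public void close() throws IOException
Closes the TextDB and releases all of its resources.
Throws:
IOException
public abstract int size()
Returns the number of records contained in this TextDB.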
public abstract TextDB build(String outfile, PrintStream log) throws IOException
Builds the TextDB over the textual file identified by the filename string used in the constructor (see TextDB(String)).
This method runs a build process with default values for all input parameters.
Log messages are written to the passed PrintStream, or suppressed
if the passed reference is null.
Parameters:
log - a PrintStream for log messages. A null value suppresses all output messages
outfile - the output file name
Throws:
IOException
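public TextDB build(String outfile) throws IOException
Builds a TextDB over the textual file identified by the filename string used in the constructor (see TextDB(String)).
This method runs a build process with default values for all input parameters.
Parameters:
outfile - the output file name
Throws:
IOException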