it.unipi.di.rs
Class SparseSelect

java.lang.Object
  extended by it.unipi.di.rs.SparseSelect
All Implemented Interfaces:
IRankSelect

public class SparseSelect
extends Object
implements IRankSelect

An index implementing the select function over binary files with a small number of 1s.
The index uses the SDarrays method and creates two vectors from the source: H and L; it can performs select operations in constant time and rank operations as binary search.
This index is not suitable for source files having an amount of 1s greater than the 15% of their size.

Being a self-index, after the first build the source will be no longer necessary to perform rank and select operations. If the two support vectors are already stored on disk it is possible to create this index without source using constructor SparseSelect(String, String) and then create the data structures using createOnlyIndex methods or load it with load().

Author:
Claudio Corsi, Paolo Ferragina, Alessandro Barilari

Field Summary
static double DENSITY_LIMIT
           
 
Fields inherited from interface it.unipi.di.rs.IRankSelect
DISK, MEMORY
 
Constructor Summary
SparseSelect(String source)
          Creates an object and initializes basic data structures; it will be necessary to use a specific method to create data structures or load them.
SparseSelect(String source, Directory tdb)
          Creates an object from a source contained in a Directory and initializes basic data structures; it will be necessary to use a specific method to create data structures or load them.
SparseSelect(String HFileName, String LFileName)
          Create an index that will work without the source; H and L are required.
SparseSelect(String HFileName, String LFileName, Directory tdb)
          Create an index that will work without the source; H and L are required inside the Directory.
 
Method Summary
 long count1(long start, long end)
          Returns the number of 1s in the selected range of the source
 void createIndex(int mode)
          Creates the index in memory or on disk.
 void createIndex(int mode, float iPerc)
          Creates the index in memory or on disk using the specified amount of space
 void createOnlyIndex(int mode)
          Creates the Index to perform rank operations without creating files that contains H and L vectors.
 void createOnlyIndex(int mode, float iPerc)
          Creates the Index to perform select operations without creating files that contains H and L.
 long DenseVectorSize()
          Returns the size of the dense vector
 boolean get(long index)
          Returns the value at the specified position
static String getIndexFileName(String baseName)
          Generates the names of the file that will contain the index in case of store() operations
static String[] getVectorsFileNames(String baseName)
          Generates the names of the two files that will contain the two vectors generated by this index.
 long ISize()
          Returns the size in bits of the index
 void load()
          Loads the data structures from the default file(s)
 void load(Directory directory)
          This methods acts as IRankSelect.load(), but load data structures from a Directory
 long rank0(long pos)
          Returns the rank0 at the pos-th bit (starting from 0)
 long rank1(long pos)
          Returns the rank1 at the pos-th bit (starting from 0)
 long Scard()
          Returns the number of 1s in the source
 long select1(long rank)
          Returns the position of the rank-th 1 (starting from 0)
 long Ssize()
          Returns the size in bits of the source file.
 void store()
          Stores data structures in the default file(s).
 void store(Directory directory)
          This methods acts as IRankSelect.store(), but store data structures inside a Directory
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

DENSITY_LIMIT

public static final double DENSITY_LIMIT
See Also:
Constant Field Values
Constructor Detail

SparseSelect

public SparseSelect(String source)
             throws IOException
Creates an object and initializes basic data structures; it will be necessary to use a specific method to create data structures or load them.

Parameters:
source - the name of the source file
Throws:
IOException

SparseSelect

public SparseSelect(String source,
                    Directory tdb)
             throws IOException
Creates an object from a source contained in a Directory and initializes basic data structures; it will be necessary to use a specific method to create data structures or load them.

Parameters:
source - the name of the source file
Throws:
IOException

SparseSelect

public SparseSelect(String HFileName,
                    String LFileName)
             throws IOException
Create an index that will work without the source; H and L are required. it will be necessary to use a specific method to create data structures or load them.

Parameters:
HFileName - the file inside tdb containing the vector H
LFileName - the file inside tdb containing the vector L
Throws:
IOException

SparseSelect

public SparseSelect(String HFileName,
                    String LFileName,
                    Directory tdb)
             throws IOException
Create an index that will work without the source; H and L are required inside the Directory. it will be necessary to use a specific method to create data structures or load them.

Parameters:
HFileName - the file inside tdb containing the vector H
LFileName - the file inside tdb containing the vector L
tdb - the Directory used to store information to disk
Throws:
IOException
Method Detail

getVectorsFileNames

public static String[] getVectorsFileNames(String baseName)
Generates the names of the two files that will contain the two vectors generated by this index.

Parameters:
baseName - is the name from which generate the new names; tipically the name of the source of the index
Returns:
the name of the two files that will contain the vectors

getIndexFileName

public static String getIndexFileName(String baseName)
Generates the names of the file that will contain the index in case of store() operations

Parameters:
baseName - is the name from which generate the new names; tipically the name of the source of the index
Returns:
the name of the file that will contain main index data structures

Scard

public long Scard()
           throws IOException
Description copied from interface: IRankSelect
Returns the number of 1s in the source

Specified by:
Scard in interface IRankSelect
Throws:
IOException

Ssize

public long Ssize()
Description copied from interface: IRankSelect
Returns the size in bits of the source file.

Specified by:
Ssize in interface IRankSelect

DenseVectorSize

public long DenseVectorSize()
Returns the size of the dense vector

Throws:
IOException

ISize

public long ISize()
           throws IOException
Description copied from interface: IRankSelect
Returns the size in bits of the index

Specified by:
ISize in interface IRankSelect
Throws:
IOException

createIndex

public void createIndex(int mode)
                 throws IOException
Description copied from interface: IRankSelect
Creates the index in memory or on disk.

Specified by:
createIndex in interface IRankSelect
Parameters:
mode - the mode of creation (IRankSelect.MEMORY or IRankSelect.DISK)
Throws:
IOException

createIndex

public void createIndex(int mode,
                        float iPerc)
                 throws IOException
Description copied from interface: IRankSelect
Creates the index in memory or on disk using the specified amount of space

Specified by:
createIndex in interface IRankSelect
Parameters:
mode - the mode of creation (IRankSelect.MEMORY or IRankSelect.DISK)
iPerc - is the space occupancy in terms of % of the source dimension that the index must achieve
Throws:
IOException

createOnlyIndex

public void createOnlyIndex(int mode)
                     throws IOException
Creates the Index to perform rank operations without creating files that contains H and L vectors. It is supposed that these files are already filled with correct values or results will be corrupted

Parameters:
mode - the mode of creation
Throws:
YQLException
IOException

createOnlyIndex

public void createOnlyIndex(int mode,
                            float iPerc)
                     throws IOException
Creates the Index to perform select operations without creating files that contains H and L. It is supposed that these files are already filled with correct values or results will be corrupted

Parameters:
mode - the mode of creation
iPerc - is the % of the dense vector to use to save index information
Throws:
YQLException
IOException

rank0

public long rank0(long pos)
           throws IOException
Description copied from interface: IRankSelect
Returns the rank0 at the pos-th bit (starting from 0)

Specified by:
rank0 in interface IRankSelect
Parameters:
pos - is the position to analyze
Throws:
IOException

rank1

public long rank1(long pos)
           throws IOException
Description copied from interface: IRankSelect
Returns the rank1 at the pos-th bit (starting from 0)

Specified by:
rank1 in interface IRankSelect
Parameters:
pos - is the position to analyze
Throws:
IOException

select1

public long select1(long rank)
             throws IOException
Description copied from interface: IRankSelect
Returns the position of the rank-th 1 (starting from 0)

Specified by:
select1 in interface IRankSelect
Parameters:
rank - is the rank to find
Throws:
IOException

load

public void load()
          throws IOException
Description copied from interface: IRankSelect
Loads the data structures from the default file(s)

Specified by:
load in interface IRankSelect
Throws:
IOException

load

public void load(Directory directory)
          throws IOException
Description copied from interface: IRankSelect
This methods acts as IRankSelect.load(), but load data structures from a Directory

Specified by:
load in interface IRankSelect
Parameters:
directory - the source Directory; must be opened in READ_MODE
Throws:
IOException

store

public void store()
           throws IOException,
                  IOException
Description copied from interface: IRankSelect
Stores data structures in the default file(s). Previous data contained in the output file will be deleted.

Specified by:
store in interface IRankSelect
Throws:
IOException

store

public void store(Directory directory)
           throws IOException
Description copied from interface: IRankSelect
This methods acts as IRankSelect.store(), but store data structures inside a Directory

Specified by:
store in interface IRankSelect
Parameters:
directory - the destination Directory; must be opened in WRITE_MODE
Throws:
IOException

count1

public long count1(long start,
                   long end)
            throws IOException
Description copied from interface: IRankSelect
Returns the number of 1s in the selected range of the source

Specified by:
count1 in interface IRankSelect
Returns:
the number of 1s in the range [start, end]
Throws:
IOException

get

public boolean get(long index)
            throws IOException
Description copied from interface: IRankSelect
Returns the value at the specified position

Specified by:
get in interface IRankSelect
Returns:
the boolean value of the i-th bit of the source (starting from 0)
Throws:
IOException