it.unipi.di.graph
Class GraphLabelerConfig

java.lang.Object
  extended by it.unipi.di.graph.GraphLabelerConfig

public class GraphLabelerConfig
extends Object

Configuration class for a GraphLabeler. Objects of this class describe how the vertex labels and the edge attribute values are stored on disk.

Once a configuration is defined, it can be persisted to a text file with the method store(String). A previously stored configuration can be loaded with the static method load(String). The configuration file is plain text and can be easily modified with a text editor.

Vertex Labels

The main assumption is that vertices are identified by integer ids in the range [0, N-1], where N is the total number of vertices in the underlying graph. Labels are associated with vertices by creating a text file containing one label per line, so that the n-th line (counting from 0) contains the label of the vertex with id n. The labels can then be accessed efficiently through an instance of a TextDB. A TextDB stores a text file on disk in compressed form and offers fast access to its records (i.e. lines, and thus labels) without the need to uncompress the entire file. Among all implementations of a TextDB, the programmer can choose the one that best fits the type of vertex labels, and then build it either by invoking a build method or from the command line. Once the build stage is complete, the plain text file can be discarded.

For maximum flexibility, it is possible to define multiple non-overlapping label subranges and associate a different type of TextDB with each of them. This makes it possible to choose the best TextDB (in time and space) for the nature of the textual labels. Given a subrange [i, j] of vertices (possibly i=0 and j=N-1), one first forms a text file containing their labels, then builds a TextDB on this file, and finally uses the method setVertexDB(Range, TextDB) to tie this TextDB to its vertex subrange.
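The subrange scheme above can be illustrated with a small stand-alone sketch. It does not use the library: VertexRangeLookup, addRange and localLine are hypothetical names, and the sketch assumes that line k of the text file built for the subrange [i, j] holds the label of vertex i + k, which the text implies but does not state.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: resolves a vertex id to a line inside its subrange file. */
final class VertexRangeLookup {
    // Maps the lower bound i of each registered subrange [i, j] to its upper bound j.
    private final TreeMap<Integer, Integer> ranges = new TreeMap<>();

    /** Registers the subrange [i, j]; rejects a range that overlaps an existing one. */
    void addRange(int i, int j) {
        if (i < 0 || j < i) {
            throw new IllegalArgumentException("invalid range [" + i + ", " + j + "]");
        }
        // Among non-overlapping ranges, only the one with the greatest
        // lower bound <= j needs to be checked for an overlap.
        Map.Entry<Integer, Integer> before = ranges.floorEntry(j);
        if (before != null && before.getValue() >= i) {
            throw new IllegalArgumentException("range overlaps an existing one");
        }
        ranges.put(i, j);
    }

    /** Returns the 0-based line of vertex v in its subrange file, or -1 if v is uncovered. */
    int localLine(int v) {
        Map.Entry<Integer, Integer> e = ranges.floorEntry(v);
        if (e == null || v > e.getValue()) {
            return -1;
        }
        return v - e.getKey(); // assumption: line k holds the label of vertex i + k
    }

    public static void main(String[] args) {
        VertexRangeLookup lookup = new VertexRangeLookup();
        lookup.addRange(0, 99);
        lookup.addRange(100, 199);
        System.out.println(lookup.localLine(150)); // prints 50
    }
}
```

Keying a sorted map by lower bound makes both the overlap check and the lookup a single floorEntry call, which is one plausible reason for the class to keep its ranges in a SortedMap.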

Edge Attributes

The information about the edges is divided into attributes, each identified by a unique name. The definition of an attribute is global: an attribute is defined over all edges of the graph. The method setEdgeDB(String, TextDB) defines an attribute (name) and declares the TextDB object used to access the values of that attribute. As in the case of vertex labels, the attribute values are stored on disk and accessed through a TextDB. This means that a text file containing the attribute values must be built first, and then a suitable TextDB has to be constructed on this file. Once the build stage is over, the plain text file can be discarded.

As the TextDB provides access to lines (i.e. records) of the underlying file, a convention is needed for storing the edge values. Suppose you want to define the attribute "weight" over the edges of your graph. A text file called "weight" should be created whose i-th line stores the weight values for the outgoing edges of vertex "i". If K is the number of these outgoing edges, then K values should be stored in this line, separated by a special sequence of chars called the "attributes separator" (by default the single char '\t'). The attribute values stored in a line (or record) should be ordered according to the numerical order of the neighbors of "i". For example, suppose you are defining the weights w1, w2 and w4 for the edges (i, 1), (i, 2) and (i, 4). Then the text file has to store at line i the string: w1 + '\t' + w2 + '\t' + w4.

If you want to define values for a multi-attribute (that is, if more than one edge connects the same source and target nodes), then you need to put these values, separated by a single space (i.e. '\b'), into one string, which forms the weight attribute value for that edge. For instance, if two edges connect the vertices i and j, where j is the k-th successor of i, then you have to define the string w1 + '\b' + w2 and put it into the "weight" attribute file at the i-th row and k-th position.
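The record layout described above can be sketched as follows. This is an illustration only, not code from the library: EdgeAttributeLine, encode and decode are hypothetical names, the default '\t' separator is taken from the text, and treating an empty record as "no outgoing edges" is an assumption. Multi-attribute values are not handled here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of one record (line) of an edge attribute file. */
final class EdgeAttributeLine {
    static final String SEPARATOR = "\t"; // default attributes separator named in the text

    /** Builds the record of one vertex: values ordered by ascending neighbor id. */
    static String encode(Map<Integer, String> valueByNeighbor) {
        // A TreeMap iterates its keys in ascending numerical order.
        TreeMap<Integer, String> sorted = new TreeMap<>(valueByNeighbor);
        return String.join(SEPARATOR, sorted.values());
    }

    /** Splits a record back into its K values, in neighbor order. */
    static List<String> decode(String line) {
        List<String> values = new ArrayList<>();
        if (line.isEmpty()) {
            return values; // assumption: an empty record means no outgoing edges
        }
        // The limit -1 keeps trailing empty values instead of dropping them.
        for (String value : line.split(SEPARATOR, -1)) {
            values.add(value);
        }
        return values;
    }

    public static void main(String[] args) {
        Map<Integer, String> weights = new HashMap<>();
        weights.put(4, "w4");
        weights.put(1, "w1");
        weights.put(2, "w2");
        String line = encode(weights);
        System.out.println(line.replace("\t", "\\t")); // prints w1\tw2\tw4
        System.out.println(decode(line));              // prints [w1, w2, w4]
    }
}
```

Because a record carries no neighbor ids, the k-th value can only be matched to its edge by knowing the sorted successor list of the vertex, which is why the ordering convention matters.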

Configuration File

Another way to bind TextDBs to vertex ranges or edge attributes is to define a configuration file and load it with the load(String) method. The syntax is simple. For instance, suppose you have two label ranges [0, 99] and [100, 199] for a graph of 200 vertices, and two edge attributes, namely "weight" and "color". Suppose this information is held in four TextDBs stored in the files "label0.tdb", "label1.tdb", "weight.tdb" and "color.tdb" respectively. Then the configuration file will be:
[0, 99]: label0.tdb
[100, 199]: label1.tdb
weight: weight.tdb
color: color.tdb
The first two lines define two vertex ranges with the usual syntax [i, j] and bind to each of them the TextDB storing its labels. The last two lines do the same for edge attributes: the referenced TextDBs store the values of the edge attributes "weight" and "color". Note that a range definition or an edge attribute name is separated from the TextDB file name by a ":" character. Remember also that the defined ranges cannot overlap and the attribute names must be unique.
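To make the syntax rules concrete, here is a minimal stand-alone parser for lines of this shape. It is a sketch under stated assumptions, not the code behind load(String): ConfigFileSketch and its fields are hypothetical names, blank lines are assumed to be ignored, and the key is assumed to end at the first ":" on the line. It records file names only and opens no TextDB.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the configuration syntax shown above. */
final class ConfigFileSketch {
    private static final Pattern RANGE =
            Pattern.compile("\\[\\s*(\\d+)\\s*,\\s*(\\d+)\\s*\\]");

    final TreeMap<Integer, Integer> upperBound = new TreeMap<>(); // i -> j for each [i, j]
    final TreeMap<Integer, String> vertexFile = new TreeMap<>();  // i -> TextDB file name
    final TreeMap<String, String> edgeFile = new TreeMap<>();     // attribute -> file name

    static ConfigFileSketch parse(List<String> lines) {
        ConfigFileSketch cfg = new ConfigFileSketch();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("missing ':' in: " + line);
            }
            String key = line.substring(0, colon).trim();
            String file = line.substring(colon + 1).trim();
            Matcher m = RANGE.matcher(key);
            if (m.matches()) {
                cfg.addRange(Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)), file);
            } else if (cfg.edgeFile.putIfAbsent(key, file) != null) {
                throw new IllegalArgumentException("duplicate attribute: " + key);
            }
        }
        return cfg;
    }

    private void addRange(int i, int j, String file) {
        if (j < i) {
            throw new IllegalArgumentException("invalid range [" + i + ", " + j + "]");
        }
        // Only the existing range with the greatest lower bound <= j can overlap [i, j].
        Map.Entry<Integer, Integer> e = upperBound.floorEntry(j);
        if (e != null && e.getValue() >= i) {
            throw new IllegalArgumentException("overlapping range [" + i + ", " + j + "]");
        }
        upperBound.put(i, j);
        vertexFile.put(i, file);
    }

    public static void main(String[] args) {
        ConfigFileSketch cfg = parse(Arrays.asList(
                "[0, 99]: label0.tdb", "[100, 199]: label1.tdb",
                "weight: weight.tdb", "color: color.tdb"));
        System.out.println(cfg.vertexFile);
        System.out.println(cfg.edgeFile);
    }
}
```

The two error cases mirror the constraints stated above: overlapping ranges and repeated attribute names are both rejected.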

Author:
Claudio Corsi, Paolo Ferragina

Field Summary
protected  SortedMap<String,TextDB> edges
           
protected  SortedMap<Range,TextDB> vertices
           
 
Constructor Summary
GraphLabelerConfig()
          Creates a new configuration.
 
Method Summary
 boolean attributeAlreadyDefined(String attribute)
          Returns true if the given edge attribute has already been configured.
 void close()
          Closes all the TextDBs in this configuration.
 Set<String> getEdgeAttributes()
          Returns the available edge attributes defined in this configuration.
 TextDB getEdgeDB(String attribute)
          Returns the TextDB storing the values of an edge attribute.
 SortedMap<Range,TextDB> getRangeDBMap(Range r)
          Given an input range, this method collects the Range/TextDB pairs of all the subranges defined in this configuration and contained in the input range.
 Range getRangeFor(int vertex)
          Returns the range containing the input vertex, as defined in this configuration.
 Range getSpanningRange()
          Returns the spanning range containing all the vertex ranges defined in this configuration.
 TextDB getVertexDB(int v)
          Returns the TextDB associated with a vertex.
 TextDB[] getVertexDBs(Range r)
          Returns the TextDBs covering the passed range.
static GraphLabelerConfig load(String file)
          Loads a configuration previously stored in a file.
protected  void open()
          Opens all the TextDBs in this configuration.
 TextDB setEdgeDB(String attribute, TextDB db)
          Defines an edge attribute and its TextDB for accessing its values.
 void setVertexDB(int i, int j, TextDB db)
          Sets a TextDB storing the labels for the vertices in the range [i, j].
 void setVertexDB(Range r, TextDB db)
          Sets a TextDB storing the labels for the vertices in the given range.
 void setVertexDB(TextDB db)
          Sets a TextDB storing the vertex labels.
 void store(String file)
          Writes the current configuration to a file.
 boolean vertexRangeAlreadyDefined(int i, int j)
          Returns true if the range [i, j] has already been defined as a vertex range in this configuration.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

vertices

protected SortedMap<Range,TextDB> vertices

edges

protected SortedMap<String,TextDB> edges
Constructor Detail

GraphLabelerConfig

public GraphLabelerConfig()
Creates a new configuration.

Method Detail

vertexRangeAlreadyDefined

public boolean vertexRangeAlreadyDefined(int i,
                                         int j)
Returns true if the range [i, j] has already been defined as a vertex range in this configuration.

Parameters:
i - lower bound of the range
j - upper bound of the range
Returns:
true if the range [i, j] already has a TextDB associated with it in this configuration, false otherwise.

attributeAlreadyDefined

public boolean attributeAlreadyDefined(String attribute)
Returns true if the given edge attribute has already been configured.

Parameters:
attribute - the edge attribute to check
Returns:
true if this edge attribute already has a TextDB associated with it, false otherwise.

setVertexDB

public void setVertexDB(int i,
                        int j,
                        TextDB db)
                 throws IOException
Sets a TextDB storing the labels for the vertices in the range [i, j].

Parameters:
i - the lower bound of the range
j - the upper bound of the range
db - the TextDB to associate to the range [i, j]
Throws:
IOException

setVertexDB

public void setVertexDB(TextDB db)
                 throws IOException
Sets a TextDB storing the vertex labels. If N is the size of the passed DB then vertices from 0 to N-1 will be labeled.

Parameters:
db - the TextDB storing the vertex labels
Throws:
IOException

setVertexDB

public void setVertexDB(Range r,
                        TextDB db)
                 throws IOException
Sets a TextDB storing the labels for the vertices in the given range.

Parameters:
r - a range of vertices
db - the TextDB associated to the input range of vertices
Throws:
IOException

setEdgeDB

public TextDB setEdgeDB(String attribute,
                        TextDB db)
                 throws IOException
Defines an edge attribute and the TextDB used to access its values. If the attribute already has an associated TextDB, it is replaced and the previous TextDB is returned.

Parameters:
attribute - the edge attribute to define
db - the TextDB used to access the values of this attribute
Returns:
a previously associated TextDB, if any
Throws:
IOException

getVertexDBs

public TextDB[] getVertexDBs(Range r)
Returns the TextDBs covering the passed range.

Parameters:
r - a range
Returns:
an array of TextDBs

getRangeDBMap

public SortedMap<Range,TextDB> getRangeDBMap(Range r)
Given an input range, this method collects the Range/TextDB pairs of all the subranges defined in this configuration and contained in the input range.

Parameters:
r - the input range
Returns:
the Range/TextDB pairs

getVertexDB

public TextDB getVertexDB(int v)
Returns the TextDB associated with a vertex.

Parameters:
v - the vertex id
Returns:
the TextDB containing the label of v

getRangeFor

public Range getRangeFor(int vertex)
Returns the range containing the input vertex, as defined in this configuration. If no range is found, then the range [vertex, vertex] is returned.

Parameters:
vertex - the input vertex
Returns:
the range containing the input vertex, as defined by this configuration

getSpanningRange

public Range getSpanningRange()
Returns the spanning range containing all the vertex ranges defined in this configuration. If no vertex ranges are defined, the range [-1, -1] is returned.

Returns:
the spanning range over the vertex domain.
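
The behavior described for getSpanningRange() can be sketched without the library. The sketch assumes that "spanning range" means the smallest range enclosing every defined vertex range, that is, from the lowest lower bound to the highest upper bound; SpanningRangeSketch and spanningRange are hypothetical names, and ranges are modeled as plain int pairs rather than Range objects.

```java
/** Hypothetical sketch of the spanning-range computation, using int pairs as ranges. */
final class SpanningRangeSketch {
    /** Returns {lowest lower bound, highest upper bound}, or {-1, -1} if no range exists. */
    static int[] spanningRange(int[][] ranges) {
        if (ranges.length == 0) {
            return new int[] {-1, -1}; // sentinel documented for an empty configuration
        }
        int lo = ranges[0][0];
        int hi = ranges[0][1];
        for (int[] r : ranges) {
            lo = Math.min(lo, r[0]);
            hi = Math.max(hi, r[1]);
        }
        return new int[] {lo, hi};
    }

    public static void main(String[] args) {
        int[] span = spanningRange(new int[][] {{100, 199}, {0, 99}});
        System.out.println("[" + span[0] + ", " + span[1] + "]"); // prints [0, 199]
    }
}
```

Callers should test for the [-1, -1] sentinel before using the result as a vertex range, since -1 is not a valid vertex id.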

getEdgeDB

public TextDB getEdgeDB(String attribute)
Returns the TextDB storing the values of an edge attribute.

Parameters:
attribute - the attribute name
Returns:
the TextDB providing access to the attribute values

store

public void store(String file)
           throws IOException
Writes the current configuration to a file.

Parameters:
file - the output file
Throws:
IOException

load

public static GraphLabelerConfig load(String file)
                               throws IOException
Loads a configuration previously stored in a file.

Parameters:
file - the configuration file to load
Returns:
an instance of GraphLabelerConfig representing the loaded configuration
Throws:
IOException

open

protected void open()
             throws IOException
Opens all the TextDBs in this configuration.

Throws:
IOException

close

public void close()
           throws IOException
Closes all the TextDBs in this configuration.

Throws:
IOException

getEdgeAttributes

public Set<String> getEdgeAttributes()
Returns the available edge attributes defined in this configuration.

Returns:
the available edge attributes.