net.sf.myra.datamining.data
Class Dataset

java.lang.Object
  extended by net.sf.myra.datamining.data.Dataset
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, java.lang.Iterable<Instance>

public final class Dataset
extends java.lang.Object
implements java.lang.Iterable<Instance>, java.lang.Cloneable, java.io.Serializable

This class represents a dataset: a collection of Instance objects described by an associated Metadata object.

Version:
$Revision: 2326 $ $Date:: 2011-01-25 10:54:47#$
Author:
Fernando Esteban Barril Otero
See Also:
Serialized Form

Field Summary
static java.lang.String CLASS_SEPARATOR
          Default class separator for multi-label class value.
 
Constructor Summary
Dataset(Metadata metadata)
          Creates a new Dataset instance.
 
Method Summary
 void add(java.lang.String[] values)
          Adds a new instance to the dataset.
 Dataset clone()
           
 void computeDomains()
          Computes the continuous attributes' domain values.
 boolean contains(Attribute attribute, java.lang.String value)
          Checks if the dataset contains an instance with the (attribute, value) pair.
 int[] count()
          Returns the class count vector.
 java.util.List<Instance> filter(java.lang.String... c)
          Returns the list of instances that belong to any of the specified class labels.
 double[] frequency()
          Returns the class frequency vector.
 Instance get(int index)
          Returns the instance specified by the index.
 java.lang.String getFilename()
          Returns the dataset filename.
 java.util.List<Instance> getInstances()
          Returns the list of instances.
 java.util.List<Instance> getInstances(Attribute attribute, java.lang.String value)
          Returns the list of instances that have the specified (attribute, value) combination.
 java.util.List<Instance> getInstances(Label label)
          Returns the list of instances that have the specified label.
 java.util.List<Instance> getInstances(java.lang.String... c)
          Returns the list of instances that belong to the specified class.
 java.util.List<Instance> getInstances(Term term)
          Returns the list of instances that satisfy the specified term.
 Metadata getMetadata()
          Returns the Metadata object associated with this dataset.
 int getSize()
          Returns the size (number of instances) of the dataset.
 int getSize(java.lang.String c)
          Returns the number of instances that belong to the specified class.
 boolean hasDuplicates()
          Checks if the dataset has duplicated instances (instances with the same values for every attribute).
 boolean hasMissing()
          Checks if the dataset contains instances with missing values.
 boolean isEmpty()
          Returns true if the dataset contains no instances.
 boolean isMultilabel()
          Verifies if this is a multi-label dataset (i.e. contains at least one multi-label instance).
 java.util.Iterator<Instance> iterator()
           
 Instance newInstance()
          Returns a new Instance associated with this dataset.
 void remove(Attribute attribute)
          Removes the specified attribute from the dataset.
 boolean remove(java.util.Collection<Instance> instances)
          Removes all this dataset's instances that are contained in the specified collection.
 boolean remove(Instance instance)
          Removes the first occurrence in this dataset of the specified instance.
 boolean remove(int index)
          Removes the instance with the specified index.
 void reset()
          Resets the instances' weights.
 void setFilename(java.lang.String filename)
          Sets the dataset filename.
 void setInstances(java.util.List<Instance> instances)
          Sets the dataset instances.
 java.lang.String toString()
           
 Dataset withoutDuplicates()
          Returns a new dataset instance without duplicated instances.
 
Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

CLASS_SEPARATOR

public static final java.lang.String CLASS_SEPARATOR
Default class separator for multi-label class value.

See Also:
Constant Field Values
Constructor Detail

Dataset

public Dataset(Metadata metadata)
Creates a new Dataset instance.

Parameters:
metadata - the metadata information.
Method Detail

getFilename

public java.lang.String getFilename()
Returns the dataset filename.

Returns:
the dataset filename.

setFilename

public void setFilename(java.lang.String filename)
Sets the dataset filename.

Parameters:
filename - the filename to set.

add

public void add(java.lang.String[] values)
Adds a new instance to the dataset.

Parameters:
values - the instance values.

iterator

public java.util.Iterator<Instance> iterator()
Specified by:
iterator in interface java.lang.Iterable<Instance>
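
Because Dataset implements java.lang.Iterable<Instance>, it can be the target of an enhanced for loop. The sketch below illustrates that mechanism only. RowCollection is a hypothetical stand-in that stores String[] rows; it is not the real Dataset, whose construction requires a Metadata object not described on this page.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Dataset: it only illustrates the Iterable contract.
class RowCollection implements Iterable<String[]> {
    private final List<String[]> rows = new ArrayList<>();

    // Mirrors the shape of add(String[]): stores a copy of the values as one row.
    void add(String[] values) {
        rows.add(values.clone());
    }

    @Override
    public Iterator<String[]> iterator() {
        return rows.iterator();
    }

    // Counts the rows with the enhanced for loop that Iterable makes possible.
    static int countRows(RowCollection collection) {
        int n = 0;
        for (String[] row : collection) {
            n++;
        }
        return n;
    }
}
```

With the real class, the same loop shape would read `for (Instance instance : dataset) { ... }`.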

hasDuplicates

public boolean hasDuplicates()
Checks if the dataset has duplicated instances (instances with the same values for every attribute).

Returns:
true if the dataset contains duplicated instances; false otherwise.

hasMissing

public boolean hasMissing()
Checks if the dataset contains instances with missing values.

Returns:
true if the dataset contains instances with missing values; false otherwise.

newInstance

public Instance newInstance()
Returns a new Instance associated with this dataset.

Returns:
a new Instance associated with this dataset.

getMetadata

public final Metadata getMetadata()
Returns the Metadata object associated with this dataset.

Returns:
the Metadata object associated with this dataset.

getSize

public int getSize()
Returns the size (number of instances) of the dataset.

Returns:
the size (number of instances) of the dataset.

getSize

public int getSize(java.lang.String c)
Returns the number of instances that belong to the specified class.

Parameters:
c - the class value.
Returns:
the number of instances that belong to the specified class found in the dataset.

getInstances

public java.util.List<Instance> getInstances()
Returns the list of instances.

Returns:
the list of instances.

getInstances

public java.util.List<Instance> getInstances(java.lang.String... c)
Returns the list of instances that belong to the specified class. This method returns all instances that have the specified class either as their most specific class or as an ancestor class.

Parameters:
c - the class value.
Returns:
the list of instances that belong to the specified class.

getInstances

public java.util.List<Instance> getInstances(Attribute attribute,
                                             java.lang.String value)
Returns the list of instances that have the specified (attribute, value) combination.

Parameters:
attribute - the attribute to check.
value - the value to check.
Returns:
the list of instances that have the specified (attribute, value) combination.

getInstances

public java.util.List<Instance> getInstances(Term term)
Returns the list of instances that satisfy the specified term.

Parameters:
term - the term to be used.
Returns:
the list of instances that satisfy the specified term.

getInstances

public java.util.List<Instance> getInstances(Label label)
Returns the list of instances that have the specified label. Note that this is different from the getInstances(String...) method, since it only returns the instances that have the exact label.

Parameters:
label - the label of the instances.
Returns:
the list of instances that have the specified label.
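
The difference between matching a class or any of its descendants, as getInstances(String...) does, and matching the exact label, as getInstances(Label) does, can be sketched as follows. The dot-separated path encoding of hierarchical labels (for example "1.2.3", with "1" and "1.2" as ancestors) is an assumption made for this illustration and may not match the library's own representation. LabelMatchSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: class labels are dot-separated paths such as "1.2.3".
// This encoding is an assumption for illustration, not the library's format.
class LabelMatchSketch {
    // True if 'label' equals 'query' or has 'query' as an ancestor.
    static boolean belongsTo(String label, String query) {
        return label.equals(query) || label.startsWith(query + ".");
    }

    // Analogue of getInstances(String...) for one class: exact or descendant matches.
    static List<String> byClass(List<String> labels, String query) {
        List<String> result = new ArrayList<>();
        for (String label : labels) {
            if (belongsTo(label, query)) {
                result.add(label);
            }
        }
        return result;
    }

    // Analogue of getInstances(Label): exact matches only.
    static List<String> byExactLabel(List<String> labels, String query) {
        List<String> result = new ArrayList<>();
        for (String label : labels) {
            if (label.equals(query)) {
                result.add(label);
            }
        }
        return result;
    }
}
```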

filter

public java.util.List<Instance> filter(java.lang.String... c)
Returns the list of instances that belong to any of the specified class labels.

Parameters:
c - the specified class labels.
Returns:
the list of instances that belong to any of the specified class labels.

isEmpty

public boolean isEmpty()
Returns true if the dataset contains no instances.

Returns:
true if the dataset contains no instances; false otherwise.

isMultilabel

public boolean isMultilabel()
Verifies if this is a multi-label dataset (i.e. contains at least one multi-label instance).

Returns:
true if this is a multi-label dataset; false otherwise.
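
A minimal sketch of the "at least one multi-label instance" rule, assuming a multi-label class value is a single string whose labels are joined by a separator. The separator is passed in as a parameter because the value of CLASS_SEPARATOR is not given on this page; MultilabelSketch is a hypothetical name.

```java
import java.util.List;

// Hypothetical helper operating on raw class-value strings.
class MultilabelSketch {
    // 'separator' is a caller-supplied placeholder; the actual value of
    // Dataset.CLASS_SEPARATOR is not stated in this documentation.
    static boolean isMultilabel(List<String> classValues, String separator) {
        for (String value : classValues) {
            // One value containing the separator is enough to decide.
            if (value.contains(separator)) {
                return true;
            }
        }
        return false;
    }
}
```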

setInstances

public void setInstances(java.util.List<Instance> instances)
Sets the dataset instances. Note that any instances already contained in the dataset will be removed.

Parameters:
instances - the new dataset instances.

contains

public boolean contains(Attribute attribute,
                        java.lang.String value)
Checks if the dataset contains an instance with the (attribute, value) pair.

Parameters:
attribute - the attribute to check.
value - the value to check.
Returns:
true if the dataset contains an instance with the specified (attribute, value) pair; false otherwise.

remove

public boolean remove(Instance instance)
Removes the first occurrence in this dataset of the specified instance.

Parameters:
instance - the instance to be removed, if present.
Returns:
true if the dataset contained the specified instance; false otherwise.

remove

public boolean remove(int index)
Removes the instance with the specified index.

Parameters:
index - the instance index.
Returns:
true if the dataset contained the specified instance; false otherwise.

remove

public void remove(Attribute attribute)
Removes the specified attribute from the dataset. This operation will remove the attribute from the associated Metadata instance.

Parameters:
attribute - the attribute to be removed.

remove

public boolean remove(java.util.Collection<Instance> instances)
Removes all this dataset's instances that are contained in the specified collection.

Parameters:
instances - instances to be removed from this dataset.
Returns:
true if this dataset changed as a result of the call; false otherwise.
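
The wording of remove(Instance) and remove(Collection<Instance>) resembles the contracts of java.util.List.remove(Object) and java.util.List.removeAll(Collection). The sketch below shows those two standard contracts on a plain list of strings. Whether Dataset delegates to them internally is an assumption, and RemoveContractSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

// Hypothetical sketch using java.util.List in place of Dataset.
class RemoveContractSketch {
    // A mutable sample list with a repeated element.
    static List<String> sample() {
        return new ArrayList<>(Arrays.asList("a", "b", "a", "c"));
    }

    // Analogue of remove(Instance): drops only the first matching element
    // and reports whether the element was present.
    static boolean removeFirst(List<String> data, String item) {
        return data.remove(item);
    }

    // Analogue of remove(Collection<Instance>): drops every matching element
    // and reports whether the list changed.
    static boolean removeAll(List<String> data, Collection<String> items) {
        return data.removeAll(items);
    }
}
```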

count

public int[] count()
Returns the class count vector.

Returns:
the class count vector.

frequency

public double[] frequency()
Returns the class frequency vector.

Returns:
the class frequency vector.
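
One plausible relationship between the two vectors is that each frequency is the corresponding count divided by the total number of instances. This page does not state that definition, so the sketch below treats it as an assumption. ClassVectorSketch, its methods, and the ordering of the classes array are hypothetical.

```java
// Hypothetical sketch: class values are plain strings, and 'classes' fixes
// the position of each class in the resulting vectors.
class ClassVectorSketch {
    // Counts how many entries of 'classValues' equal each entry of 'classes'.
    static int[] count(String[] classes, String[] classValues) {
        int[] counts = new int[classes.length];
        for (String value : classValues) {
            for (int i = 0; i < classes.length; i++) {
                if (classes[i].equals(value)) {
                    counts[i]++;
                    break;
                }
            }
        }
        return counts;
    }

    // Assumed definition: each count divided by the sum of all counts.
    static double[] frequency(int[] counts) {
        int total = 0;
        for (int c : counts) {
            total += c;
        }
        double[] freq = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            // Guard against division by zero for an empty dataset.
            freq[i] = total == 0 ? 0.0 : (double) counts[i] / total;
        }
        return freq;
    }
}
```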

get

public Instance get(int index)
Returns the instance specified by the index.

Parameters:
index - the instance's index.
Returns:
the instance specified by the index.

withoutDuplicates

public Dataset withoutDuplicates()
Returns a new dataset instance without duplicated instances.

Returns:
a new dataset instance without duplicated instances.
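
Duplicates are defined by hasDuplicates() as instances with the same values for every attribute. The sketch below applies that definition to plain String[] rows. DuplicateSketch is a hypothetical helper, and keeping the first occurrence of each row is an assumption, since this page does not say which copy the library retains.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: rows are plain String[] values, not the library's Instance type.
class DuplicateSketch {
    // True if two rows hold the same values for every attribute.
    static boolean hasDuplicates(List<String[]> rows) {
        Set<List<String>> seen = new HashSet<>();
        for (String[] row : rows) {
            // Arrays.asList gives value-based equals/hashCode, unlike a raw array.
            if (!seen.add(Arrays.asList(row))) {
                return true;
            }
        }
        return false;
    }

    // Returns a new list that keeps the first occurrence of each distinct row;
    // the input list is left unchanged.
    static List<String[]> withoutDuplicates(List<String[]> rows) {
        Set<List<String>> seen = new HashSet<>();
        List<String[]> result = new ArrayList<>();
        for (String[] row : rows) {
            if (seen.add(Arrays.asList(row))) {
                result.add(row);
            }
        }
        return result;
    }
}
```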

computeDomains

public void computeDomains()
Computes the continuous attributes' domain values.


reset

public void reset()
Resets the instances' weights.


toString

public java.lang.String toString()
Overrides:
toString in class java.lang.Object

clone

public Dataset clone()
Overrides:
clone in class java.lang.Object


Copyright © 2013. All Rights Reserved.