GClasses
This is an efficient learning algorithm. It divides on the attributes that reduce entropy the most, or alternatively can make random divisions.
#include <GDecisionTree.h>
Public Types | |
enum | DivisionAlgorithm { MINIMIZE_ENTROPY, RANDOM } |
Public Member Functions | |
GDecisionTree (GRand &rand) | |
GDecisionTree (GDomNode *pNode, GLearnerLoader &ll) | |
Loads from a DOM. | |
virtual | ~GDecisionTree () |
virtual GDomNode * | serialize (GDom *pDoc) |
Marshal this object into a DOM, which can then be converted to a variety of serial formats. | |
void | useRandomDivisions (size_t randomDraws=1) |
Makes this decision tree use random divisions (instead of divisions that reduce entropy). Random divisions make training somewhat faster and increase model variance, which makes the tree better suited for ensembles, but they also make the tree vulnerable to problems with irrelevant attributes. | |
size_t | leafThresh () |
Returns the leaf threshold. | |
void | setLeafThresh (size_t n) |
Sets the leaf threshold. When the number of samples is <= this value, the tree stops dividing the data and creates a leaf node. The default value is 1. For noisy data, a larger value may be advantageous. | |
void | setMaxLevels (size_t n) |
Sets the maximum number of levels. When a path from the root to the current node contains n nodes (including the root), the tree stops dividing the data and creates a leaf node. If set to 0, there is no maximum. The default is 0. | |
virtual void | clear () |
Frees the model. | |
size_t | treeSize () |
Returns the number of nodes in this tree. | |
void | print (std::ostream &stream, GArffRelation *pFeatureRel=NULL, GArffRelation *pLabelRel=NULL) |
Prints an ASCII representation of the decision tree to the specified stream. pFeatureRel and pLabelRel are optional relations that can be supplied to provide better meta-data and make the print-out richer. | |
void | autoTune (GMatrix &features, GMatrix &labels) |
Uses cross-validation to find a set of parameters that works well with the provided data. | |
Static Public Member Functions | |
static void | test () |
Performs unit tests for this class. Throws an exception if there is a failure. | |
Protected Member Functions | |
virtual void | trainInner (GMatrix &features, GMatrix &labels) |
See the comment for GSupervisedLearner::trainInner. | |
virtual void | predictInner (const double *pIn, double *pOut) |
See the comment for GSupervisedLearner::predictInner. | |
virtual void | predictDistributionInner (const double *pIn, GPrediction *pOut) |
See the comment for GSupervisedLearner::predictDistributionInner. | |
GDecisionTreeLeafNode * | findLeaf (const double *pIn, size_t *pDepth) |
Finds the leaf node that corresponds with the specified feature vector. | |
GDecisionTreeNode * | buildBranch (GMatrix &features, GMatrix &labels, std::vector< size_t > &attrPool, size_t nDepth, size_t tolerance) |
A recursive helper method used to construct the decision tree. | |
double | measureInfoGain (GMatrix *pData, size_t nAttribute, double *pPivot) |
InfoGain is defined as the difference in entropy in the data before and after dividing it based on the specified attribute. For continuous attributes it uses the difference between the original variance and the sum of the variances of the two parts after dividing at the point that maximizes this value. | |
size_t | pickDivision (GMatrix &features, GMatrix &labels, double *pPivot, std::vector< size_t > &attrPool, size_t nDepth) |
Protected Attributes | |
sp_relation | m_pFeatureRel |
sp_relation | m_pLabelRel |
GDecisionTreeNode * | m_pRoot |
DivisionAlgorithm | m_eAlg |
size_t | m_leafThresh |
size_t | m_randomDraws |
size_t | m_maxLevels |
This is an efficient learning algorithm. It divides on the attributes that reduce entropy the most, or alternatively can make random divisions.
GClasses::GDecisionTree::GDecisionTree | ( | GRand & | rand | ) |
GClasses::GDecisionTree::GDecisionTree | ( | GDomNode * | pNode, |
GLearnerLoader & | ll | ||
) |
Loads from a DOM.
virtual GClasses::GDecisionTree::~GDecisionTree | ( | ) | [virtual] |
void GClasses::GDecisionTree::autoTune | ( | GMatrix & | features, |
GMatrix & | labels | ||
) |
Uses cross-validation to find a set of parameters that works well with the provided data.
GDecisionTreeNode* GClasses::GDecisionTree::buildBranch | ( | GMatrix & | features, |
GMatrix & | labels, | ||
std::vector< size_t > & | attrPool, | ||
size_t | nDepth, | ||
size_t | tolerance | ||
) | [protected] |
A recursive helper method used to construct the decision tree.
virtual void GClasses::GDecisionTree::clear | ( | ) | [virtual] |
Frees the model.
Implements GClasses::GSupervisedLearner.
GDecisionTreeLeafNode* GClasses::GDecisionTree::findLeaf | ( | const double * | pIn, |
size_t * | pDepth | ||
) | [protected] |
Finds the leaf node that corresponds with the specified feature vector.
size_t GClasses::GDecisionTree::leafThresh | ( | ) | [inline] |
Returns the leaf threshold.
double GClasses::GDecisionTree::measureInfoGain | ( | GMatrix * | pData, |
size_t | nAttribute, | ||
double * | pPivot | ||
) | [protected] |
InfoGain is defined as the difference in entropy in the data before and after dividing it based on the specified attribute. For continuous attributes it uses the difference between the original variance and the sum of the variances of the two parts after dividing at the point that maximizes this value.
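The entropy-based definition above can be illustrated with a small standalone sketch. It is not the GClasses implementation: the names `entropy` and `infoGain` are hypothetical, it handles only nominal attributes and labels stored as plain integers, and it assumes the "after" entropy is the size-weighted sum over the parts, which is the usual textbook convention rather than something this page states. The variance-based case for continuous attributes is not covered.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical helper: Shannon entropy (in bits) of a list of nominal labels.
double entropy(const std::vector<int>& labels)
{
    if (labels.empty())
        return 0.0;
    std::map<int, std::size_t> counts;
    for (int label : labels)
        ++counts[label];
    double h = 0.0;
    for (const auto& kv : counts)
    {
        double p = static_cast<double>(kv.second) / labels.size();
        h -= p * std::log2(p);
    }
    return h;
}

// Hypothetical helper: entropy before the division minus the size-weighted
// entropy of the parts obtained by grouping rows on a nominal attribute.
// attr[i] is the attribute value of row i, labels[i] is its class label.
double infoGain(const std::vector<int>& attr, const std::vector<int>& labels)
{
    assert(attr.size() == labels.size());
    if (labels.empty())
        return 0.0;
    std::map<int, std::vector<int>> parts;
    for (std::size_t i = 0; i < labels.size(); ++i)
        parts[attr[i]].push_back(labels[i]);
    double after = 0.0;
    for (const auto& kv : parts)
    {
        double weight = static_cast<double>(kv.second.size()) / labels.size();
        after += weight * entropy(kv.second);
    }
    return entropy(labels) - after;
}
```

With labels {0, 0, 1, 1}, an attribute whose values are {0, 0, 1, 1} separates the classes perfectly and yields a gain of one bit, while an attribute with values {0, 1, 0, 1} leaves both parts evenly mixed and yields no gain.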
size_t GClasses::GDecisionTree::pickDivision | ( | GMatrix & | features, |
GMatrix & | labels, | ||
double * | pPivot, | ||
std::vector< size_t > & | attrPool, | ||
size_t | nDepth | ||
) | [protected] |
virtual void GClasses::GDecisionTree::predictDistributionInner | ( | const double * | pIn, |
GPrediction * | pOut | ||
) | [protected, virtual] |
See the comment for GSupervisedLearner::predictDistributionInner.
Implements GClasses::GSupervisedLearner.
virtual void GClasses::GDecisionTree::predictInner | ( | const double * | pIn, |
double * | pOut | ||
) | [protected, virtual] |
See the comment for GSupervisedLearner::predictInner.
Implements GClasses::GSupervisedLearner.
void GClasses::GDecisionTree::print | ( | std::ostream & | stream, |
GArffRelation * | pFeatureRel = NULL, | ||
GArffRelation * | pLabelRel = NULL | ||
) |
Prints an ASCII representation of the decision tree to the specified stream. pFeatureRel and pLabelRel are optional relations that can be supplied to provide better meta-data and make the print-out richer.
virtual GDomNode* GClasses::GDecisionTree::serialize | ( | GDom * | pDoc | ) | [virtual] |
Marshal this object into a DOM, which can then be converted to a variety of serial formats.
Implements GClasses::GSupervisedLearner.
void GClasses::GDecisionTree::setLeafThresh | ( | size_t | n | ) | [inline] |
Sets the leaf threshold. When the number of samples is <= this value, the tree stops dividing the data and creates a leaf node. The default value is 1. For noisy data, a larger value may be advantageous.
void GClasses::GDecisionTree::setMaxLevels | ( | size_t | n | ) | [inline] |
Sets the maximum number of levels. When a path from the root to the current node contains n nodes (including the root), the tree stops dividing the data and creates a leaf node. If set to 0, there is no maximum. The default is 0.
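The leaf threshold and the level limit both act as stopping rules during tree construction. The following standalone sketch shows one way to read the two rules together. The function `shouldMakeLeaf` and its parameters are hypothetical and not part of GClasses, and treating the level limit as "the path already contains at least n nodes" is an interpretation of the description above, not a statement about the library's code.

```cpp
#include <cstddef>

// Hypothetical helper (not a GClasses function): returns true when
// construction should stop dividing and create a leaf node.
//   sampleCount  number of samples that reached the current node
//   nodesOnPath  number of nodes on the path from the root to the current
//                node, counting the root
//   leafThresh   leaf threshold (documented default: 1)
//   maxLevels    level limit; 0 means no limit (documented default: 0)
bool shouldMakeLeaf(std::size_t sampleCount, std::size_t nodesOnPath,
                    std::size_t leafThresh = 1, std::size_t maxLevels = 0)
{
    // Too few samples left to justify another division.
    if (sampleCount <= leafThresh)
        return true;
    // The path has reached the configured limit (assumed ">=" comparison).
    if (maxLevels != 0 && nodesOnPath >= maxLevels)
        return true;
    return false;
}
```

Under these assumptions, a node holding 10 samples at the third level becomes a leaf when the limit is 3, but keeps dividing when the limit is 0 or larger than 3.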
static void GClasses::GDecisionTree::test | ( | ) | [static] |
Performs unit tests for this class. Throws an exception if there is a failure.
Reimplemented from GClasses::GSupervisedLearner.
virtual void GClasses::GDecisionTree::trainInner | ( | GMatrix & | features, |
GMatrix & | labels | ||
) | [protected, virtual] |
See the comment for GSupervisedLearner::trainInner.
Implements GClasses::GSupervisedLearner.
size_t GClasses::GDecisionTree::treeSize | ( | ) |
Returns the number of nodes in this tree.
void GClasses::GDecisionTree::useRandomDivisions | ( | size_t | randomDraws = 1 | ) | [inline] |
Makes this decision tree use random divisions (instead of divisions that reduce entropy). Random divisions make training somewhat faster and increase model variance, which makes the tree better suited for ensembles, but they also make the tree vulnerable to problems with irrelevant attributes.
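To make the idea of a random division concrete, here is a minimal standalone sketch. It is not the GClasses code: the `Division` struct and `pickRandomDivision` function are hypothetical, the data is a plain vector of rows rather than a GMatrix, and it assumes a division is chosen by picking a random attribute from the pool and using that attribute's value in a randomly chosen row as the pivot. How the library uses the randomDraws argument is not described on this page, so the sketch makes a single draw.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical result type: which attribute to divide on, and where.
struct Division
{
    std::size_t attr;
    double pivot;
};

// Hypothetical helper: chooses a random attribute from attrPool and takes the
// pivot from a randomly chosen row, so the pivot always lies on observed data.
// No entropy or variance is evaluated, which is why this is cheap.
Division pickRandomDivision(const std::vector<std::vector<double>>& features,
                            const std::vector<std::size_t>& attrPool,
                            std::mt19937& rng)
{
    assert(!features.empty() && !attrPool.empty());
    std::uniform_int_distribution<std::size_t> pickAttr(0, attrPool.size() - 1);
    std::uniform_int_distribution<std::size_t> pickRow(0, features.size() - 1);
    Division d;
    d.attr = attrPool[pickAttr(rng)];
    d.pivot = features[pickRow(rng)][d.attr];
    return d;
}
```

Because the choice ignores the labels entirely, an irrelevant attribute is as likely to be picked as an informative one, which matches the vulnerability noted above and explains why such trees are usually combined in ensembles.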
DivisionAlgorithm GClasses::GDecisionTree::m_eAlg [protected] |
size_t GClasses::GDecisionTree::m_leafThresh [protected] |
size_t GClasses::GDecisionTree::m_maxLevels [protected] |
sp_relation GClasses::GDecisionTree::m_pFeatureRel [protected] |
sp_relation GClasses::GDecisionTree::m_pLabelRel [protected] |
GDecisionTreeNode* GClasses::GDecisionTree::m_pRoot [protected] |
size_t GClasses::GDecisionTree::m_randomDraws [protected] |