GClasses

GClasses::GTransducer Class Reference

This is the base class of supervised learning algorithms, which may or may not have an internal model that lets them generalize to rows that were not available at training time. Note that the literature typically refers to supervised learning algorithms that can't generalize (because they lack an internal hypothesis model) as "semi-supervised". You cannot generalize with a semi-supervised algorithm; you have to train again with the new rows.

#include <GLearner.h>

Inheritance diagram for GClasses::GTransducer:
GClasses::GAgglomerativeTransducer, GClasses::GGraphCutTransducer, GClasses::GNeighborTransducer, GClasses::GSupervisedLearner, GClasses::GBaselineLearner, GClasses::GBucket, GClasses::GDecisionTree, GClasses::GEnsemble, GClasses::GIdentityFunction, GClasses::GIncrementalLearner, GClasses::GLinearRegressor, GClasses::GMeanMarginsTree, GClasses::GPolynomial, GClasses::GRandomForest, GClasses::GWag


Public Member Functions

 GTransducer (GRand &rand)
virtual ~GTransducer ()
virtual bool canGeneralize ()
 Returns false because semi-supervised learners have no internal model, so they can't evaluate previously unseen rows.
virtual bool canTrainIncrementally ()
 Returns false because semi-supervised learners cannot be trained incrementally.
GMatrix * transduce (GMatrix &features1, GMatrix &labels1, GMatrix &features2)
 Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1.
virtual void trainAndTest (GMatrix &trainFeatures, GMatrix &trainLabels, GMatrix &testFeatures, GMatrix &testLabels, double *pOutResults, std::vector< GMatrix * > *pNominalLabelStats=NULL)
 Trains and tests this learner. pOutResults should have an element for each label dim.
GMatrix * crossValidate (GMatrix &features, GMatrix &labels, size_t nFolds, RepValidateCallback pCB=NULL, size_t nRep=0, void *pThis=NULL)
 Performs n-fold cross validation on the given features and labels, calling trainAndTest for each fold. pCB is an optional callback for reporting intermediate stats; pass NULL if you don't want intermediate reporting. nRep is the rep number that will be passed to the callback. pThis is a pointer that will be passed to the callback for you to use however you want; it doesn't affect this method. The results of each fold are returned in a dataset.
GMatrix * repValidate (GMatrix &features, GMatrix &labels, size_t reps, size_t nFolds, RepValidateCallback pCB=NULL, void *pThis=NULL)
 Performs cross validation "reps" times and returns the average score. (5 reps with 2 folds is preferred over 10-fold cross validation because it yields less type I error.) pCB is an optional callback for reporting intermediate stats; pass NULL if you don't want intermediate reporting. pThis is a pointer that will be passed to the callback for you to use however you want; it doesn't affect this method. The results of each fold are returned in a dataset.
double heuristicValidate (GMatrix &features, GMatrix &labels)
 This performs two-fold cross-validation on a shuffled non-uniform split of the data, and returns an error value that represents the results of all labels combined.
GRand & rand ()
 Returns a reference to the random number generator associated with this object.

Protected Member Functions

virtual GMatrix * transduceInner (GMatrix &features1, GMatrix &labels1, GMatrix &features2)=0
 This is the algorithm's implementation of transduction. (It is called by the transduce method.)
virtual bool canImplicitlyHandleNominalFeatures ()
 Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it.
virtual bool canImplicitlyHandleContinuousFeatures ()
 Returns true iff this algorithm can implicitly handle continuous features. If it cannot, then the GDiscretize transform will be used to convert continuous features to nominal values before passing them to it.
virtual bool supportedFeatureRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.
virtual bool canImplicitlyHandleMissingFeatures ()
 Returns true iff this algorithm supports missing feature values. If it cannot, then an imputation filter will be used to predict missing values before any feature-vectors are passed to the algorithm.
virtual bool canImplicitlyHandleNominalLabels ()
 Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert categorical predictions back to nominal labels.
virtual bool canImplicitlyHandleContinuousLabels ()
 Returns true iff this algorithm can implicitly handle continuous labels (a.k.a. regression). If it cannot, then the GDiscretize transform will be used during training to convert continuous labels to nominal values, and to convert nominal predictions back to continuous labels.
virtual bool supportedLabelRange (double *pOutMin, double *pOutMax)
 Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

Protected Attributes

GRand & m_rand

Detailed Description

This is the base class of supervised learning algorithms, which may or may not have an internal model that lets them generalize to rows that were not available at training time. Note that the literature typically refers to supervised learning algorithms that can't generalize (because they lack an internal hypothesis model) as "semi-supervised". You cannot generalize with a semi-supervised algorithm; you have to train again with the new rows.


Constructor & Destructor Documentation

GClasses::GTransducer::GTransducer ( GRand & rand )
virtual GClasses::GTransducer::~GTransducer ( ) [virtual]

Member Function Documentation

virtual bool GClasses::GTransducer::canGeneralize ( ) [inline, virtual]

Returns false because semi-supervised learners have no internal model, so they can't evaluate previously unseen rows.

Reimplemented in GClasses::GSupervisedLearner.

virtual bool GClasses::GTransducer::canImplicitlyHandleContinuousFeatures ( ) [inline, protected, virtual]

Returns true iff this algorithm can implicitly handle continuous features. If it cannot, then the GDiscretize transform will be used to convert continuous features to nominal values before passing them to it.

Reimplemented in GClasses::GNaiveBayes.
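
The continuous-to-nominal conversion described above can be pictured with a small sketch. This is not GDiscretize itself: the equal-width bucketing scheme, the function name discretizeEqualWidth, and its parameters are all assumptions made for illustration, and the real transform may choose buckets differently.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not part of GClasses): maps a continuous value to one
// of `buckets` equal-width bins spanning [min, max]. Values outside the range
// are clamped to the first or last bin.
std::size_t discretizeEqualWidth(double value, double min, double max, std::size_t buckets)
{
    if (buckets == 0 || !(max > min))
        throw std::invalid_argument("need at least one bucket and max > min");
    if (value <= min)
        return 0;
    if (value >= max)
        return buckets - 1;
    // Scale into [0, buckets) and truncate to get the bin index.
    std::size_t bin = static_cast<std::size_t>((value - min) / (max - min) * static_cast<double>(buckets));
    return bin < buckets ? bin : buckets - 1;
}
```

A learner that only handles nominal features would then see the bin index in place of the original value.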

virtual bool GClasses::GTransducer::canImplicitlyHandleContinuousLabels ( ) [inline, protected, virtual]

Returns true iff this algorithm can implicitly handle continuous labels (a.k.a. regression). If it cannot, then the GDiscretize transform will be used during training to convert continuous labels to nominal values, and to convert nominal predictions back to continuous labels.

Reimplemented in GClasses::GAgglomerativeTransducer, GClasses::GGraphCutTransducer, GClasses::GBayesianModelAveraging, GClasses::GBayesianModelCombination, GClasses::GAdaBoost, GClasses::GNeighborTransducer, and GClasses::GNaiveBayes.

virtual bool GClasses::GTransducer::canImplicitlyHandleMissingFeatures ( ) [inline, protected, virtual]

Returns true iff this algorithm supports missing feature values. If it cannot, then an imputation filter will be used to predict missing values before any feature-vectors are passed to the algorithm.

Reimplemented in GClasses::GKNN and GClasses::GNeuralNet.
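
As a rough illustration of what an imputation step does, the sketch below fills each missing value with the mean of the known values in its column. Mean imputation is a deliberately simple stand-in, not the filter the library applies. The Matrix alias, the function name imputeColumnMeans, and the use of NaN to mark missing values are all assumptions.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GMatrix: one inner vector per row.
using Matrix = std::vector<std::vector<double>>;

// Replaces every NaN (assumed here to mark a missing value) with the mean of
// the known values in the same column. A column with no known values gets 0.
void imputeColumnMeans(Matrix& data)
{
    if (data.empty())
        return;
    const std::size_t cols = data[0].size();
    for (std::size_t c = 0; c < cols; ++c) {
        double sum = 0.0;
        std::size_t known = 0;
        for (const auto& row : data) {
            if (!std::isnan(row[c])) {
                sum += row[c];
                ++known;
            }
        }
        const double mean = known > 0 ? sum / static_cast<double>(known) : 0.0;
        for (auto& row : data) {
            if (std::isnan(row[c]))
                row[c] = mean;
        }
    }
}
```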

virtual bool GClasses::GTransducer::canImplicitlyHandleNominalFeatures ( ) [inline, protected, virtual]

Returns true iff this algorithm can implicitly handle nominal features. If it cannot, then the GNominalToCat transform will be used to convert nominal features to continuous values before passing them to it.

Reimplemented in GClasses::GMeanMarginsTree, GClasses::GWag, GClasses::GNeighborTransducer, GClasses::GInstanceTable, GClasses::GLinearRegressor, GClasses::GNaiveInstance, GClasses::GNeuralNet, and GClasses::GPolynomial.

virtual bool GClasses::GTransducer::canImplicitlyHandleNominalLabels ( ) [inline, protected, virtual]

Returns true iff this algorithm can implicitly handle nominal labels (a.k.a. classification). If it cannot, then the GNominalToCat transform will be used during training to convert nominal labels to continuous values, and to convert categorical predictions back to nominal labels.

Reimplemented in GClasses::GMeanMarginsTree, GClasses::GWag, GClasses::GLinearRegressor, GClasses::GNaiveInstance, GClasses::GNeuralNet, and GClasses::GPolynomial.
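
The nominal-to-continuous round trip described above can be sketched as a categorical (one-hot) encoding and its inverse. The function names below are hypothetical, and GNominalToCat may differ in detail (for example in how it treats two-valued attributes), so read this as an illustration of the idea only.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <stdexcept>
#include <vector>

// Hypothetical helper: encodes nominal value `value` (one of `valueCount`
// categories) as a vector with 1.0 in that position and 0.0 elsewhere.
std::vector<double> nominalToCategorical(std::size_t value, std::size_t valueCount)
{
    if (value >= valueCount)
        throw std::out_of_range("value must be less than valueCount");
    std::vector<double> encoded(valueCount, 0.0);
    encoded[value] = 1.0;
    return encoded;
}

// Hypothetical inverse: converts a categorical prediction back to a nominal
// value by picking the index of the largest element (0 for an empty vector).
std::size_t categoricalToNominal(const std::vector<double>& prediction)
{
    auto largest = std::max_element(prediction.begin(), prediction.end());
    return static_cast<std::size_t>(std::distance(prediction.begin(), largest));
}
```

A learner that only supports continuous labels would be trained on the encoded vectors, and its continuous predictions would be mapped back with the inverse.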

virtual bool GClasses::GTransducer::canTrainIncrementally ( ) [inline, virtual]

Returns false because semi-supervised learners cannot be trained incrementally.

Reimplemented in GClasses::GIncrementalLearner.

GMatrix* GClasses::GTransducer::crossValidate ( GMatrix & features,
GMatrix & labels,
size_t nFolds,
RepValidateCallback pCB = NULL,
size_t nRep = 0,
void * pThis = NULL
)

Performs n-fold cross validation on the given features and labels, calling trainAndTest for each fold. pCB is an optional callback for reporting intermediate stats; pass NULL if you don't want intermediate reporting. nRep is the rep number that will be passed to the callback. pThis is a pointer that will be passed to the callback for you to use however you want; it doesn't affect this method. The results of each fold are returned in a dataset.
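
The fold loop can be sketched roughly as follows. This is a simplified illustration rather than the library's implementation: crossValidateSketch and FoldEvaluator are hypothetical names, the evaluator stands in for trainAndTest, rows are identified by index instead of being copied into matrices, and the contiguous fold boundaries are an assumption.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical stand-in for trainAndTest: receives the training and test row
// indices for one fold and returns a single score for that fold.
using FoldEvaluator = std::function<double(const std::vector<std::size_t>& trainRows,
                                           const std::vector<std::size_t>& testRows)>;

// Splits rows 0..nRows-1 into nFolds contiguous test blocks, evaluates each
// fold once, and returns one result per fold.
std::vector<double> crossValidateSketch(std::size_t nRows, std::size_t nFolds, const FoldEvaluator& evaluate)
{
    std::vector<double> results;
    for (std::size_t fold = 0; fold < nFolds; ++fold) {
        const std::size_t begin = fold * nRows / nFolds;
        const std::size_t end = (fold + 1) * nRows / nFolds;
        std::vector<std::size_t> trainRows;
        std::vector<std::size_t> testRows;
        for (std::size_t r = 0; r < nRows; ++r) {
            if (r >= begin && r < end)
                testRows.push_back(r);   // held out for this fold
            else
                trainRows.push_back(r);  // used for training in this fold
        }
        results.push_back(evaluate(trainRows, testRows));
    }
    return results;
}
```

Every row lands in exactly one test block, so each row is used for testing once and for training in the remaining folds.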

double GClasses::GTransducer::heuristicValidate ( GMatrix & features,
GMatrix & labels
)

This performs two-fold cross-validation on a shuffled non-uniform split of the data, and returns an error value that represents the results of all labels combined.

GRand& GClasses::GTransducer::rand ( ) [inline]

Returns a reference to the random number generator associated with this object.

GMatrix* GClasses::GTransducer::repValidate ( GMatrix & features,
GMatrix & labels,
size_t reps,
size_t nFolds,
RepValidateCallback pCB = NULL,
void * pThis = NULL
)

Performs cross validation "reps" times and returns the average score. (5 reps with 2 folds is preferred over 10-fold cross validation because it yields less type I error.) pCB is an optional callback for reporting intermediate stats; pass NULL if you don't want intermediate reporting. pThis is a pointer that will be passed to the callback for you to use however you want; it doesn't affect this method. The results of each fold are returned in a dataset.
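
To make the "reps times nFolds" structure concrete, the sketch below generates the held-out row sets for repeated cross validation. It assumes the rows are reshuffled before each rep, which the text above does not state. The function name repeatedFoldTestSets and the seed parameter are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Returns reps * nFolds test sets. Entry (rep * nFolds + fold) lists the rows
// held out for that fold of that rep. Rows are reshuffled before every rep
// (an assumption), so each rep partitions the data differently.
std::vector<std::vector<std::size_t>> repeatedFoldTestSets(std::size_t nRows, std::size_t reps,
                                                           std::size_t nFolds, unsigned seed)
{
    std::mt19937 rng(seed);
    std::vector<std::size_t> order(nRows);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::vector<std::vector<std::size_t>> testSets;
    for (std::size_t rep = 0; rep < reps; ++rep) {
        std::shuffle(order.begin(), order.end(), rng);
        for (std::size_t fold = 0; fold < nFolds; ++fold) {
            const std::size_t begin = fold * nRows / nFolds;
            const std::size_t end = (fold + 1) * nRows / nFolds;
            // Copy the slice of shuffled indices that forms this fold's test set.
            testSets.emplace_back(order.begin() + begin, order.begin() + end);
        }
    }
    return testSets;
}
```

With reps = 5 and nFolds = 2 this yields ten test sets, matching the 5x2 configuration the description recommends.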

virtual bool GClasses::GTransducer::supportedFeatureRange ( double *  pOutMin,
double *  pOutMax 
) [inline, protected, virtual]

Returns true if this algorithm supports any feature value, or if it does not implicitly handle continuous features. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

Reimplemented in GClasses::GNeuralNet.
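
One plausible way for a caller to act on this return value is to rescale the data linearly into the reported range when the method returns false. The sketch below only illustrates that idea. The RangeCapability struct and fitToSupportedRange function are hypothetical, and nothing here is taken from the library's own handling.

```cpp
// Hypothetical summary of what supportedFeatureRange reports: either any
// value is supported, or only values in [min, max].
struct RangeCapability {
    bool supportsAnyValue;
    double min;
    double max;
};

// Maps `value`, which lies in the observed data range [dataMin, dataMax],
// linearly into the supported range. Values pass through unchanged when the
// algorithm supports any value. A degenerate data range maps to the midpoint.
double fitToSupportedRange(double value, double dataMin, double dataMax, const RangeCapability& cap)
{
    if (cap.supportsAnyValue)
        return value;
    if (dataMax <= dataMin)
        return 0.5 * (cap.min + cap.max);
    return cap.min + (value - dataMin) / (dataMax - dataMin) * (cap.max - cap.min);
}
```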

virtual bool GClasses::GTransducer::supportedLabelRange ( double *  pOutMin,
double *  pOutMax 
) [inline, protected, virtual]

Returns true if this algorithm supports any label value, or if it does not implicitly handle continuous labels. If a limited range of continuous values is supported, returns false and sets pOutMin and pOutMax to specify the range.

Reimplemented in GClasses::GNeuralNet.

virtual void GClasses::GTransducer::trainAndTest ( GMatrix & trainFeatures,
GMatrix & trainLabels,
GMatrix & testFeatures,
GMatrix & testLabels,
double * pOutResults,
std::vector< GMatrix * > * pNominalLabelStats = NULL
) [virtual]

Trains and tests this learner. pOutResults should have an element for each label dim.

Reimplemented in GClasses::GSupervisedLearner.

GMatrix* GClasses::GTransducer::transduce ( GMatrix & features1,
GMatrix & labels1,
GMatrix & features2
)

Predicts a set of labels to correspond with features2, such that these labels will be consistent with the patterns exhibited by features1 and labels1.
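
To show what transduction means in practice, here is a minimal nearest-neighbor sketch: each row of features2 receives the label row of its closest row in features1, and no model is kept afterwards. It does not use the GClasses API. The Matrix alias and the function transduceNearestNeighbor are hypothetical stand-ins, and the real transducers in this library use their own algorithms.

```cpp
#include <cstddef>
#include <limits>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for GMatrix: one inner vector per row.
using Matrix = std::vector<std::vector<double>>;

// Squared Euclidean distance between two rows of equal length.
static double squaredDistance(const std::vector<double>& a, const std::vector<double>& b)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For each row of features2, copies the label row of its nearest neighbor in
// features1. Nothing is retained between calls, so new rows require a new call.
Matrix transduceNearestNeighbor(const Matrix& features1, const Matrix& labels1, const Matrix& features2)
{
    if (features1.empty() || features1.size() != labels1.size())
        throw std::invalid_argument("features1 and labels1 must be non-empty with matching row counts");
    Matrix labels2;
    labels2.reserve(features2.size());
    for (const auto& query : features2) {
        std::size_t best = 0;
        double bestDist = std::numeric_limits<double>::infinity();
        for (std::size_t i = 0; i < features1.size(); ++i) {
            const double dist = squaredDistance(features1[i], query);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        labels2.push_back(labels1[best]);
    }
    return labels2;
}
```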

virtual GMatrix* GClasses::GTransducer::transduceInner ( GMatrix & features1,
GMatrix & labels1,
GMatrix & features2
) [protected, pure virtual]

This is the algorithm's implementation of transduction. (It is called by the transduce method.)

Implemented in GClasses::GAgglomerativeTransducer, GClasses::GGraphCutTransducer, GClasses::GNeighborTransducer, and GClasses::GSupervisedLearner.
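
The split between a public transduce and a protected, pure virtual transduceInner resembles the non-virtual interface pattern. The sketch below shows that shape with hypothetical classes (TransducerSketch, MeanLabelTransducer) and a plain Matrix alias. What the real transduce does before delegating is not documented here, so the row-count check is only a placeholder assumption.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for GMatrix: one inner vector per row.
using Matrix = std::vector<std::vector<double>>;

class TransducerSketch {
public:
    virtual ~TransducerSketch() = default;

    // Public entry point: checks the arguments (placeholder for whatever
    // preparation the real method performs), then delegates to the algorithm.
    Matrix transduce(const Matrix& features1, const Matrix& labels1, const Matrix& features2)
    {
        if (features1.size() != labels1.size())
            throw std::invalid_argument("features1 and labels1 must have the same number of rows");
        return transduceInner(features1, labels1, features2);
    }

protected:
    // Each concrete algorithm supplies its own transduction here.
    virtual Matrix transduceInner(const Matrix& features1, const Matrix& labels1, const Matrix& features2) = 0;
};

// Toy subclass: ignores the features and predicts the mean of labels1 for
// every row of features2.
class MeanLabelTransducer : public TransducerSketch {
protected:
    Matrix transduceInner(const Matrix&, const Matrix& labels1, const Matrix& features2) override
    {
        const std::size_t dims = labels1.empty() ? 0 : labels1[0].size();
        std::vector<double> mean(dims, 0.0);
        for (const auto& row : labels1)
            for (std::size_t j = 0; j < dims; ++j)
                mean[j] += row[j] / static_cast<double>(labels1.size());
        return Matrix(features2.size(), mean);
    }
};
```

Callers only ever see transduce, so shared preparation can live in one place while subclasses override just the algorithm.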


Member Data Documentation