GClasses

GClasses::GCollaborativeFilter Class Reference

The base class for collaborative filtering recommender systems.

#include <GRecommender.h>

Inheritance diagram for GClasses::GCollaborativeFilter:
GClasses::GBagOfRecommenders GClasses::GBaselineRecommender GClasses::GDenseClusterRecommender GClasses::GInstanceRecommender GClasses::GMatrixFactorization GClasses::GNonlinearPCA GClasses::GSparseClusterRecommender

Public Member Functions

 GCollaborativeFilter (GRand &rand)
 GCollaborativeFilter (GDomNode *pNode, GLearnerLoader &ll)
virtual ~GCollaborativeFilter ()
virtual void train (GMatrix &data)=0
 Trains this recommender system. Let R be an m-by-n sparse matrix of known ratings from m users of n items. data should contain 3 columns, and one row for each known element in R. Column 0 specifies the user index from 0 to m-1, column 1 specifies the item index from 0 to n-1, and column 2 specifies the rating for that user-item pair. All attributes in data should be continuous.
void trainDenseMatrix (GMatrix &data, GMatrix *pLabels=NULL)
 Train from an m-by-n dense matrix, where m is the number of users and n is the number of items. All attributes must be continuous. Missing values are indicated with UNKNOWN_REAL_VALUE. If pLabels is non-NULL, then the labels will be appended as additional items.
virtual double predict (size_t user, size_t item)=0
 This returns a prediction for how the specified user will rate the specified item. (The model must be trained before this method is called. Also, some values for that user and item should have been included in the training set, or else this method will have no basis to make a good prediction.)
virtual void impute (double *pVec, size_t dims)=0
 pVec should be a vector of n real values, where n is the number of items/attributes/columns in the data that was used to train the model. Unknown elements should be set to UNKNOWN_REAL_VALUE. This method will evaluate the known elements and impute (predict) values for the unknown elements. (The model should be trained before this method is called. Unlike the predict method, this method can operate on row-vectors that were not part of the training data.)
virtual GDomNode * serialize (GDom *pDoc)=0
 Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)
double crossValidate (GMatrix &data, size_t folds, double *pOutMAE=NULL)
 This randomly assigns each rating to one of the folds. Then, for each fold, it calls train with a dataset that contains everything except for the ratings in that fold. It predicts values for the items in the fold, and returns the mean-squared difference between the predictions and the actual ratings. If pOutMAE is non-NULL, it will be set to the mean-absolute error.
double trainAndTest (GMatrix &train, GMatrix &test, double *pOutMAE=NULL)
 This trains on the training set, and then tests on the test set. Returns the mean-squared difference between the predictions and the actual ratings.
GMatrix * precisionRecall (GMatrix &data, bool ideal=false)
 This divides the data into two equal-size parts. It trains on one part, and then measures the precision/recall using the other part. It returns a three-column data set with recall scores in column 0 and corresponding precision scores in column 1. The false-positive rate is in column 2. (So, if you want a precision-recall plot, just drop column 2. If you want an ROC curve, drop column 1 and swap the remaining two columns.) This method assumes the ratings range from 0 to 1, so be sure to scale the ratings to fit that range before calling this method. If ideal is true, then it will ignore your model and report the ideal results as if your model always predicted the correct rating. (This is useful because it shows the best possible results.)
void basicTest (double minMSE)
 Performs a basic unit test on this collaborative filter.

Static Public Member Functions

static double areaUnderCurve (GMatrix &data)
 Pass in the data returned by the precisionRecall function (unmodified), and this will compute the area under the ROC curve.

Protected Member Functions

GDomNode * baseDomNode (GDom *pDoc, const char *szClassName)
 Child classes should use this in their implementation of serialize.

Protected Attributes

GRand & m_rand

Detailed Description

The base class for collaborative filtering recommender systems.


Constructor & Destructor Documentation

GClasses::GCollaborativeFilter::GCollaborativeFilter ( GRand & rand ) [inline]

GClasses::GCollaborativeFilter::GCollaborativeFilter ( GDomNode * pNode,
GLearnerLoader & ll 
)

virtual GClasses::GCollaborativeFilter::~GCollaborativeFilter ( ) [inline, virtual]

Member Function Documentation

static double GClasses::GCollaborativeFilter::areaUnderCurve ( GMatrix & data ) [static]

Pass in the data returned by the precisionRecall function (unmodified), and this will compute the area under the ROC curve.
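
As a rough illustration of what an area under the ROC curve means for this three-column layout, the sketch below applies the trapezoid rule to (false-positive rate, recall) points. The function name rocAreaTrapezoid and the use of std::vector in place of GMatrix are hypothetical, and the trapezoid rule is an assumption; areaUnderCurve itself may be implemented differently.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <utility>
#include <vector>

// Each row is {recall, precision, false-positive rate}, matching the
// column layout documented for precisionRecall.
// Hypothetical helper: not part of GClasses.
double rocAreaTrapezoid(const std::vector<std::array<double, 3>>& rows)
{
    // Build ROC points with x = false-positive rate and y = recall.
    std::vector<std::pair<double, double>> pts;
    pts.reserve(rows.size());
    for (const auto& r : rows)
        pts.emplace_back(r[2], r[0]);

    // Sort by x (then y) so that neighbouring points form trapezoids.
    std::sort(pts.begin(), pts.end());

    double area = 0.0;
    for (std::size_t i = 1; i < pts.size(); ++i) {
        const double width = pts[i].first - pts[i - 1].first;
        const double meanHeight = 0.5 * (pts[i].second + pts[i - 1].second);
        area += width * meanHeight;
    }
    return area;
}
```

A curve that follows the diagonal gives an area of 0.5 under this rule, and a curve that rises to a recall of 1 at a false-positive rate of 0 gives 1.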

GDomNode* GClasses::GCollaborativeFilter::baseDomNode ( GDom * pDoc,
const char *  szClassName 
) [protected]

Child classes should use this in their implementation of serialize.

void GClasses::GCollaborativeFilter::basicTest ( double  minMSE)

Performs a basic unit test on this collaborative filter.

double GClasses::GCollaborativeFilter::crossValidate ( GMatrix & data,
size_t  folds,
double *  pOutMAE = NULL 
)

This randomly assigns each rating to one of the folds. Then, for each fold, it calls train with a dataset that contains everything except for the ratings in that fold. It predicts values for the items in the fold, and returns the mean-squared difference between the predictions and the actual ratings. If pOutMAE is non-NULL, it will be set to the mean-absolute error.
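
The fold assignment described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: assignFolds and trainingIndices are hypothetical names, std::mt19937 stands in for whatever random source the class uses, and ratings are identified only by their row index.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Assigns each of `count` ratings to a fold in [0, folds) uniformly at random.
// Hypothetical helper: not part of GClasses.
std::vector<std::size_t> assignFolds(std::size_t count, std::size_t folds, unsigned seed)
{
    assert(folds > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<std::size_t> pick(0, folds - 1);
    std::vector<std::size_t> foldOf(count);
    for (std::size_t i = 0; i < count; ++i)
        foldOf[i] = pick(rng);
    return foldOf;
}

// Returns the row indices to train on for a given fold:
// every rating that was not assigned to that fold.
std::vector<std::size_t> trainingIndices(const std::vector<std::size_t>& foldOf, std::size_t fold)
{
    std::vector<std::size_t> idx;
    for (std::size_t i = 0; i < foldOf.size(); ++i)
        if (foldOf[i] != fold)
            idx.push_back(i);
    return idx;
}
```

For each fold, a model would be trained on the rows returned by trainingIndices and evaluated on the remaining rows.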

virtual void GClasses::GCollaborativeFilter::impute ( double *  pVec,
size_t  dims 
) [pure virtual]

pVec should be a vector of n real values, where n is the number of items/attributes/columns in the data that was used to train the model. Unknown elements should be set to UNKNOWN_REAL_VALUE. This method will evaluate the known elements and impute (predict) values for the unknown elements. (The model should be trained before this method is called. Unlike the predict method, this method can operate on row-vectors that were not part of the training data.)

Implemented in GClasses::GBaselineRecommender, GClasses::GInstanceRecommender, GClasses::GSparseClusterRecommender, GClasses::GDenseClusterRecommender, GClasses::GMatrixFactorization, GClasses::GNonlinearPCA, and GClasses::GBagOfRecommenders.
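
To make the impute contract concrete, here is a deliberately simple sketch that fills unknown elements with per-item means. It is not how any of the listed classes impute values; imputeWithMeans, itemMeans, and the sentinel kUnknown (a stand-in for UNKNOWN_REAL_VALUE, whose actual value is not assumed here) are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for UNKNOWN_REAL_VALUE.
const double kUnknown = -1e308;

// Replaces every unknown element of pVec with the mean rating of the
// corresponding item; known elements are left untouched.
// Hypothetical helper: a real model would predict from what it learned.
void imputeWithMeans(double* pVec, std::size_t dims, const std::vector<double>& itemMeans)
{
    assert(dims == itemMeans.size());
    for (std::size_t i = 0; i < dims; ++i) {
        if (pVec[i] == kUnknown)
            pVec[i] = itemMeans[i];
    }
}
```

The shape of the call is the point here: the caller passes a full row of n values, marks the gaps with the sentinel, and reads the filled-in row back from the same buffer.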

GMatrix* GClasses::GCollaborativeFilter::precisionRecall ( GMatrix & data,
bool  ideal = false 
)

This divides the data into two equal-size parts. It trains on one part, and then measures the precision/recall using the other part. It returns a three-column data set with recall scores in column 0 and corresponding precision scores in column 1. The false-positive rate is in column 2. (So, if you want a precision-recall plot, just drop column 2. If you want an ROC curve, drop column 1 and swap the remaining two columns.) This method assumes the ratings range from 0 to 1, so be sure to scale the ratings to fit that range before calling this method. If ideal is true, then it will ignore your model and report the ideal results as if your model always predicted the correct rating. (This is useful because it shows the best possible results.)
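
The column handling described above can be sketched as follows, using a std::vector of three-element rows as a stand-in for the returned GMatrix. The names PRRow, precisionRecallPoints, and rocPoints are hypothetical and are not part of GClasses.

```cpp
#include <array>
#include <utility>
#include <vector>

// One result row: {recall, precision, false-positive rate}.
using PRRow = std::array<double, 3>;

// Precision-recall plot: drop column 2 and keep (recall, precision).
std::vector<std::pair<double, double>> precisionRecallPoints(const std::vector<PRRow>& rows)
{
    std::vector<std::pair<double, double>> pts;
    pts.reserve(rows.size());
    for (const auto& r : rows)
        pts.emplace_back(r[0], r[1]);
    return pts;
}

// ROC curve: drop column 1 and swap the remaining two columns, so that
// x is the false-positive rate and y is the recall (true-positive rate).
std::vector<std::pair<double, double>> rocPoints(const std::vector<PRRow>& rows)
{
    std::vector<std::pair<double, double>> pts;
    pts.reserve(rows.size());
    for (const auto& r : rows)
        pts.emplace_back(r[2], r[0]);
    return pts;
}
```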

virtual double GClasses::GCollaborativeFilter::predict ( size_t  user,
size_t  item 
) [pure virtual]

This returns a prediction for how the specified user will rate the specified item. (The model must be trained before this method is called. Also, some values for that user and item should have been included in the training set, or else this method will have no basis to make a good prediction.)

Implemented in GClasses::GBaselineRecommender, GClasses::GInstanceRecommender, GClasses::GSparseClusterRecommender, GClasses::GDenseClusterRecommender, GClasses::GMatrixFactorization, GClasses::GNonlinearPCA, and GClasses::GBagOfRecommenders.

virtual GDomNode* GClasses::GCollaborativeFilter::serialize ( GDom * pDoc ) [pure virtual]

Marshal this object into a DOM that can be converted to a variety of formats. (Implementations of this method should use baseDomNode.)

Implemented in GClasses::GBaselineRecommender, GClasses::GInstanceRecommender, GClasses::GSparseClusterRecommender, GClasses::GDenseClusterRecommender, GClasses::GMatrixFactorization, GClasses::GNonlinearPCA, and GClasses::GBagOfRecommenders.

virtual void GClasses::GCollaborativeFilter::train ( GMatrix & data ) [pure virtual]

Trains this recommender system. Let R be an m-by-n sparse matrix of known ratings from m users of n items. data should contain 3 columns, and one row for each known element in R. Column 0 specifies the user index from 0 to m-1, column 1 specifies the item index from 0 to n-1, and column 2 specifies the rating for that user-item pair. All attributes in data should be continuous.

Implemented in GClasses::GBaselineRecommender, GClasses::GInstanceRecommender, GClasses::GSparseClusterRecommender, GClasses::GDenseClusterRecommender, GClasses::GMatrixFactorization, GClasses::GNonlinearPCA, and GClasses::GBagOfRecommenders.
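
As an illustration of the three-column layout that train expects, the sketch below flattens the known elements of a sparse ratings matrix into (user, item, rating) rows. It uses standard containers as a stand-in for GMatrix; RatingRow and toTriplets are hypothetical names, not part of GClasses.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// One row of training data: {user index, item index, rating}.
// Hypothetical stand-in for a row of a 3-column GMatrix.
using RatingRow = std::array<double, 3>;

// Flattens the known elements of a sparse m-by-n ratings matrix R, stored
// here as a map from (user, item) to rating, into one row per known element.
std::vector<RatingRow> toTriplets(const std::map<std::pair<std::size_t, std::size_t>, double>& known)
{
    std::vector<RatingRow> rows;
    rows.reserve(known.size());
    for (const auto& entry : known) {
        RatingRow row = {{static_cast<double>(entry.first.first),
                          static_cast<double>(entry.first.second),
                          entry.second}};
        rows.push_back(row);
    }
    return rows;
}
```

The indices are stored as doubles because every column of the training data is continuous.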

double GClasses::GCollaborativeFilter::trainAndTest ( GMatrix & train,
GMatrix & test,
double *  pOutMAE = NULL 
)

This trains on the training set, and then tests on the test set. Returns the mean-squared difference between the predictions and the actual ratings.
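
The two error measures involved can be sketched as below. The function meanSquaredError is hypothetical, and its handling of pOutMAE mirrors the convention documented for crossValidate; that trainAndTest fills pOutMAE the same way is an assumption.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the mean-squared error between predictions and actual ratings.
// If pOutMAE is non-null, it receives the mean-absolute error.
// Hypothetical helper: not part of GClasses.
double meanSquaredError(const std::vector<double>& predicted,
                        const std::vector<double>& actual,
                        double* pOutMAE = nullptr)
{
    assert(predicted.size() == actual.size() && !actual.empty());
    double sumSquared = 0.0;
    double sumAbsolute = 0.0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        const double err = predicted[i] - actual[i];
        sumSquared += err * err;
        sumAbsolute += std::fabs(err);
    }
    const double n = static_cast<double>(actual.size());
    if (pOutMAE)
        *pOutMAE = sumAbsolute / n;
    return sumSquared / n;
}
```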

void GClasses::GCollaborativeFilter::trainDenseMatrix ( GMatrix & data,
GMatrix * pLabels = NULL 
)

Train from an m-by-n dense matrix, where m is the number of users and n is the number of items. All attributes must be continuous. Missing values are indicated with UNKNOWN_REAL_VALUE. If pLabels is non-NULL, then the labels will be appended as additional items.
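
The relationship between the dense layout and the three-column layout used by train can be sketched as follows. Whether trainDenseMatrix performs this conversion internally is an assumption; denseToTriplets is a hypothetical name, and kUnknown is a stand-in for UNKNOWN_REAL_VALUE whose actual value is not assumed here.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical sentinel standing in for UNKNOWN_REAL_VALUE.
const double kUnknown = -1e308;

// Converts a dense m-by-n matrix (row = user, column = item) into
// {user, item, rating} rows, skipping elements marked as unknown.
std::vector<std::array<double, 3>> denseToTriplets(const std::vector<std::vector<double>>& dense)
{
    std::vector<std::array<double, 3>> rows;
    for (std::size_t user = 0; user < dense.size(); ++user) {
        for (std::size_t item = 0; item < dense[user].size(); ++item) {
            if (dense[user][item] == kUnknown)
                continue; // missing rating: nothing to emit
            std::array<double, 3> row = {{static_cast<double>(user),
                                          static_cast<double>(item),
                                          dense[user][item]}};
            rows.push_back(row);
        }
    }
    return rows;
}
```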


Member Data Documentation