GClasses

GClasses::GQLearner Class Reference

The base class of a Q-Learner. To use this class, there are four abstract methods you'll need to implement. See also the comment for GPolicyLearner. More...

#include <GReinforcement.h>

Inheritance diagram for GClasses::GQLearner:
GClasses::GPolicyLearner -> GClasses::GQLearner -> GClasses::GIncrementalLearnerQAgent


Public Member Functions

 GQLearner (sp_relation &pRelation, int actionDims, double *pInitialState, GRand *pRand, GAgentActionIterator *pActionIterator)
virtual ~GQLearner ()
void setLearningRate (double d)
 Sets the learning rate (often called "alpha"). If the state is deterministic and actions have deterministic consequences, then this should be 1. If there is any non-determinism, there are three common approaches for picking the learning rate: (1) use a fairly small value (perhaps 0.1), (2) decay it over time (by calling this method before every iteration), or (3) remember how many times n each state has already been visited, and set the learning rate to 1/(n+1) before each iteration. The third technique works best, but is awkward with continuous state spaces.
void setDiscountFactor (double d)
 Sets the factor for discounting future rewards (often called "gamma").
virtual double getQValue (const double *pState, const double *pAction)=0
 You must implement some kind of structure to store q-values. This method should return the current q-value for the specified state and action.
virtual void setQValue (const double *pState, const double *pAction, double qValue)=0
 This is the complement to getQValue.
virtual void refinePolicyAndChooseNextAction (const double *pSenses, double *pOutActions)
 See GPolicyLearner::refinePolicyAndChooseNextAction.
void setActionCap (int n)
 This specifies a cap on how many actions to sample. (If actions are continuous, you obviously don't want to try them all.)

Protected Member Functions

virtual void chooseAction (const double *pSenses, double *pOutActions)=0
 This method picks the action during training. It is called by refinePolicyAndChooseNextAction. (If it makes things easier, the agent may actually perform the action here, but it's better practice to wait until refinePolicyAndChooseNextAction returns, because that keeps the "thinking" and "acting" stages separate.) One way to pick the next action is to call getQValue for all possible actions in the current state, and pick the one with the highest Q-value. But if you always pick the best action, you'll never discover things you don't already know about, so you need to find some balance between exploration and exploitation. One common way to do this is to usually pick the best action, but sometimes pick a random action.
virtual double rewardFromLastAction ()=0
 A reward is obtained when the agent performs a particular action in a particular state. (A penalty is a negative reward. A reward of zero is no reward.) This method returns the reward that was obtained when the last action was performed. If you return UNKNOWN_REAL_VALUE, then the q-table will not be updated for that action.

Protected Attributes

GRand * m_pRand
GAgentActionIterator * m_pActionIterator
double m_learningRate
double m_discountFactor
double * m_pSenses
double * m_pAction
int m_actionCap

Detailed Description

The base class of a Q-Learner. To use this class, there are four abstract methods you'll need to implement. See also the comment for GPolicyLearner.
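
For orientation, here is a hedged sketch of what a minimal subclass might look like, assuming a simple std::map-backed Q-table and an explicit stateDims constructor argument (both are illustrative assumptions, not part of the GClasses API). Bodies for the four abstract methods are sketched in the member documentation below.

    #include <map>
    #include <vector>
    #include <GReinforcement.h>

    using namespace GClasses;

    // Sketch of a tabular Q-learner. The std::map Q-table and the explicit
    // stateDims argument are illustrative; only the four overridden methods
    // are prescribed by GQLearner.
    class MyTabularQLearner : public GQLearner
    {
    protected:
        int m_stateDims;                                // illustrative: number of state dimensions
        int m_actionDims;                               // illustrative: number of action dimensions
        std::map<std::vector<double>, double> m_qTable; // key = state vector followed by action vector

    public:
        MyTabularQLearner(sp_relation& pRelation, int stateDims, int actionDims,
            double* pInitialState, GRand* pRand, GAgentActionIterator* pActionIterator)
        : GQLearner(pRelation, actionDims, pInitialState, pRand, pActionIterator),
          m_stateDims(stateDims), m_actionDims(actionDims)
        {
        }

        virtual ~MyTabularQLearner() {}

        // The four abstract methods:
        virtual double getQValue(const double* pState, const double* pAction);
        virtual void setQValue(const double* pState, const double* pAction, double qValue);

    protected:
        virtual void chooseAction(const double* pSenses, double* pOutActions);
        virtual double rewardFromLastAction();
    };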


Constructor & Destructor Documentation

GClasses::GQLearner::GQLearner ( sp_relation &  pRelation,
int  actionDims,
double *  pInitialState,
GRand *  pRand,
GAgentActionIterator *  pActionIterator 
)
virtual GClasses::GQLearner::~GQLearner ( ) [virtual]

Member Function Documentation

virtual void GClasses::GQLearner::chooseAction ( const double *  pSenses,
double *  pOutActions 
) [protected, pure virtual]

This method picks the action during training. It is called by refinePolicyAndChooseNextAction. (If it makes things easier, the agent may actually perform the action here, but it's better practice to wait until refinePolicyAndChooseNextAction returns, because that keeps the "thinking" and "acting" stages separate.) One way to pick the next action is to call getQValue for all possible actions in the current state, and pick the one with the highest Q-value. But if you always pick the best action, you'll never discover things you don't already know about, so you need to find some balance between exploration and exploitation. One common way to do this is to usually pick the best action, but sometimes pick a random action.

Implemented in GClasses::GIncrementalLearnerQAgent.
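
Continuing the sketch from the detailed description, an epsilon-greedy version of this method might look roughly like the following. The GAgentActionIterator calls (reset, nextAction) and GRand::uniform are assumptions about the surrounding API, and epsilon is an illustrative constant.

    #include <algorithm>
    #include <vector>

    void MyTabularQLearner::chooseAction(const double* pSenses, double* pOutActions)
    {
        bool explore = (m_pRand->uniform() < 0.1);        // 10% of the time, ignore the Q-values (epsilon-greedy)
        std::vector<double> candidate(m_actionDims);
        std::vector<double> chosen(m_actionDims);
        double bestQ = -1e308;
        int sampled = 0;
        m_pActionIterator->reset(pSenses);                // assumed: rewind the candidate actions for this state
        while(sampled < m_actionCap &&                    // m_actionCap bounds how many candidates we sample
            m_pActionIterator->nextAction(&candidate[0])) // assumed: returns false when no candidates remain
        {
            sampled++;
            if(explore)
            {
                // Reservoir-sample one candidate uniformly at random.
                if(m_pRand->uniform() * sampled < 1.0)
                    chosen = candidate;
            }
            else
            {
                double q = getQValue(pSenses, &candidate[0]);
                if(q > bestQ)
                {
                    bestQ = q;
                    chosen = candidate;
                }
            }
        }
        std::copy(chosen.begin(), chosen.end(), pOutActions);
    }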

virtual double GClasses::GQLearner::getQValue ( const double *  pState,
const double *  pAction 
) [pure virtual]

You must implement some kind of structure to store q-values. This method should return the current q-value for the specified state and action.

Implemented in GClasses::GIncrementalLearnerQAgent.
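
Continuing the sketch from the detailed description, one simple storage structure is a map keyed on the concatenated state and action vectors; the m_stateDims/m_actionDims members are illustrative. With a continuous state space you would typically use a function approximator instead of a table.

    double MyTabularQLearner::getQValue(const double* pState, const double* pAction)
    {
        std::vector<double> key(pState, pState + m_stateDims);
        key.insert(key.end(), pAction, pAction + m_actionDims);
        std::map<std::vector<double>, double>::const_iterator it = m_qTable.find(key);
        return it == m_qTable.end() ? 0.0 : it->second; // unvisited pairs default to a Q-value of 0
    }

    // The complement, setQValue, stores under the same key.
    void MyTabularQLearner::setQValue(const double* pState, const double* pAction, double qValue)
    {
        std::vector<double> key(pState, pState + m_stateDims);
        key.insert(key.end(), pAction, pAction + m_actionDims);
        m_qTable[key] = qValue;
    }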

virtual void GClasses::GQLearner::refinePolicyAndChooseNextAction ( const double *  pSenses,
double *  pOutActions 
) [virtual]
virtual double GClasses::GQLearner::rewardFromLastAction ( ) [protected, pure virtual]

A reward is obtained when the agent performs a particular action in a particular state. (A penalty is a negative reward. A reward of zero is no reward.) This method returns the reward that was obtained when the last action was performed. If you return UNKNOWN_REAL_VALUE, then the q-table will not be updated for that action.
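
For illustration only, here is a sketch in which the agent reads the reward from an environment object; m_pEnv and its methods are hypothetical and not part of GClasses.

    // Assumes the sketch class also holds a hypothetical member: MyEnvironment* m_pEnv;
    double MyTabularQLearner::rewardFromLastAction()
    {
        if(!m_pEnv->rewardIsKnown())
            return UNKNOWN_REAL_VALUE; // no reward signal; GQLearner skips the q-table update for this step
        return m_pEnv->lastReward();   // positive = reward, negative = penalty, 0 = neither
    }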

void GClasses::GQLearner::setActionCap ( int  n) [inline]

This specifies a cap on how many actions to sample. (If actions are continuous, you obviously don't want to try them all.)

void GClasses::GQLearner::setDiscountFactor ( double  d)

Sets the factor for discounting future rewards (often called "gamma").

void GClasses::GQLearner::setLearningRate ( double  d)

Sets the learning rate (often called "alpha"). If the state is deterministic and actions have deterministic consequences, then this should be 1. If there is any non-determinism, there are three common approaches for picking the learning rate: (1) use a fairly small value (perhaps 0.1), (2) decay it over time (by calling this method before every iteration), or (3) remember how many times n each state has already been visited, and set the learning rate to 1/(n+1) before each iteration. The third technique works best, but is awkward with continuous state spaces.
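
A hedged sketch of the third approach: discretize the state (the binning helper and bin width are illustrative), count visits, and set the learning rate to 1/(n+1) before each call to refinePolicyAndChooseNextAction.

    #include <cmath>
    #include <map>
    #include <vector>

    // Illustrative binning helper so that similar states share a visit counter.
    static std::vector<double> binState(const double* pSenses, int stateDims)
    {
        std::vector<double> key(stateDims);
        for(int i = 0; i < stateDims; i++)
            key[i] = std::floor(pSenses[i] / 0.1); // bins of width 0.1 (arbitrary choice)
        return key;
    }

    // One driver step: set alpha = 1/(n+1) from the visit count, then let the agent act.
    void stepAgent(MyTabularQLearner& agent, std::map<std::vector<double>, int>& visits,
        const double* pSenses, int stateDims, double* pOutActions)
    {
        int n = visits[binState(pSenses, stateDims)]++; // n = prior visits to this (binned) state
        agent.setLearningRate(1.0 / (n + 1));
        agent.refinePolicyAndChooseNextAction(pSenses, pOutActions);
    }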

virtual void GClasses::GQLearner::setQValue ( const double *  pState,
const double *  pAction,
double  qValue 
) [pure virtual]

This is the complement to getQValue.

Implemented in GClasses::GIncrementalLearnerQAgent.


Member Data Documentation

double* GClasses::GQLearner::m_pAction [protected]
double* GClasses::GQLearner::m_pSenses [protected]