GClasses::GIncrementalLearnerQAgent Class Reference
This is an implementation of GQLearner that uses an incremental learner for its Q-table and a SoftMax strategy (usually pick the best action, but sometimes pick a random action) to balance between exploration and exploitation. To use this class, you need to supply an incremental learner (see the comment for the constructor for more details) and to implement the GetRewardForLastAction method.
#include <GReinforcement.h>
Public Member Functions

GIncrementalLearnerQAgent (sp_relation &pObsControlRelation, GIncrementalLearner *pQTable, int actionDims, double *pInitialState, GRand *pRand, GAgentActionIterator *pActionIterator, double softMaxThresh)
    pQTable must be an incremental learner. If the relation for pQTable has n attributes, then the first (n-1) attributes refer to the sense (state) and action, and the last attribute refers to the Q-value (the current estimate of the utility of performing that action in that state). For actionDims, see the comment for GPolicyLearner::GPolicyLearner. pInitialState is the initial sense vector. If softMaxThresh is 0, it always picks a random action. If softMaxThresh is 1, it always picks the best action. For values in between, it usually picks the best action but sometimes picks a random one, with larger values exploring less.

virtual ~GIncrementalLearnerQAgent ()

virtual double getQValue (const double *pState, const double *pAction)
    See the comment for GQLearner::GetQValue.

virtual void setQValue (const double *pState, const double *pAction, double qValue)
    See the comment for GQLearner::SetQValue.

Protected Member Functions

virtual void chooseAction (const double *pSenses, double *pOutActions)
    This method picks the action during training. It is called by refinePolicyAndChooseNextAction. (If it makes things easier, the agent may actually perform the action here, but it's better practice to wait until refinePolicyAndChooseNextAction returns, because that keeps the "thinking" and "acting" stages separated from each other.) One way to pick the next action is to call GetQValue for all possible actions in the current state and pick the one with the highest Q-value. But if you always pick the best action, you'll never discover things you don't already know about, so you need to find some balance between exploration and exploitation. One way to do this is to usually pick the best action, but sometimes pick a random action.

Protected Attributes

GIncrementalLearner * m_pQTable
double * m_pBuf
double m_softMaxThresh
This is an implementation of GQLearner that uses an incremental learner for its Q-table and a SoftMax strategy (usually pick the best action, but sometimes pick a random action) to balance between exploration and exploitation. To use this class, you need to supply an incremental learner (see the comment for the constructor for more details) and to implement the GetRewardForLastAction method.
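As a rough illustration of the usage described above, the sketch below derives an agent from GIncrementalLearnerQAgent and supplies the reward signal. The method name GetRewardForLastAction comes from the description above; its exact signature (assumed here to take no arguments and return a double) and the environment feedback used in the body are hypothetical and should be checked against GReinforcement.h.

// Hedged sketch: a task-specific agent that supplies the reward signal.
// The reward method's exact signature is an assumption; verify it in GReinforcement.h.
class MyAgent : public GClasses::GIncrementalLearnerQAgent
{
public:
    MyAgent(GClasses::sp_relation& rel, GClasses::GIncrementalLearner* pQTable, int actionDims,
            double* pInitialState, GClasses::GRand* pRand,
            GClasses::GAgentActionIterator* pActionIterator, double softMaxThresh)
    : GIncrementalLearnerQAgent(rel, pQTable, actionDims, pInitialState, pRand, pActionIterator, softMaxThresh),
      m_lastDistanceToGoal(1.0)
    {
    }

    // Reward observed for the most recent action (hypothetical task: closer to the goal is better).
    virtual double GetRewardForLastAction()
    {
        return 1.0 - m_lastDistanceToGoal;
    }

    double m_lastDistanceToGoal; // hypothetical environment feedback, updated by the simulation loop
};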
GClasses::GIncrementalLearnerQAgent::GIncrementalLearnerQAgent ( sp_relation & pObsControlRelation,
        GIncrementalLearner * pQTable,
        int actionDims,
        double * pInitialState,
        GRand * pRand,
        GAgentActionIterator * pActionIterator,
        double softMaxThresh )
pQTable must be an incremental learner. If the relation for pQTable has n attributes, then the first (n-1) attributes refer to the sense (state) and action, and the last attribute refers to the Q-value (the current estimate of the utility of performing that action in that state). For actionDims, see the comment for GPolicyLearner::GPolicyLearner. pInitialState is the initial sense vector. If softMaxThresh is 0, it always picks a random action. If softMaxThresh is 1, it always picks the best action. For values in between, it usually picks the best action but sometimes picks a random one, with larger values exploring less.
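For concreteness, here is a hedged sketch of how the pieces described above could be wired together for a problem with two continuous sense dimensions and one action dimension, so the Q-table relation has 2 + 1 + 1 = 4 attributes (senses, then action, then Q-value). GUniformRelation, GNeuralNet, and GDiscreteActionIterator are assumed to be suitable classes from the library and their constructor forms should be verified against the headers; MyAgent refers to the hypothetical subclass sketched earlier.

// Hedged wiring sketch; class names other than those documented above are assumptions.
using namespace GClasses;

GRand prng(0);
sp_relation pRel;
pRel = new GUniformRelation(4, 0);      // 2 sense attrs + 1 action attr + 1 Q-value attr (assumed API)
GNeuralNet qTable(&prng);               // any incremental learner can serve as the Q-table (assumed ctor)
double initialState[2] = {0.0, 0.0};    // initial sense vector
GDiscreteActionIterator it(3);          // e.g. 3 discrete actions (assumed class)
MyAgent agent(pRel, &qTable, 1 /*actionDims*/, initialState, &prng, &it,
              0.9 /*softMaxThresh: mostly exploit, sometimes explore*/);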
virtual GClasses::GIncrementalLearnerQAgent::~GIncrementalLearnerQAgent ( ) [virtual]
virtual void GClasses::GIncrementalLearnerQAgent::chooseAction ( const double * pSenses, double * pOutActions ) [protected, virtual]
This method picks the action during training. It is called by refinePolicyAndChooseNextAction. (If it makes things easier, the agent may actually perform the action here, but it's better practice to wait until refinePolicyAndChooseNextAction returns, because that keeps the "thinking" and "acting" stages separated from each other.) One way to pick the next action is to call GetQValue for all possible actions in the current state and pick the one with the highest Q-value. But if you always pick the best action, you'll never discover things you don't already know about, so you need to find some balance between exploration and exploitation. One way to do this is to usually pick the best action, but sometimes pick a random action.
Implements GClasses::GQLearner.
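The balance described above can be illustrated with a small standalone sketch (not the library's actual chooseAction): with probability softMaxThresh pick the action with the highest Q-value, otherwise pick a random one. The action list and qValue callback are stand-ins for the agent's action iterator and GetQValue.

#include <cstdlib>
#include <functional>
#include <vector>

// Standalone illustration of the selection strategy described above; not library code.
// 'qValue' stands in for GetQValue(state, action); 'actions' for the action iterator.
size_t pickAction(const std::vector<std::vector<double> >& actions,
                  const std::function<double(const std::vector<double>&)>& qValue,
                  double softMaxThresh)
{
    double r = (double)std::rand() / RAND_MAX;
    if(r >= softMaxThresh)
        return std::rand() % actions.size();    // explore: pick a random action
    size_t best = 0;                            // exploit: pick the action with the highest Q-value
    for(size_t i = 1; i < actions.size(); i++)
    {
        if(qValue(actions[i]) > qValue(actions[best]))
            best = i;
    }
    return best;
}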
virtual double GClasses::GIncrementalLearnerQAgent::getQValue ( const double * pState, const double * pAction ) [virtual]
See the comment for GQLearner::GetQValue.
Implements GClasses::GQLearner.
virtual void GClasses::GIncrementalLearnerQAgent::setQValue ( const double * pState, const double * pAction, double qValue ) [virtual]
See the comment for GQLearner::SetQValue.
Implements GClasses::GQLearner.
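Because the Q-table relation puts the sense and action attributes first and the Q-value last (see the constructor notes above), getting or setting a Q-value amounts to concatenating the state and action into a single input vector for the incremental learner. The helper below only illustrates that layout; it is not the class's actual implementation, and the real buffer handling (m_pBuf) and learner calls may differ.

// Hedged illustration of the attribute layout used by the Q-table:
// stateDims sense attributes, then actionDims action attributes, then one Q-value attribute.
void packQTableInput(const double* pState, int stateDims,
                     const double* pAction, int actionDims, double* pBuf)
{
    for(int i = 0; i < stateDims; i++)
        pBuf[i] = pState[i];                  // sense (state) attributes come first
    for(int i = 0; i < actionDims; i++)
        pBuf[stateDims + i] = pAction[i];     // then the action attributes
    // getQValue would ask the incremental learner to predict the final (Q-value)
    // attribute from this input; setQValue would train it toward the target Q-value.
}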
double* GClasses::GIncrementalLearnerQAgent::m_pBuf [protected]
double GClasses::GIncrementalLearnerQAgent::m_softMaxThresh [protected]