A Short Introduction To State and State-Action Values - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

State and State-Action Values


This section discusses algorithms that store a value for each state, in addition to a value for each state-action pair that helps determine which action to choose. In that sense, all algorithms in this section could be called Actor-Critic methods, although we reserve that name for the specific algorithm described in the book by Sutton and Barto. Note that here a state-action value need not equal the expected future reward. The only requirement is that the action with the highest expected reward should normally also receive the highest state-action value.
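The point that only the ranking of the state-action values matters can be made concrete with greedy action selection. The sketch below is illustrative only: `greedy_action`, the action names and all numbers are hypothetical. It compares a table of (made-up) expected returns `Q` with a table of values `P` on a completely different scale that ranks the actions in the same order. A greedy choice picks the same action from both.

```python
def greedy_action(P, s, actions):
    """Return the action with the highest state-action value in state s.

    P maps (state, action) pairs to numbers. Only their order matters here;
    ties go to the first action in `actions` (max returns the first maximum).
    """
    return max(actions, key=lambda a: P[(s, a)])


actions = ["left", "right", "stay"]

# Hypothetical expected returns for state "s".
Q = {("s", "left"): 0.2, ("s", "right"): 1.5, ("s", "stay"): 0.7}

# Hypothetical state-action values on a different scale, same ranking.
P = {("s", "left"): -3.0, ("s", "right"): 10.0, ("s", "stay"): 0.0}

print(greedy_action(Q, "s", actions), greedy_action(P, "s", actions))  # right right
```

Exploratory policies that use the magnitudes of the values (for instance a softmax over them) do depend on more than the ordering, so this equivalence holds for the greedy choice only.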

All algorithms in this section use the following rule to update the state values:

V_{t+1}(s_t)     \overset{\beta_t}{\longleftarrow}  r_t + \gamma V_t(s_{t+1})
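Assuming the arrow with $\beta_t$ over it denotes a step of size $\beta_t$ towards the target, that is $x \overset{\beta}{\longleftarrow} y$ means $x \leftarrow x + \beta (y - x)$, the rule expands to $V_{t+1}(s_t) = V_t(s_t) + \beta_t \left( r_t + \gamma V_t(s_{t+1}) - V_t(s_t) \right)$. A minimal tabular sketch of that expansion follows. The function name `td0_update` and the `terminal` flag are hypothetical; dropping the bootstrap term at terminal states is an added assumption, since the text does not discuss terminal states. The function returns the temporal-difference error, which actor-critic style methods, such as the one in Sutton and Barto, use to adjust their action values.

```python
from collections import defaultdict


def td0_update(V, s, r, s_next, beta, gamma, terminal=False):
    """Apply one state-value update to the table V in place.

    Implements V(s) <- V(s) + beta * (r + gamma * V(s_next) - V(s)).
    If terminal is True, gamma * V(s_next) is left out of the target
    (an assumption; the text does not cover terminal states).
    Returns the temporal-difference error.
    """
    target = r if terminal else r + gamma * V[s_next]
    td_error = target - V[s]
    V[s] += beta * td_error
    return td_error


V = defaultdict(float)  # unseen states start with value 0.0
delta = td0_update(V, s="A", r=1.0, s_next="B", beta=0.5, gamma=0.9)
print(delta, V["A"])  # 1.0 0.5
```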

Note that this rule is easily extended to eligibility traces. In general, the state values will also become reliable more quickly than Q values, because the state space is smaller than the combined state-action space, so each state value is updated more often. Especially in problems with many actions, this can be a big advantage.
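One common way to add eligibility traces to the state-value rule is TD($\lambda$) with accumulating traces, sketched below for a tabular setting. The trace-decay parameter `lam` and the name `td_lambda_step` are assumptions not taken from the text, and other trace variants (for example replacing traces) exist. Each step computes the same TD error as the one-step rule, then spreads it over all recently visited states in proportion to their traces.

```python
from collections import defaultdict


def td_lambda_step(V, e, s, r, s_next, beta, gamma, lam):
    """One TD(lambda) step with accumulating eligibility traces (a sketch).

    V maps states to values and e maps states to traces; both are
    defaultdict(float). With lam = 0 this reduces to the one-step rule.
    """
    delta = r + gamma * V[s_next] - V[s]  # same TD error as the one-step update
    e[s] += 1.0                           # accumulate trace for the current state
    for x in e:
        V[x] += beta * delta * e[x]       # recently visited states share the update
        e[x] *= gamma * lam               # traces decay by gamma * lambda per step
    return delta


V, e = defaultdict(float), defaultdict(float)
td_lambda_step(V, e, "A", 0.0, "B", beta=0.5, gamma=1.0, lam=1.0)
td_lambda_step(V, e, "B", 1.0, "C", beta=0.5, gamma=1.0, lam=1.0)
print(V["A"], V["B"])  # 0.5 0.5
```

The reward on the second transition also raises the value of the earlier state "A", which a one-step update would only do on a later visit. Looping over every traced state is fine for small tables; larger problems usually prune traces that have decayed close to zero.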

The algorithms in this section:

Cacla also uses state values, but it is described in the section on continuous-action algorithms.
