A Short Introduction To Expected Sarsa - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

Expected-Sarsa


Expected-Sarsa is a variation of Sarsa. Its equation is:

Q_{t+1}(s_t,a_t) \overset{\alpha_t}{\longleftarrow} r_t + \gamma \sum_a \pi_t(s_{t+1},a) Q_t(s_{t+1},a)
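The arrow notation is not defined on this page. Reading it as shorthand for a step of size \alpha_t toward the right-hand side (an assumption), the update can be written out next to the usual Sarsa update for comparison:

```latex
% Assumption: x \overset{\alpha}{\longleftarrow} y abbreviates x \leftarrow (1-\alpha) x + \alpha y.
\begin{align*}
\text{Sarsa:} \quad Q_{t+1}(s_t,a_t) &= (1-\alpha_t)\, Q_t(s_t,a_t) + \alpha_t \bigl[ r_t + \gamma\, Q_t(s_{t+1},a_{t+1}) \bigr] \\
\text{Expected-Sarsa:} \quad Q_{t+1}(s_t,a_t) &= (1-\alpha_t)\, Q_t(s_t,a_t) + \alpha_t \Bigl[ r_t + \gamma \sum_a \pi_t(s_{t+1},a)\, Q_t(s_{t+1},a) \Bigr]
\end{align*}
```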

The difference from Sarsa is that Expected-Sarsa weights the action values in the next state by the probabilities that the current action selection policy assigns to them, instead of using the value of the single next action that was actually selected. Because this retains the same bias but reduces the variance, the algorithm can be viewed as an improvement over Sarsa. Because it is very similar to Sarsa, its properties are also similar, so we only list the advantages:

Advantages over Sarsa
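The variance reduction mentioned above can be illustrated with a small numerical sketch. It assumes an epsilon-greedy policy, and the next-state action values, reward and discount factor are made-up numbers; the helper names (`epsilon_greedy_probs`, `sarsa_target`, `expected_sarsa_target`) are hypothetical and chosen only for this illustration.

```python
import random
import statistics

def epsilon_greedy_probs(q_values, epsilon):
    """Action probabilities of an epsilon-greedy policy (ties go to the first maximum)."""
    n = len(q_values)
    greedy = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[greedy] += 1.0 - epsilon
    return probs

def sarsa_target(r, gamma, q_next, probs, rng):
    """Sarsa target: uses the value of one next action sampled from the policy."""
    a_next = rng.choices(range(len(q_next)), weights=probs)[0]
    return r + gamma * q_next[a_next]

def expected_sarsa_target(r, gamma, q_next, probs):
    """Expected-Sarsa target: policy-weighted average over all next actions."""
    return r + gamma * sum(p * q for p, q in zip(probs, q_next))

rng = random.Random(0)
q_next = [1.0, 0.0, -2.0]          # made-up values for Q_t(s_{t+1}, .)
probs = epsilon_greedy_probs(q_next, epsilon=0.3)
r, gamma = 0.5, 0.9                # made-up reward and discount factor

sarsa_samples = [sarsa_target(r, gamma, q_next, probs, rng) for _ in range(20000)]
expected = expected_sarsa_target(r, gamma, q_next, probs)

print("Expected-Sarsa target:    ", expected)
print("mean of Sarsa targets:    ", statistics.mean(sarsa_samples))
print("variance of Sarsa targets:", statistics.pvariance(sarsa_samples))
```

For a fixed transition the Expected-Sarsa target is a single number, while the Sarsa target scatters around that same number depending on which next action happens to be sampled. Randomness in the environment itself is not modelled here and would affect both targets alike.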

Algorithm

The Expected-Sarsa algorithm in schematic form:

[Figure: schematic of the Expected-Sarsa algorithm]
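A minimal Python sketch of how such a schematic might be laid out in code. It is not the author's listing. It assumes a tabular Q-function and an epsilon-greedy policy, and it uses a hypothetical five-state corridor as a stand-in environment; `step`, `policy_probs` and `expected_sarsa` are names invented for this illustration.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4   # hypothetical corridor: states 0..4, goal at 4

def step(state, action):
    """Toy dynamics: action 0 moves left, action 1 moves right; every step costs -1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0, next_state == GOAL

def policy_probs(q_row, epsilon):
    """pi_t(s, .): epsilon-greedy probabilities derived from the current Q-values."""
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    probs = [epsilon / len(q_row)] * len(q_row)
    probs[greedy] += 1.0 - epsilon
    return probs

def expected_sarsa(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # Policy in s_t, computed from the current Q-values, selects a_t.
            a = rng.choices(range(N_ACTIONS), weights=policy_probs(Q[s], epsilon))[0]
            s_next, r, done = step(s, a)
            if done:
                expected_q = 0.0   # assumption: terminal states have value zero
            else:
                # Policy in s_{t+1}, used only to weight the next action values.
                probs = policy_probs(Q[s_next], epsilon)
                expected_q = sum(p * q for p, q in zip(probs, Q[s_next]))
            # Move Q(s_t, a_t) a step of size alpha toward the expected target.
            Q[s][a] += alpha * (r + gamma * expected_q - Q[s][a])
            s = s_next
    return Q

Q = expected_sarsa()
greedy_actions = [max(range(N_ACTIONS), key=lambda a: Q[s][a]) for s in range(GOAL)]
print(greedy_actions)   # greedy action per non-terminal state (1 = right)
```

Unlike Sarsa, the loop does not need to carry the next action from one iteration to the next, because the update never refers to the action actually taken in the next state.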

Note that the algorithm is simpler than that of Sarsa and closer in structure to Q-learning. The algorithm as given does require calculating the policy twice per step: once before and once after updating the state-action values. Alternatively, one can calculate the policy only once and reuse the policy that was calculated for the Q-value update to select the next action. This results in a slightly different algorithm, which (1) performs almost exactly the same in practice and (2) also converges in theory.
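The single-calculation variant could be sketched as follows, under the same assumptions as before: a tabular Q-function, an epsilon-greedy policy and a hypothetical five-state corridor as a stand-in environment. All function names are invented for this illustration.

```python
import random

N_STATES, N_ACTIONS, GOAL = 5, 2, 4   # hypothetical corridor: states 0..4, goal at 4

def step(state, action):
    """Toy dynamics: action 0 moves left, action 1 moves right; every step costs -1."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    return next_state, -1.0, next_state == GOAL

def policy_probs(q_row, epsilon):
    """Epsilon-greedy probabilities derived from the current Q-values."""
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    probs = [epsilon / len(q_row)] * len(q_row)
    probs[greedy] += 1.0 - epsilon
    return probs

def expected_sarsa_single_policy(episodes=500, alpha=0.1, gamma=0.95,
                                 epsilon=0.1, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for _ in range(episodes):
        s, done = 0, False
        probs = policy_probs(Q[s], epsilon)        # policy for the first state
        while not done:
            a = rng.choices(range(N_ACTIONS), weights=probs)[0]
            s_next, r, done = step(s, a)
            if done:
                next_probs, expected_q = None, 0.0  # terminal value assumed zero
            else:
                # Calculated once, before the update ...
                next_probs = policy_probs(Q[s_next], epsilon)
                expected_q = sum(p * q for p, q in zip(next_probs, Q[s_next]))
            Q[s][a] += alpha * (r + gamma * expected_q - Q[s][a])
            # ... and reused unchanged to select the next action.
            s, probs = s_next, next_probs
    return Q

Q_single = expected_sarsa_single_policy()
greedy_single = [max(range(N_ACTIONS), key=lambda a: Q_single[s][a])
                 for s in range(GOAL)]
print(greedy_single)   # greedy action per non-terminal state (1 = right)
```

In this tabular setting the policy in a state depends only on that state's own Q-values, so the reused policy can differ from a freshly calculated one only when the next state equals the current state (here, moving left in state 0).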

