Expected-Sarsa is a variant of Sarsa. Its update rule is:
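In the commonly used notation, the Expected-Sarsa update can be written as follows. The symbols are assumptions made here for illustration: \alpha is the step size, \gamma the discount factor, and \pi(s, a) the probability that the current policy selects action a in state s.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \Big[ r_{t+1}
  + \gamma \sum_{a} \pi(s_{t+1}, a)\, Q(s_{t+1}, a)
  - Q(s_t, a_t) \Big]
```

For comparison, Sarsa uses Q(s_{t+1}, a_{t+1}) for the sampled next action a_{t+1} in place of the sum over actions.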
The difference is that Expected-Sarsa weights the action values in the next state by the probabilities that the current action-selection policy assigns to them, instead of using only the value of the single next action that was actually sampled. Because this leaves the bias unchanged but reduces the variance of the update, the algorithm can be viewed as an improvement over Sarsa. Because it is otherwise very similar to Sarsa, its properties are similar as well, so we only list the advantages:
The Expected-Sarsa algorithm in schematic form:
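A minimal Python sketch of one way to write this down, under stated assumptions: a tabular `Q` stored as a NumPy array, an epsilon-greedy action-selection policy, and a hypothetical environment function `step(state, action)` that returns `(reward, next_state, done)`. None of these names come from the original schematic; they are chosen here for illustration only.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs

def expected_sarsa_episode(step, start_state, Q, alpha=0.1, gamma=0.95,
                           epsilon=0.1, rng=None, max_steps=1000):
    """Run one episode of tabular Expected-Sarsa, updating Q in place.

    `step(state, action)` is a hypothetical environment function that
    returns (reward, next_state, done).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = start_state
    for _ in range(max_steps):
        # Policy computed from the current (already updated) Q to pick an action.
        pi_s = epsilon_greedy_probs(Q[s], epsilon)
        a = rng.choice(len(pi_s), p=pi_s)
        r, s_next, done = step(s, a)
        # Policy computed for the next state, before the update, to form
        # the expected value of the next state's actions.
        if done:
            expected_q = 0.0  # terminal states have value zero
        else:
            pi_next = epsilon_greedy_probs(Q[s_next], epsilon)
            expected_q = float(np.dot(pi_next, Q[s_next]))
        Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
        if done:
            break
        s = s_next
    return Q
```

In this sketch the policy for a state is evaluated once to build the update target and once more, after `Q` has changed, to select the action.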
Note that the algorithm is simpler than Sarsa and closer in structure to Q-learning. As given, the algorithm does require calculating the policy twice per step: once before and once after updating the state-action values. Of course, one could also decide to calculate it only once and reuse the policy computed for the Q-value update when selecting the next action. This results in a slightly different algorithm, which (1) performs almost exactly the same in practice and (2) also converges in theory.
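The single-computation variant can be sketched as follows, again assuming a tabular `Q`, an epsilon-greedy policy, and a hypothetical `step(state, action)` function returning `(reward, next_state, done)`. The only change is that the probabilities computed for the update target are kept and reused to draw the next action.

```python
import numpy as np

def epsilon_greedy_probs(q_row, epsilon):
    """Action probabilities of an epsilon-greedy policy for one state."""
    n = len(q_row)
    probs = np.full(n, epsilon / n)
    probs[int(np.argmax(q_row))] += 1.0 - epsilon
    return probs

def expected_sarsa_episode_reuse(step, start_state, Q, alpha=0.1, gamma=0.95,
                                 epsilon=0.1, rng=None, max_steps=1000):
    """Expected-Sarsa variant that computes the policy once per step.

    `step(state, action)` is a hypothetical environment function that
    returns (reward, next_state, done).
    """
    rng = np.random.default_rng() if rng is None else rng
    s = start_state
    pi_s = epsilon_greedy_probs(Q[s], epsilon)
    for _ in range(max_steps):
        a = rng.choice(len(pi_s), p=pi_s)
        r, s_next, done = step(s, a)
        if done:
            # Terminal transition: no next-state value to add.
            Q[s, a] += alpha * (r - Q[s, a])
            break
        pi_next = epsilon_greedy_probs(Q[s_next], epsilon)
        expected_q = float(np.dot(pi_next, Q[s_next]))
        Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
        # Reuse pi_next for the next action instead of recomputing it from
        # the updated Q; it can be slightly stale when s_next == s.
        s, pi_s = s_next, pi_next
    return Q
```

The two versions can differ only when the update changes the action values of the state that is visited next, which is why their behaviour is expected to be nearly identical.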