A Short Introduction To QV-learning - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

QV-learning


QV-learning is a natural extension of Q-learning and Sarsa to the case where state values are used alongside the action values. Its update rule for the action values is:

Q_{t+1}(s_t,a_t) \overset{\alpha_t}{\longleftarrow} r_t + \gamma V_t(s_{t+1})
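
Written out in full, the update can be read as follows. This is a sketch that assumes the arrow notation means "move the left-hand side towards the right-hand side with step size \alpha_t". The second line is the ordinary TD(0) rule for the state values, which is how QV-learning is usually described; the step size \beta_t is a symbol introduced here for illustration and does not appear in the text above.

```latex
Q_{t+1}(s_t,a_t) = Q_t(s_t,a_t) + \alpha_t \left( r_t + \gamma V_t(s_{t+1}) - Q_t(s_t,a_t) \right)

V_{t+1}(s_t) = V_t(s_t) + \beta_t \left( r_t + \gamma V_t(s_{t+1}) - V_t(s_t) \right)
```

Both estimates move towards the same target, r_t + \gamma V_t(s_{t+1}); only the quantity being corrected differs.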

Neutral characteristics

Advantages

Disadvantages

Algorithm

The QV-learning algorithm in schematic form:

[Figure: schematic of the QV-learning algorithm]

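Comparing this algorithm to Q-learning, we see that the state value V_t(s_{t+1}) takes the place of the value of the highest-valued action, \max_a Q_t(s_{t+1},a). Among other things, this makes the algorithm on-policy, because the state values are learned from the transitions that the behaviour policy actually generates, and it makes eligibility traces easier to use.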
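
A minimal tabular sketch of this idea in Python is given below. It is not the author's implementation: the function names, the separate step size `beta` for the state values, the epsilon-greedy behaviour policy, and the five-state chain task are all assumptions made for illustration. The state values are updated with plain TD(0), without eligibility traces.

```python
import random
from collections import defaultdict


def qv_update(Q, V, s, a, r, s_next, done, alpha=0.1, beta=0.1, gamma=0.9):
    """Apply one tabular QV-learning update for the transition (s, a, r, s_next)."""
    # Q and V share one target: r + gamma * V(s_next). No max over actions appears.
    target = r if done else r + gamma * V[s_next]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
    V[s] += beta * (target - V[s])  # TD(0) update of the state value (assumed)
    return target


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Ties between equally valued actions are broken at random.
    return max(actions, key=lambda a: (Q[(s, a)], rng.random()))


def run_chain(episodes=500, n_states=5, seed=0):
    """Hypothetical toy task: reach the right end of a chain for a reward of 1."""
    rng = random.Random(seed)
    Q, V = defaultdict(float), defaultdict(float)
    actions = (-1, +1)  # step left or step right
    for _ in range(episodes):
        s = 0
        for _ in range(100):  # cap on episode length
            a = epsilon_greedy(Q, s, actions, 0.1, rng)
            s_next = min(max(s + a, 0), n_states - 1)
            done = s_next == n_states - 1
            r = 1.0 if done else 0.0
            qv_update(Q, V, s, a, r, s_next, done)
            s = s_next
            if done:
                break
    return Q, V
```

Calling `run_chain()` returns the learned tables, so that `Q[(3, +1)]` can be compared with `Q[(3, -1)]` to inspect the preferred action next to the goal. Because the target depends only on V, the values learned here are those of the epsilon-greedy behaviour policy itself, which is the on-policy property mentioned above.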

