A Short Introduction To Q-learning - Hado van Hasselt
A Short Introduction To Some Reinforcement Learning Algorithms
By Hado van Hasselt
Q-learning
Q-learning is perhaps the best-known reinforcement learning algorithm. Its update equation is:
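In the notation common in the literature, the Q-learning update can be written as below. The symbols are assumptions and may differ from those used elsewhere in this introduction: \alpha is the learning rate, \gamma the discount factor, and r_{t+1} the reward observed after taking action a_t in state s_t.

```latex
Q(s_t, a_t) \leftarrow Q(s_t, a_t)
  + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]
```

The maximum over a is what makes the update independent of the action the agent actually takes next.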
Neutral characteristics
It is off-policy: its Q values approximate the optimal Q values, regardless of the exploration policy that is followed.
It learns state-action values (Q values).
Advantages
Tabular Q-learning can be shown to converge to the optimal Q values, even under continued exploration.
Being the oldest of the considered algorithms, it has been the subject of much research and many successful applications.
Disadvantages
Has been shown to sometimes diverge when function approximators are used.
Cannot handle continuous action spaces in its basic form, because the update takes a maximum over all actions.
Has no natural extension to eligibility traces.
Algorithm
The Q-learning algorithm in schematic form:
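As an illustration, tabular Q-learning can be sketched in Python as below. This is a minimal sketch, not the author's schematic: the environment interface `step(state, action)`, the toy corridor `chain_step`, the epsilon-greedy exploration and all parameter values are assumptions chosen for the example.

```python
import random


def q_learning(step, n_states, n_actions, start_state, episodes=500,
               alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration.

    `step(state, action)` is a hypothetical environment function that
    returns (next_state, reward, done).
    """
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # Behaviour policy: explore with probability epsilon,
            # otherwise act greedily, breaking ties at random.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                best = max(Q[s])
                a = rng.choice([b for b in range(n_actions) if Q[s][b] == best])
            s_next, r, done = step(s, a)
            # Off-policy target: greedy value of the next state
            # (no bootstrapping from a terminal state).
            target = r if done else r + gamma * max(Q[s_next])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s_next
            if done:
                break
    return Q


def chain_step(state, action):
    """Toy 5-state corridor (an assumption for this example).

    Action 1 moves right, action 0 moves left; reaching state 4
    gives reward 1 and ends the episode.
    """
    nxt = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = nxt == 4
    return nxt, (1.0 if done else 0.0), done


Q = q_learning(chain_step, n_states=5, n_actions=2, start_state=0)
# Greedy action in each non-terminal state after learning.
greedy = [max(range(2), key=lambda b: Q[s][b]) for s in range(4)]
print(greedy)
```

On this toy corridor the learned greedy policy should move right in every non-terminal state, even though the agent keeps exploring while it learns.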
Compare with the similar, but different algorithms: QV-learning, Sarsa and Expected-Sarsa.
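The difference between these algorithms lies mainly in the value they bootstrap from. The sketch below contrasts the update targets of Q-learning, Sarsa and Expected-Sarsa for a non-terminal transition. The function names, the dictionary-based Q table and the numbers are assumptions made for illustration. QV-learning is left out because it additionally maintains a separate state-value function.

```python
def q_learning_target(Q, r, s_next, gamma):
    # Bootstraps from the greedy action, whatever the agent does next.
    return r + gamma * max(Q[s_next])


def sarsa_target(Q, r, s_next, a_next, gamma):
    # Bootstraps from the action actually selected in s_next (on-policy).
    return r + gamma * Q[s_next][a_next]


def expected_sarsa_target(Q, r, s_next, policy_probs, gamma):
    # Bootstraps from the expected value under the policy's action probabilities.
    return r + gamma * sum(p * q for p, q in zip(policy_probs, Q[s_next]))


# Hypothetical next state "s1" with two actions valued 1.0 and 3.0.
Q = {"s1": [1.0, 3.0]}
gamma = 0.5

ql = q_learning_target(Q, 1.0, "s1", gamma)                    # uses max(1.0, 3.0)
sa = sarsa_target(Q, 1.0, "s1", 0, gamma)                      # next action is 0
es = expected_sarsa_target(Q, 1.0, "s1", [0.25, 0.75], gamma)  # weighted average
print(ql, sa, es)  # 2.5 1.5 2.25
```

Only the Q-learning target is unaffected by how the next action is chosen, which is why it is the off-policy member of this family.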
Selected relevant publications:
C. J. C. H. Watkins (1989), "Learning from Delayed Rewards", PhD thesis, University of Cambridge.
C. J. C. H. Watkins and P. Dayan (1992), "Q-learning", Machine Learning Journal, volume 8, number 3/4, Special Issue on Reinforcement Learning, May 1992.