Actor-Critic learning, as we define it here, is an algorithm that uses state values to update state-dependent action preferences. The update for these preference values is:
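In the notation of Sutton and Barto (1998), writing $P(s,a)$ for the preference values and $V(s)$ for the state values, the update can be sketched as below. The symbol names, the step size $\beta$, and the reward index $r_{t+1}$ are assumptions borrowed from that textbook, not notation confirmed by this page.

```latex
\begin{align}
\delta_t &= r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \\
P(s_t, a_t) &\leftarrow P(s_t, a_t) + \beta\, \delta_t .
\end{align}
```

Here $\gamma$ is the discount factor and $\delta_t$ is the temporal-difference error of the state values. In the textbook version the same error also updates the critic, $V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t$.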
We can also write this update in our usual notation, which extends to function approximators. In that form the update looks a little counter-intuitive:
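One way to read this, assuming the convention that an estimate is moved a fraction $\beta$ toward a target $T_t$, is that the target for $P(s_t, a_t)$ contains $P(s_t, a_t)$ itself. The target symbol $T_t$ and the reward index $r_{t+1}$ are assumptions made for this sketch.

```latex
\begin{align}
T_t &= P(s_t, a_t) + r_{t+1} + \gamma V(s_{t+1}) - V(s_t), \\
P(s_t, a_t) &\leftarrow P(s_t, a_t) + \beta \bigl( T_t - P(s_t, a_t) \bigr)
            = P(s_t, a_t) + \beta \bigl( r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \bigr).
\end{align}
```

The current preference cancels inside the brackets, so the preference moves by $\beta$ times the temporal-difference error of the state values. The target has no fixed point of its own, which is why the form looks unusual.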
Neutral characteristics
It is on-policy.
It learns preference values that hold no explicit information about the expected discounted reward.
Advantages
Using state values often speeds up learning.
State-value updates extend easily to eligibility traces.
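As an illustration of the eligibility-trace point, the following is a minimal sketch of a tabular TD(lambda) critic with accumulating traces. The function name `td_lambda_episode`, the transition tuple layout, and the default parameter values are all hypothetical choices made for this example.

```python
def td_lambda_episode(transitions, V, alpha=0.1, gamma=0.95, lam=0.8):
    """Update state values V in place from one episode.

    transitions: list of (state, reward, next_state, done) tuples
    (a hypothetical layout assumed for this sketch).
    Returns the TD errors, which an actor could use for its own update.
    """
    traces = [0.0] * len(V)  # one eligibility trace per state
    deltas = []
    for s, r, s_next, done in transitions:
        # TD error; a terminal successor contributes no bootstrap value.
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
        traces[s] += 1.0  # accumulating trace for the visited state
        for i in range(len(V)):
            V[i] += alpha * delta * traces[i]  # credit recently visited states
            traces[i] *= gamma * lam           # decay every trace
        deltas.append(delta)
    return deltas


# Two-step episode: 0 -> 1 (reward 0), then 1 -> 2 (reward 1, terminal).
V = [0.0, 0.0, 0.0]
deltas = td_lambda_episode([(0, 0.0, 1, False), (1, 1.0, 2, True)], V)
```

In this small episode the reward received on leaving state 1 also raises the value of state 0, because state 0 still carries a decayed trace when the reward arrives.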
Disadvantages
As defined here, with a separate preference value for each action, it cannot handle continuous action spaces.
Algorithm
The Actor-Critic algorithm in schematic form:
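The steps can be sketched in code as follows. This is a minimal tabular version under stated assumptions, not the page's own schematic. The Gibbs (softmax) action selection follows Sutton and Barto (1998). The `step(state, action)` environment interface, the toy `chain_step` task, and all parameter values are hypothetical.

```python
import math
import random


def softmax(prefs):
    """Gibbs/softmax distribution over a list of preference values."""
    m = max(prefs)  # subtract the maximum for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic(step, n_states, n_actions, start_state, episodes=500,
                 alpha=0.1, beta=0.1, gamma=0.95, max_steps=200, seed=0):
    """Tabular actor-critic sketch.

    step(state, action) -> (reward, next_state, done) is an assumed,
    hypothetical environment interface.
    """
    rng = random.Random(seed)
    V = [0.0] * n_states                              # critic: state values
    P = [[0.0] * n_actions for _ in range(n_states)]  # actor: preferences
    for _ in range(episodes):
        s = start_state
        for _ in range(max_steps):
            # On-policy: act with the policy that is being learned.
            probs = softmax(P[s])
            a = rng.choices(range(n_actions), weights=probs)[0]
            r, s_next, done = step(s, a)
            # TD error of the state values drives both updates.
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]
            V[s] += alpha * delta    # critic update
            P[s][a] += beta * delta  # actor update
            if done:
                break
            s = s_next
    return V, P


def chain_step(state, action):
    """Toy 5-state chain (hypothetical): action 1 moves right, action 0
    moves left; reaching state 4 gives reward 1 and ends the episode."""
    nxt = min(state + 1, 4) if action == 1 else max(state - 1, 0)
    done = nxt == 4
    return (1.0 if done else 0.0), nxt, done


V, P = actor_critic(chain_step, n_states=5, n_actions=2, start_state=0)
```

On this toy chain the learned preferences should favour moving right in every non-terminal state. The preference values themselves are not estimates of expected discounted reward; only the state values `V` are.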
Selected relevant publications:
A.G. Barto, R.S. Sutton and C. Anderson (1983). Neuron-like adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics, SMC-13: 834-846.
R.S. Sutton and A.G. Barto (1998). Reinforcement Learning: An Introduction. MIT Press.