A Short Introduction To Actor Critic - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

Actor-Critic Learning


Actor-Critic learning, as we define it here, is an algorithm that uses state values V_t to update state-dependent action values P_t. The update for the action values is:

P_{t+1}(s_t,a_t) = P_t(s_t,a_t) + \alpha_t \Big( r_t + \gamma V_t(s_{t+1}) - V_t(s_t) \Big)
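As a minimal sketch, the update for a single transition can be written for a tabular representation as follows. The function name `actor_update`, the dictionary-based tables, the state and action labels, and the numeric values are illustrative assumptions, not part of the original text.

```python
from collections import defaultdict


def actor_update(P, V, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply the action-value update for one transition (s, a, r, s_next).

    P maps (state, action) pairs to action values, V maps states to
    state values. Both tables are hypothetical stand-ins for whatever
    representation is actually used.
    """
    # Temporal-difference error computed from the state values only.
    td_error = r + gamma * V[s_next] - V[s]
    # Only the value of the action that was actually taken is changed.
    P[(s, a)] += alpha * td_error
    return td_error


# Illustrative numbers (assumed): V(s0) = 0.5, V(s1) = 1.0, reward 1.
P = defaultdict(float)
V = defaultdict(float, {"s0": 0.5, "s1": 1.0})
delta = actor_update(P, V, "s0", "left", r=1.0, s_next="s1")
```

With these numbers the temporal-difference error is 1 + 0.9 * 1.0 - 0.5 = 1.4, so the value of taking "left" in "s0" rises from 0 to 0.14. Note that the state values themselves are not changed by this update.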

We can also write this update in our usual notation, which extends to function approximators. In this form the update looks slightly counter-intuitive, because the current value P_t(s_t,a_t) appears in its own target:

P_{t+1}(s_t,a_t) \overset{\alpha_t}{\longleftarrow} r_t + \gamma V_t(s_{t+1}) - V_t(s_t) + P_t(s_t,a_t)
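The two forms can be checked against each other with a short calculation. It assumes that the arrow means "move the left-hand side a fraction \alpha_t of the way toward the right-hand side", which is our reading of the notation and is not spelled out on this page:

```latex
\begin{aligned}
P_{t+1}(s_t,a_t)
  &= (1-\alpha_t)\, P_t(s_t,a_t)
     + \alpha_t \Big( r_t + \gamma V_t(s_{t+1}) - V_t(s_t) + P_t(s_t,a_t) \Big) \\
  &= P_t(s_t,a_t)
     + \alpha_t \Big( r_t + \gamma V_t(s_{t+1}) - V_t(s_t) \Big)
\end{aligned}
```

The P_t(s_t,a_t) term inside the target cancels against the -\alpha_t P_t(s_t,a_t) from the first term, which is why it has to be included in the target in the first place.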

Neutral characteristics

Advantages

Disadvantages

Algorithm

The Actor-Critic algorithm in schematic form:

[Figure: Actor-Critic algorithm]
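Since the schematic itself is not reproduced here, the following is a rough sketch of how the update might sit inside a complete learning loop. Several ingredients are assumptions that do not come from the text above: the four-state corridor environment, Boltzmann (softmax) action selection over the action values, a TD(0) update for the state values with its own step size `BETA`, and all function and variable names.

```python
import math
import random

N_STATES = 4                         # states 0..3; states 0 and 3 are terminal
LEFT, RIGHT = 0, 1
ALPHA, BETA, GAMMA = 0.1, 0.1, 0.9   # actor step size, critic step size, discount


def step(state, action):
    """Hypothetical corridor: reward 1 for reaching state 3, otherwise 0."""
    next_state = state + (1 if action == RIGHT else -1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done


def softmax_action(prefs, rng):
    """Sample an action with probability proportional to exp(action value)."""
    top = max(prefs)  # subtract the maximum for numerical stability
    weights = [math.exp(p - top) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    V = [0.0] * N_STATES                       # critic: state values
    P = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action values
    for _ in range(episodes):
        s, done = 1, False
        while not done:
            a = softmax_action(P[s], rng)
            s_next, r, done = step(s, a)
            v_next = 0.0 if done else V[s_next]
            td_error = r + GAMMA * v_next - V[s]
            P[s][a] += ALPHA * td_error   # actor update from the text
            V[s] += BETA * td_error       # assumed TD(0) critic update
            s = s_next
    return P, V


P, V = train()
```

The same temporal-difference error drives both updates: the critic uses it to improve its state values, and the actor uses it to make the chosen action more or less likely in the future. In this corridor, moving right from state 2 always yields a non-negative error, so its action value can only grow.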

