A Short Introduction To Actor Critic Learning Automata - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

Actor-Critic Learning Automaton (Acla)

Acla is an algorithm that uses state values to update state-dependent action values. To update the value of the action just taken, Acla looks at how the value of the previous state changed in the most recent state-value update. If that update increased the value of the state, the action that was performed is considered a good action, and the update is:

P_{t+1}(s_t,a_t) \overset{\alpha_t}{\longleftarrow} 1

If the state value decreased, the action was not such a good idea, and the update is:

P_{t+1}(s_t,a_t) \overset{\alpha_t}{\longleftarrow} 0
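To make the two cases concrete, here they are written out in full. This assumes the common reading of x \overset{\alpha}{\longleftarrow} y as shorthand for x \leftarrow x + \alpha (y - x). The page itself does not define the notation, so treat that reading as an assumption. The increase case moves the action value toward 1, and the decrease case moves it toward 0:

```latex
\begin{aligned}
\text{value increased: } P_{t+1}(s_t,a_t) &= P_t(s_t,a_t) + \alpha_t \bigl(1 - P_t(s_t,a_t)\bigr) = (1-\alpha_t)\,P_t(s_t,a_t) + \alpha_t \\
\text{value decreased: } P_{t+1}(s_t,a_t) &= P_t(s_t,a_t) + \alpha_t \bigl(0 - P_t(s_t,a_t)\bigr) = (1-\alpha_t)\,P_t(s_t,a_t)
\end{aligned}
```

With 0 \le \alpha_t \le 1, each new value is a weighted average of the old value and a target in \{0, 1\}, so it cannot leave the interval [0, 1].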

Note that both updates keep the action values between 0 and 1, provided they are initialised in this interval and the step size \alpha_t is also between 0 and 1.
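Below is a minimal Python sketch of this update for a single state-action pair. It is illustrative, not the author's implementation, and rests on three assumptions. First, the change in the state value is summarised by a signed number `delta`. For a temporal-difference critic this would be the TD error, whose sign matches the direction of the state-value update. Second, the arrow notation is read as `p + alpha * (target - p)`. Third, the value is left unchanged when `delta` is exactly zero, a case the page does not discuss. The name `acla_actor_update` is hypothetical.

```python
def acla_actor_update(p: float, delta: float, alpha: float) -> float:
    """Hypothetical helper: update one action value P(s_t, a_t).

    delta is the signed change of the state value (e.g. a TD error);
    the arrow notation is read as p + alpha * (target - p) (assumed).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1] to keep p in [0, 1]")
    if delta > 0:
        target = 1.0  # state value went up: the action looks good
    elif delta < 0:
        target = 0.0  # state value went down: the action looks bad
    else:
        return p  # no change in value: leave p untouched (assumption)
    return p + alpha * (target - p)  # weighted average of p and target


p = 0.5
p = acla_actor_update(p, delta=0.3, alpha=0.1)   # about 0.55
p = acla_actor_update(p, delta=-0.2, alpha=0.1)  # about 0.495
print(p)
```

Only the sign of `delta` matters here, not its size. This is what distinguishes the rule from updates that scale the action value by the TD error itself.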

Neutral characteristics

Advantages

Disadvantages

Algorithm

The Acla algorithm in schematic form:

[Figure: Acla algorithm]
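The schematic itself is not reproduced in this text, so what follows is only a rough Python sketch of a complete tabular Acla loop. It relies on several assumptions that do not come from this page:

- a TD(0) critic, whose update `V[s] += beta * delta` raises the state value exactly when `delta > 0`;
- Boltzmann (softmax) action selection over the action values, with temperature `tau`;
- a hypothetical five-state chain task.

All function and parameter names are illustrative. The actor step follows the rule described on this page: move the value of the taken action toward 1 when the state value went up, and toward 0 when it went down. The algorithm in the publication may differ in its details.

```python
import math
import random


def softmax_choice(prefs, tau, rng):
    """Boltzmann selection over action values (exploration scheme is an assumption)."""
    m = max(prefs)
    weights = [math.exp((p - m) / tau) for p in prefs]
    return rng.choices(range(len(prefs)), weights=weights)[0]


def run_acla_chain(n_states=5, episodes=500, alpha=0.1, beta=0.1,
                   gamma=0.9, tau=0.2, max_steps=100, seed=0):
    """Tabular Acla on a hypothetical chain: actions 0=left, 1=right,
    reward 1 for entering the rightmost (terminal) state."""
    rng = random.Random(seed)
    terminal = n_states - 1
    V = [0.0] * n_states                          # critic: state values
    P = [[0.5, 0.5] for _ in range(n_states)]     # actor: action values in [0, 1]

    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = softmax_choice(P[s], tau, rng)
            s_next = max(s - 1, 0) if a == 0 else s + 1
            done = s_next == terminal
            r = 1.0 if done else 0.0
            # TD error: positive exactly when the critic update raises V[s]
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            V[s] += beta * delta
            # Acla actor: move P(s, a) toward 1 or 0 depending on the sign of delta
            if delta > 0:
                P[s][a] += alpha * (1.0 - P[s][a])
            elif delta < 0:
                P[s][a] += alpha * (0.0 - P[s][a])
            if done:
                break
            s = s_next
    return V, P


V, P = run_acla_chain()
for s in range(len(P) - 1):  # skip the terminal state, which is never updated
    print(s, [round(p, 2) for p in P[s]])
```

The action values differ by at most 1, so the temperature limits how greedy the policy can become. With `tau=0.2`, the preferred action is chosen at most about 99% of the time. This keeps some exploration going, and the occasional left moves keep receiving negative updates.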

The publication below shows that, in at least some cases, Acla performs considerably better than similar algorithms such as Q-learning and Sarsa. On the selected task, the only algorithm shown to perform better was the Cacla algorithm.

Selected relevant publications:
