A Short Introduction To Some RL Algorithms, Notation - Hado van Hasselt

A Short Introduction To Some Reinforcement Learning Algorithms

By Hado van Hasselt

Notation


On this website, we use the following notation conventions. Whenever a value (or function) is updated, we indicate the update as follows:

A_{t+1}(x) \overset{\alpha}{\longleftarrow} B_t

This equation means that the value of A(x), which depends on some input x, is updated towards some target value B. The subscripts indicate time steps, making this a discrete-time formulation. The parameter \alpha, with 0 \leq \alpha \leq 1, is a learning rate that determines how large a step is taken towards B.

If the values of A(x) for all possible inputs x are stored in a table, the notation above is equivalent to the following update:

A_{t+1}(x) = (1 - \alpha) A_t(x) + \alpha B_t
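Since (1 - \alpha) A_t(x) + \alpha B_t = A_t(x) + \alpha (B_t - A_t(x)), the update moves A(x) a fraction \alpha of the way towards B: \alpha = 0 leaves the entry unchanged and \alpha = 1 replaces it with B. A minimal sketch, assuming the table is a Python dictionary keyed by the input x and that unvisited entries start at 0.0 (the names `tabular_update` and `table` are illustrative only):

```python
def tabular_update(table, x, target, alpha):
    """Move table[x] a fraction alpha of the way towards target, in place."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Assumption: entries that were never visited start at 0.0.
    old = table.get(x, 0.0)
    table[x] = (1.0 - alpha) * old + alpha * target
    return table[x]


table = {"s0": 2.0}
print(tabular_update(table, "s0", target=4.0, alpha=0.25))  # 2.5

# Repeated updates towards a fixed target close the remaining gap geometrically.
for _ in range(3):
    tabular_update(table, "s1", target=1.0, alpha=0.5)
print(table["s1"])  # 0.875
```

With a fixed target, each update shrinks the remaining gap B - A(x) by a factor (1 - \alpha), which is why "s1" goes 0.5, 0.75, 0.875.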

Alternatively, A(x) could be a parametrised function that depends on the input x and on some parameters w. The notation can then be understood as shorthand for the following update of each of the parameters of this function:

w_{t+1} = w_t + \alpha \Big( B_t - A_t(x,\mathbf{w}) \Big) \frac{ \partial A_t(x,\mathbf{w}) }{ \partial w_t }
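For illustration, assume A is linear in its parameters, A(x, \mathbf{w}) = \sum_i w_i \phi_i(x), for some feature function \phi. Then \partial A / \partial w_i = \phi_i(x), and the update above becomes w_i \leftarrow w_i + \alpha (B - A(x, \mathbf{w})) \phi_i(x) for every parameter. A minimal sketch under that assumption; the feature map and the names `features`, `predict`, and `linear_update` are hypothetical:

```python
from typing import List, Sequence


def predict(w: Sequence[float], phi: Sequence[float]) -> float:
    """Linear approximator: A(x, w) = sum_i w_i * phi_i(x)."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def linear_update(w: Sequence[float], phi: Sequence[float],
                  target: float, alpha: float) -> List[float]:
    """One step of w_i <- w_i + alpha * (B - A(x, w)) * dA/dw_i, where dA/dw_i = phi_i(x)."""
    # The error B_t - A_t(x, w) is computed once, using the old weights for every parameter.
    error = target - predict(w, phi)
    return [wi + alpha * error * fi for wi, fi in zip(w, phi)]


def features(x: float) -> List[float]:
    # Hypothetical feature map: a constant bias term and the raw input.
    return [1.0, x]


w = [0.0, 0.0]
w = linear_update(w, features(2.0), target=3.0, alpha=0.125)
print(w)                             # [0.375, 0.75]
print(predict(w, features(2.0)))     # 1.875
```

Unlike the tabular case, the prediction for x changes by \alpha \lVert\phi(x)\rVert^2 (B - A) rather than \alpha (B - A), so here (\lVert\phi\rVert^2 = 5) the prediction moves from 0 to 1.875 in one step; a large \alpha can therefore overshoot B.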

This update can be interpreted as a gradient descent step on the squared difference between the output A(x,w) and the target output B; more precisely, on \frac{1}{2} \big( B_t - A_t(x,\mathbf{w}) \big)^2, with B treated as a constant. The learning rate parameter again regulates the size of the update. For instance, a neural network is often used to represent the values; in that case the parameters w correspond to the weights of the network. Other function approximators can, of course, also be used. Throughout this website, we use the general notation of the first equation to stand for any of these options.
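The gradient-descent interpretation can be checked numerically. The sketch below uses a hypothetical one-unit model A(x, \mathbf{w}) = \tanh(w_0 + w_1 x), standing in for a small neural network, and compares the update direction (B - A) \partial A / \partial w_i with the negative gradient of the squared error scaled by \frac{1}{2} (so that the factor 2 from differentiation cancels), estimated by central finite differences while holding B fixed. The model and all function names are our own illustrative choices:

```python
import math
from typing import List


def model(x: float, w: List[float]) -> float:
    # Hypothetical single tanh unit: A(x, w) = tanh(w0 + w1 * x).
    return math.tanh(w[0] + w[1] * x)


def model_grad(x: float, w: List[float]) -> List[float]:
    # Analytic derivatives: dA/dw0 = 1 - A^2, dA/dw1 = x * (1 - A^2).
    a = model(x, w)
    return [1.0 - a * a, x * (1.0 - a * a)]


def half_squared_error(x: float, w: List[float], target: float) -> float:
    # The target B is a plain constant here, so no gradient flows through it.
    return 0.5 * (target - model(x, w)) ** 2


def numerical_grad(x: float, w: List[float], target: float, eps: float = 1e-6) -> List[float]:
    # Central finite-difference estimate of d/dw_i of the half squared error.
    grads = []
    for i in range(len(w)):
        w_plus = list(w)
        w_plus[i] += eps
        w_minus = list(w)
        w_minus[i] -= eps
        diff = half_squared_error(x, w_plus, target) - half_squared_error(x, w_minus, target)
        grads.append(diff / (2.0 * eps))
    return grads


def update_direction(x: float, w: List[float], target: float) -> List[float]:
    # (B - A(x, w)) * dA/dw_i: the quantity multiplied by alpha in the update.
    error = target - model(x, w)
    return [error * g for g in model_grad(x, w)]


x, w, target = 0.5, [0.1, -0.3], 0.8
for d, g in zip(update_direction(x, w, target), numerical_grad(x, w, target)):
    # The two columns should agree up to finite-difference error.
    print(f"{d:+.8f}  {-g:+.8f}")
```

Because the update direction equals the negative gradient of \frac{1}{2}(B - A)^2, adding \alpha times it to w is exactly a gradient descent step with step size \alpha.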

