In this section we consider only algorithms that store an approximation of the expected reward for each state-action pair. This approximation is called a Q value and is usually denoted Q(s,a), where s is the state and a is the action. Similarly, control theory often uses the value J(x,u), where x is the state and u is the action.
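As a rough illustration of what storing one estimate per state-action pair can look like, here is a minimal Python sketch of a lookup table keyed by (s, a). The class name `QTable`, its methods, and the running-average update are assumptions made for this example; they are not taken from any particular algorithm in this section.

```python
from collections import defaultdict


class QTable:
    """Hypothetical tabular store of Q(s, a) estimates (illustrative only)."""

    def __init__(self, default=0.0):
        # Maps (state, action) -> current estimate of the expected reward.
        self.q = defaultdict(lambda: default)
        # Maps (state, action) -> number of samples seen so far.
        self.n = defaultdict(int)

    def value(self, state, action):
        """Return the current estimate Q(state, action)."""
        return self.q[(state, action)]

    def update(self, state, action, observed_reward):
        """Move the estimate toward a new sample using a running average.

        Assumption: a plain sample mean is used here; real algorithms
        typically use their own update rules.
        """
        key = (state, action)
        self.n[key] += 1
        self.q[key] += (observed_reward - self.q[key]) / self.n[key]

    def greedy_action(self, state, actions):
        """Pick the action with the highest current estimate in this state."""
        return max(actions, key=lambda a: self.q[(state, a)])


table = QTable()
table.update("s0", "left", 1.0)
table.update("s0", "left", 3.0)
print(table.value("s0", "left"))
print(table.greedy_action("s0", ["left", "right"]))
```

A dictionary keeps the sketch simple and works for any hashable state and action. For small, fixed state and action sets, a two-dimensional array indexed by state and action would serve the same purpose.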
The algorithms in this section: