Cacla can be viewed as the continuous version of the Acla algorithm. As before, state values are stored in a table or a function approximator. However, instead of also storing state-action values, Cacla stores a single value for each state: an approximation of the optimal action. Of course, in problems with multi-dimensional actions, this can be a vector instead of a single value.
The idea and implementation of Cacla are very simple. You get the output of your actor. Then, you explore around this value (for instance with Gaussian exploration). Then, if the value of the state increases after performing the action, you update your actor towards this action:
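If
$$V_{t+1}(s_t) > V_t(s_t)$$
then
$$Ac_{t+1}(s_t) = Ac_t(s_t) + \alpha \, (a_t - Ac_t(s_t))$$
Here $V_t$ is the state-value estimate, $Ac_t(s_t)$ is the output of the actor in state $s_t$, $a_t$ is the action that was actually performed after exploration, and $\alpha$ is the learning rate of the actor.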
If the state value decreases, the action was not such a good idea, and you do not update the actor.
The Cacla algorithm in schematic form:
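As a rough illustration of the steps described above, here is a minimal tabular sketch in Python. It is not the reference implementation: the names `cacla_update` and `train`, the learning rates `alpha` and `beta`, the discount `gamma`, and the one-step toy task with reward `-(a - target)^2` are all assumptions made for this example. The critic is updated with an ordinary TD(0) step, and "the state value increases" is taken to mean that the TD error is positive.

```python
import random

def cacla_update(V, Ac, s, a, r, s_next, done, alpha=0.1, beta=0.1, gamma=0.9):
    """One tabular Cacla-style step (hypothetical helper); returns the TD error."""
    target = r if done else r + gamma * V[s_next]
    delta = target - V[s]
    V[s] += beta * delta              # critic: TD(0) update of the state value
    if delta > 0:                     # state value went up: move actor towards the action taken
        Ac[s] += alpha * (a - Ac[s])
    return delta                      # if delta <= 0 the actor is left unchanged

def train(targets, steps=6000, sigma=0.3, seed=0):
    """Toy one-step task (assumed): reward is highest when a == targets[s]."""
    rng = random.Random(seed)
    n = len(targets)
    V = [0.0] * n                     # state values
    Ac = [0.0] * n                    # actor output: one action per state
    for _ in range(steps):
        s = rng.randrange(n)
        a = rng.gauss(Ac[s], sigma)   # Gaussian exploration around the actor output
        r = -(a - targets[s]) ** 2
        cacla_update(V, Ac, s, a, r, s_next=None, done=True)
    return V, Ac

targets = [-1.0, 0.5, 2.0]
V, Ac = train(targets)
```

In this toy setting each entry of `Ac` should drift towards the corresponding entry of `targets`. With a function approximator instead of a table, the two table assignments would become gradient-style updates of the critic and actor parameters.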
Comparing the algorithm to Acla shows definite similarities. However, note that Cacla does not update when the state value decreases. This is different from Acla, but was shown to be a better choice for Cacla, since updating away from the action that was selected does not guarantee that you are updating towards an action that is better than the current output of the actor. For a more detailed discussion, see the second publication below. The second publication also shows variations of Cacla and compares the algorithm to other continuous action algorithms.
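A small numeric illustration of why moving away from a bad action is not necessarily an improvement. The one-dimensional quadratic reward, the optimum at 0.5 and the step size 0.5 are made up for this example and do not come from the publications.

```python
def reward(action, optimum=0.5):
    """Hypothetical reward: highest at the optimum, lower further away."""
    return -(action - optimum) ** 2

actor_output = 0.4   # already close to the optimum
explored = 1.0       # exploration overshoots the optimum

# The explored action is worse than the actor output ...
explored_is_worse = reward(explored) < reward(actor_output)

# ... but stepping away from it pushes the actor past the optimum
# in the other direction, to roughly 0.1.
step = 0.5
moved_away = actor_output - step * (explored - actor_output)
moved_away_is_worse = reward(moved_away) < reward(actor_output)
```

Here both flags are true: the explored action was bad, yet updating away from it also makes the actor worse than it was. Updating towards an action that turned out well does not have this problem, because that action is known to be good.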
The output of the actor of Cacla can in principle be any continuous action value or action vector. However, even when these values are rounded to a finite action space, Cacla can outperform discrete algorithms such as Q-learning. See the first publication below for details. See also this page from my recent book chapter. In that chapter, I compare Cacla to state-of-the-art algorithms on a double pole balancing task. The algorithms are the policy-gradient algorithm called natural actor critic (NAC) and the evolutionary strategy algorithm called CMA-ES. Cacla outperforms both algorithms by a wide margin, quickly reaching better performance levels than either with less experience and at lower computational cost. This shows that Cacla is a promising algorithm for continuous domains.
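Rounding a continuous output to a finite action space can be as simple as picking the nearest allowed action. The sketch below shows one way to do this for scalar actions; the helper name `nearest_action` and the example action set are assumptions for illustration, and whether the rounded or the unrounded value is then used in the actor update is left open here.

```python
def nearest_action(continuous_action, allowed_actions):
    """Return the allowed action closest to the continuous actor output."""
    return min(allowed_actions, key=lambda a: abs(a - continuous_action))

allowed = [-1.0, 0.0, 1.0]                # hypothetical discrete action set
executed = nearest_action(0.37, allowed)  # the action actually sent to the environment
```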
One potential drawback of the Cacla algorithm - and of all other non-linear optimization algorithms - is that the algorithm can get stuck in a relatively poor local optimum. The chance that this happens can be reduced by using more than one actor. For details on how this can be done, see Chapter 7 of my dissertation and Section 3.2.4 of the aforementioned book chapter.
Selected relevant publications:
My contact data can be found here.