You can now download the chapter on reinforcement learning in continuous spaces that I wrote for the upcoming "Reinforcement Learning: State of the Art" book from my publications. This chapter surveys the available literature on learning in Markov decision processes (MDPs) with continuous states and/or continuous actions and presents some new results.
In addition to a PDF, the chapter is also available in HTML, to be viewed online. (Follow that link to view the abstract.)
Update: The chapter can also be found on the Springer website, along with the rest of the book, for those of you who have access beyond the paywall. (I will also keep the chapter online on my publications page, for unrestricted access.) Incidentally, the full title of the book is:
"Reinforcement Learning: State of the Art", Adaptation, Learning, and Optimization, 2012, Volume 12, Editors Marco A. Wiering and Martijn van Otterlo
Our paper on Best-Match equations was accepted for publication at JMLR. The abstract is given below and a preprint of the paper can be found in my publications.
by Harm van Seijen, Shimon Whiteson, Hado van Hasselt and Marco Wiering
Abstract: This article presents and evaluates best-match learning, a new approach to reinforcement learning that trades off the sample efficiency of model-based methods with the space efficiency of model-free methods. Best-match learning works by approximating the solution to a set of best-match equations, which combine a sparse model with a model-free Q-value function constructed from samples not used by the model. We prove that, unlike regular sparse model-based methods, best-match learning is guaranteed to converge to the optimal Q-values in the tabular case. Empirical results demonstrate that best-match learning can substantially outperform regular sparse model-based methods, as well as several model-free methods that strive to improve the sample efficiency of temporal-difference methods. In addition, we demonstrate that best-match learning can be successfully combined with function approximation.
On January 17, 2011, I defended my dissertation "Insights in Reinforcement Learning".
I am a researcher in artificial intelligence. In particular, I specialize in reinforcement learning, which can be considered a subfield of machine learning. In reinforcement learning, the goal is to construct and analyze algorithms that can automatically learn good policies of behavior, based on reinforcement signals that are observed from the environment the learning agent operates in.
Examples of settings where reinforcement learning can be useful include games, robotics and economic settings. An important difference between reinforcement learning and most other machine learning techniques is that it is assumed there are no examples of good behavior. In other words, the agent is not told what it should do in any particular situation or what the precise value of an action is. Rather, it must distill this from the general reinforcement signal.
My own research mostly focuses on model-free temporal-difference learning algorithms. These algorithms do not try to model the environment in order to find good policies; instead, they try to construct a good estimate of the value of each action in each state. I have analyzed existing algorithms, such as the well-known Q-learning algorithm, and have proposed improvements and alternatives for these algorithms.
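As a rough illustration of what "estimating the value of each action in each state" means, the following is a minimal sketch of a single tabular Q-learning update. The function name, the state and action labels, and the step size `alpha` and discount `gamma` values are all assumptions chosen for this example; they are not taken from any specific paper or code release.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9, terminal=False):
    """Apply one tabular Q-learning update in place and return the TD error.

    q maps (state, action) pairs to estimated values. All names and default
    parameter values here are illustrative assumptions.
    """
    # Bootstrap from the highest estimated action value in the next state.
    next_value = 0.0 if terminal else max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * next_value - q[(state, action)]
    # Move the current estimate a small step toward the bootstrapped target.
    q[(state, action)] += alpha * td_error
    return td_error


# Hypothetical two-action problem: no model of the environment is stored,
# only the table of action values.
q = defaultdict(float)
actions = ["left", "right"]
q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

Note that the update only needs the observed transition (state, action, reward, next state); it never estimates transition probabilities, which is what makes it model-free.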
My dissertation is called "Insights in Reinforcement Learning, formal analysis and empirical evaluation of temporal-difference algorithms". As the title suggests, it discusses temporal-difference learning methods such as Q-learning and Sarsa, as well as new algorithms and policy-based ensembles of algorithms. In the dissertation, I discuss how to apply these algorithms to MDPs with continuous state and action spaces and introduce a temporal-difference algorithm called Cacla that can be used in fully continuous domains. I demonstrate that Cacla can outperform the current state of the art, such as CMA-ES and natural actor-critic (NAC), in problems with continuous actions. See the paper on Cacla or chapter 7 of my dissertation for more on reinforcement learning in problems with continuous states and actions.
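For readers unfamiliar with Cacla, the sketch below shows the core idea of a Cacla-style update under strong simplifying assumptions: states are stored in plain dictionaries and the action is a single real number, whereas the algorithm is intended for use with function approximators in continuous state spaces. The function name, learning rates, and discount factor are hypothetical; the paper and dissertation remain the authoritative description.

```python
def cacla_update(v, actor, state, action, reward, next_state,
                 alpha=0.1, beta=0.1, gamma=0.9):
    """One Cacla-style update (simplified sketch, not a reference implementation).

    v maps states to estimated state values (the critic); actor maps states
    to the currently preferred scalar action. Dictionaries stand in for the
    function approximators that a continuous state space would require.
    """
    # The critic learns with an ordinary temporal-difference update.
    td_error = reward + gamma * v.get(next_state, 0.0) - v.get(state, 0.0)
    v[state] = v.get(state, 0.0) + alpha * td_error
    if td_error > 0:
        # The explored action did better than expected, so the actor's
        # output is moved toward the action that was actually taken.
        current = actor.get(state, 0.0)
        actor[state] = current + beta * (action - current)
    return td_error


v, actor = {}, {}
# An explored action of 0.5 is followed by a positive surprise: actor moves.
cacla_update(v, actor, "s0", 0.5, 1.0, "s1")
# A negative surprise updates the critic but leaves the actor unchanged.
cacla_update(v, actor, "s0", -0.8, -1.0, "s1")
```

The distinguishing design choice is that the actor is updated only when the temporal-difference error is positive, and the size of the actor step depends on the distance to the taken action rather than on the magnitude of the error.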
Other contributions include an analysis showing that Q-learning suffers from an overestimation bias. See my Double Q-learning paper for more on this. Furthermore, I analyze and present other temporal-difference methods and ensemble methods that can be used to combine multiple such algorithms into more robust combined learning algorithms. For more information on any of these topics and more, see my dissertation.
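The overestimation bias can be illustrated with a small simulation. The sketch below is not the Double Q-learning algorithm itself; it only compares two ways of estimating the maximum expected value over several actions whose true values are all zero. The number of actions, the sample counts, the noise distribution, and all names are assumptions made for this example.

```python
import random


def estimate_max(n_actions=10, n_samples=10, rng=random):
    """Return (single, double) estimates of max_a E[X_a] when every true mean is 0."""
    def sample_means():
        # Noisy value estimate per action: the mean of a few Gaussian samples.
        return [sum(rng.gauss(0.0, 1.0) for _ in range(n_samples)) / n_samples
                for _ in range(n_actions)]

    means_a = sample_means()
    means_b = sample_means()  # independent second set of samples

    # Single estimator: the same estimates both select and evaluate the action.
    single = max(means_a)
    # Double estimator: select the action with set A, evaluate it with set B.
    best_a = max(range(n_actions), key=lambda i: means_a[i])
    double = means_b[best_a]
    return single, double


rng = random.Random(0)
trials = 2000
results = [estimate_max(rng=rng) for _ in range(trials)]
single_avg = sum(s for s, _ in results) / trials
double_avg = sum(d for _, d in results) / trials
print(f"single estimator: {single_avg:.3f}, double estimator: {double_avg:.3f}")
```

Because the true maximum is zero here, any consistently positive average for the single estimator is pure bias introduced by taking a maximum over noisy estimates, which is the same operation Q-learning applies in its update target.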
A very good introduction to the field of reinforcement learning is the book by Sutton and Barto, even though by now it is slightly dated (1998).
I am currently employed as a postdoctoral researcher at CWI (Centrum Wiskunde & Informatica, which translates to Center for Mathematics and Computer Science). I defended my Ph.D. thesis at the University of Utrecht on the 17th of January 2011. I conducted my Ph.D. research mainly under the supervision of Marco Wiering.
| Name: | Hado van Hasselt |
| Date of Birth: | September 12th, 1979 |
| Nationality: | Dutch |
| 2010 | - | present | Post-doctoral researcher at CWI |
| 2006 | - | 2011 | Ph.D. student |
| 2001 | - | 2005 | Several student assistantships |
| | | | Assisting the course Imperative Programming - Java (twice). |
| | | | Assisting the course Mathematics for AI (twice). |
| | | | Assisting the course Mathematics for Neural Networks (twice). |
| 1999 | - | 2003 | Several helpdesk jobs. |
| 2000 | - | 2006 | Cognitive Artificial Intelligence (Cognitieve Kunstmatige Intelligentie). |
| At the University of Utrecht. |
My publications can be found on my publications page.
My contact data can be found at CWI.