RSP

Research Semester Programme Machine Learning Theory: Seminar++

Overview

Seminar++ meetings consist of a one-hour lecture building up to an open problem, followed by an hour of brainstorming time. The meeting is intended for interested researchers, including PhD students. These meetings are freely accessible without registration. Cookies and tea will be provided in the half-time break.

This lecture is part of a series of 8.

Odysseas Kanavetas
Assistant Professor at the University of Leiden.

Asymptotically optimal control for Markov Decision Processes (MDP) under side constraints

Abstract: After a brief review of the multi-armed bandit (MAB) problem and its online machine learning applications, we present our work on the model with side constraints. The constraints represent circumstances in which bandit activations are restricted by the availability of certain resources that are replenished at a constant rate.

Then we consider the problem of adaptive control for Markov Decision Processes (MDP) under side constraints, when there is incomplete information about the transition probabilities and the rewards. Under suitable irreducibility assumptions for the MDP, we establish a lower bound for the regret. An open problem is to construct adaptive policies that maximize the rate of convergence of realized rewards to that of the optimal (non-adaptive) policy under complete information. We also discuss applications to queuing control problems and reliability models.