# Research Semester Programme Machine Learning Theory: Seminar++

**When**

07-06-2023 from 15:00 to 17:00

**Where**

L016, CWI

## Overview

Seminar++ meetings consist of a one-hour lecture building up to an open problem, followed by an hour of brainstorming time. The meetings are intended for interested researchers, including PhD students, and are freely accessible without registration. Cookies and tea will be provided during the half-time break.

This lecture is part of a series of 8.

Odysseas Kanavetas

Assistant Professor at the University of Leiden.

### Asymptotically optimal control for Markov Decision Processes (MDP) under side constraints

**Abstract:** After a brief review of the multi-armed bandit (MAB) problem and its online machine learning applications, we present our work on the model with side constraints. The constraints represent circumstances in which bandit activations are restricted by the availability of certain resources that are replenished at a constant rate.

Then, we consider the problem of adaptive control for Markov Decision Processes (MDPs) under side constraints, when there is incomplete information about the transition probabilities and the rewards. Under suitable irreducibility assumptions for the MDP, we establish a lower bound for the regret. An open problem is to construct adaptive policies that maximize the rate of convergence of realized rewards to that of the optimal (non-adaptive) policy under complete information. We also discuss applications to queuing control problems and reliability models.