- Decision problems; Bandit problems; Markov decision processes; Exact MDP algorithms and computational complexity; Stochastic approximation, asymptotic convergence and PAC learning; Function approximation, neural networks; Bayesian reinforcement learning; Regret bounds; Hierarchical reinforcement learning; Multi-agent RL and human-AI systems; RL in society

- Logic, sets, graphs, complexity, search, sorting, dynamic programming, stacks, trees, matrices.
- Literature:
- Michael T. Goodrich & Roberto Tamassia: Data Structures and Algorithms in Java, 5th Ed., John Wiley & Sons (ISBN: 978-0-470-39880-7).
- Judith L. Gersting: Mathematical Structures for Computer Science, 6th Ed., W. H. Freeman & Co (ISBN: 0-7167-6864-X).

- Probability and measure theory; Subjective probability; Decision problems; Conjugate priors; Estimation; Hypothesis testing; (Linear models); Sequential sampling; Experiment design; Markov decision processes; Reinforcement learning; Approximate Dynamic Programming; Bandit problems and Regret; Learning with expert advice; Stochastic optimisation; (Learning in games)

With Peter Damaschke. This year I am extending the course to include some modern developments in combinatorial bandit optimization.

Piazza homepage for homework assignments and feedback

Course schedule and other details

With Peter Ljunglöf. This year I am modernising the course by including probabilistic inference, planning and reinforcement learning: a high-level overview of some of the topics in the decision theory course.

- Bayesian methods for population modelling [PDF Slides]
- Hierarchical Bayes, Empirical Bayes
- Constrained geometric estimation and hypothesis testing [PDF]
- Optimization under constraints -- Lagrangian methods -- Linearisation algorithms -- Hypothesis testing
- Dynamic programming [PDF]
- Sequential decision making -- Markov decision processes -- Value functions -- Shortest-path problems -- Episodic, finite, infinite horizon problems -- Dynamic programming -- Backwards Induction -- Policy Iteration

- An overview of hypothesis testing [PDF]
A general overview of hypothesis testing is given. The Bayesian and distribution-free frameworks for multiple hypothesis testing and for null hypothesis testing are discussed. Some practical algorithms are introduced, together with associated performance bounds.

- Online statistical estimation for vehicle control: A tutorial [PDF]
This tutorial examines simple physical models of vehicle dynamics and overviews methods for parameter estimation and control. Firstly, techniques for the estimation of parameters that deal with constraints are detailed. Secondly, methods for controlling the system are explained. Thirdly, we discuss trajectory optimization.

- Reinforcement learning and decision theory
- Machine learning and statistics
- Fairness, accountability, transparency and privacy in machine learning
- Human-AI symbiosis
- AI for motor racing

- Helper-AI
- Fair decision making
- Inverse reinforcement learning for helper-AI design, Raphael Duroselle, 2017.
- Recommendation system for workers and tasks, Sebastian Bellevik and Philip Ekman, 2017. *Presented at Recsys VAMS WS 2017*
- Decision making under uncertainty: a robust optimization, Emannouil Androulakis, 2014. *Presented at NIPS RL WS 2014*
- Learning to play games from multiple imperfect teachers, John Karlsson, 2014.
- Large-scale content extraction from heterogeneous source, Daniel Langkilde, 2014. *Presented at SAIS 2015*
- Probabilistic inverse reinforcement learning in unknown environments, Aristide Tossou, 2012. *Presented at UAI 2013*
- Model class priors in reinforcement learning, Florian Barras, 2012.
- Raceline optimization, Bart van der Poel, 2008.
- Intelligent Agent Querying, Tim Doolan, 2008.
- Generalisation in reinforcement learning, Wouter Joseman, 2008.

- Analysis of Networks: Privacy in Bayesian Networks and Problems in Lattice Models, Zuhe Zhang, University of Melbourne, 2017. Supervisor: Benjamin Rubinstein.
- Machine Learning for Intelligent Agents, Nikolaos Tziortziotis, 2015, University of Ioannina / EPFL, with Konstantinos Blekas.
- Aggregating Information from the Crowd: ratings, recommendations and predictions, Florent Garcin, 2014, EPFL, with Boi Faltings.
- Coordination and sampling in distributed constraint optimization, Brammert Ottens, 2012, EPFL, with Boi Faltings.
- Image categorisation through Boosting using cost-minimising strategies for data labelling, Christian Savu-Krohn, 2009, University of Leoben, with Peter Auer.