Decision theory forms the basis of modern artificial intelligence and economics. This course gives a firm foundation to decision theory from mainly a statistical, but also a philosophical perspective. The course has two aims:
The course may be split in two parts.
In the first part, we introduce the concepts of subjective probability and utility, and how they can be used to represent and solve decision problems. We then cover the estimation of unknown parameter and hypothesis testing. Finally, we discuss sequential sampling, sequential experiments, and more generally, sequential decision making.
The second part focuses on recent research on decision making under uncertainty, and in particular reinforcement learning and learning with expert advice. First, we examine a few representative statistical models. We then give an overview of algorithms for making optimal decisions using these models. Finally, we look at the problem of learning to act by following expert advice, which a field that has recently found many applications in online advertising, game-tree search and optimisation.
Mathematical background: Good knowledge of basic calculus and probability is required. Mathematical maturity is necessary.
Computer science background: Basic courses in algorithms and datastructures are necessary for being able to develop the main algorithms. Programming experience relating to numerical problems is valuable for the homework exercises and the project. Your programs can be in any cross-platform portable language, but a level of programming maturity is expected.
Complementary courses: Previous exposure to artificial intelligence, machine learning, optimisation, or statistics, is partially beneficial but is not necessary. You are encouraged to take such courses either before or after this course in order to have a more rounded view of the field.
Continous assessment, based on exercises, course participation and a project. The exercises require a mixture of basic applied mathematics (calculus and probability), programming and some thought. The exercises are designed so that by the end of exercise set 8, you have a basic set of algorithms that you can use in the project.
The project is organised as a competition. Students form 2-person teams. Each team creates an environment and an agent. These are presented at the end of the course. Everybody's agents are tested on everybody's environments, and the results are also shown at the end of the course.
As this is an advanced course, we do not follow a particular book, but pick and mix from a number of sources. Roughly, the initial part of the course concentrates on De Groot, while the second part starts with Puterman and then some parts of Bertsekas and Tsitsiklis. If we have time, we'll also cover some parts of the Cesa-Bianchi and Lugosi book.
Gives a good overview of Bayesian statistics and decision theory, and an introduction to Markov decision processes from a statistical perspective. Most reinforcement learning problems can be reduced to MDPs of this type.
This early work gives a more verbose (but rigorous) alternative to de Groot. Most basic ideas of statistical decision theory are laid out in this book, including minimax problems.
Gives an excellent overview of Monte Carlo methods, including theory and many useful examples.
Examines various types of Markov decision processes and gives detailed proofs of basic dynamic programming algorithms, as well as important cases not treated elsewhere.
This work connects basic MDP theory to reinforcement learning, in the case where we perform approximate dynamic programming via simulation. Many algorithms are described and analysed (in the limit).
Gives an overview of the problem of reinforcement learning, and many important basic algorithms. Includes no analysis, but it is recommended for a first approach to the subject.
This is a good reference book for modern reinforcement learning algorithms. Useful for advanced students.
Considers the case where we do not necessarily have an explicit, or estimated environment model, and where we may be acting against an adversarial environment. This book requires patience, but is very rewarding.