Course for 2016: http://www.cse.chalmers.se/research/lab/courses/algorithms-for-machine-learning-tda-231/

DIT380/TDA231: Algorithms for Machine Learning and Inference

January-March 2011.


Machine Learning means inferring hypotheses from given data or experience, which results in a model that generalizes the data and, one hopes, draws correct conclusions. In this sense, algorithms (implemented as computer programs) that generate such hypotheses can "learn" from data and improve their performance. "Learning" algorithms do not contain direct instructions for solving a problem; rather, we only tell them how to adapt themselves to new data. This is appropriate for complex problems where we lack complete knowledge or even an explicit specification. Machine learning methods are commonly used in classification tasks (recognizing objects, understanding situations, predicting future data, etc.) and in expert systems (e.g., for diagnosis). A commercially and scientifically important area of application is Data Mining, where such algorithms are used to detect relevant information and patterns in large databases. Another major field is Bioinformatics.

You won't see many implementation details of specific learning algorithms in this course. (We assume that you are already sufficiently familiar with the programming aspects of Computer Science.) Rather, the course deals with the principal and general issues of Machine Learning: Why do these algorithms "learn"? What is behind them? We also refer to existing implementations that you can use off the shelf. What really matters is knowing how to apply them and interpret their results properly. Nevertheless, you should notice that most of the algorithms work in a rather simple fashion and could be implemented from scratch using only standard programming skills (and enough time).


Course Goals

After the course you should


Messages


Practical Information

Prerequisites: Some elementary knowledge of discrete mathematics, combinatorics, probability theory, mathematical statistics, and calculus is essential. A basic course in algorithms and data structures is also a useful prerequisite, but what you need is "algorithmic thinking" rather than knowledge of specific algorithms. Here is a little prerequisite self-test.

Instructor:
Peter Damaschke (ptr, room 6478)

Assistants:
Azam Sheikh Muhammad (azams, room 6453), Bassel Mannaa (bassel)

Lectures:
Tuesday, 10:00-11:45, room HC1
Thursday, 13:15-15:00, room HC2

Consultation time: by appointment, in office 6478. Book a consultation time if you have any course-related questions or difficulties you want to discuss, alone or in small groups.

Possible changes in the schedule will be announced.


Examination and Grading Criteria

To pass the course, you must complete (1) the compulsory exercises during the course, and (2) a take-home exam with summarizing questions. Details will be announced. Some practical exercises may be part of the hand-ins, but no programming exercises will be given. We do not use a formal point system but apply the following grading criteria:

5/VG: Solutions are correct and clear, apart from minor difficulties.
4/G: Solutions are mainly correct, but with some substantial difficulties.
3/G: Solutions show a basic understanding, but exercises have been managed only after several attempts, or mistakes or gaps remain.
U: Understanding is generally insufficient; major parts of the exercises have not been solved.

You may improve your grade later on through an individual follow-up assignment; however, you must show a real improvement to reach the next higher level.


Rules and Policies

Read them carefully and take them very seriously.

Exercise Submission

Solutions to the theoretical hand-in exercises must be submitted individually (not in groups) by email to "ptr..." The subject line should contain the course code and exercise number. Please send only plain text or PDF attachments, no other formats. Also write your name and personal number on the submitted attachments (as we may want to print them).


Literature

The course does not exactly follow a particular book, but it can be helpful to read, on the side, Mitchell: Machine Learning (McGraw-Hill 1997), ISBN 0-07-042807-7, which should be available at Cremona. Chapters 2-8 were previously used as the course book. The book is somewhat outdated, but it still gives readable introductions to the main topics.
Other books that may be of interest are:

Russell, Norvig: Artificial Intelligence. (A good general AI book with several chapters related to this course. It has also been used in the AI course.)

Bishop: Pattern Recognition and Machine Learning. (See here.)

Fayyad et al: Advances in Knowledge Discovery and Data Mining.

Michie, Spiegelhalter, Taylor: Machine Learning, Neural and Statistical Classification. (This text is freely available.)

Gr"unwald: The Minimum Description Length Principle. (More specialized and advanced! Selected chapters are freely available.)


Contents

Lecture 1. Scope of machine learning, typical problems, basic framework, inductive bias. Figures.

Lecture 2. Version space, classification, learning by queries, candidate elimination. Figures.

Lecture 3. Remarks on the inductive bias, computational issues, noisy data, Minimum Description Length (MDL) principle, the decision tree "language". Figures. More figures.

Lecture 4. Description length of decision trees vs. Boolean expressions, ID3 algorithm, overfitting, validation. Figures.
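
As an illustration of the information-gain criterion that drives ID3, here is a minimal Python sketch. It is not part of the course material; the attribute names and toy data are invented for the example.

    # Minimal sketch of entropy and information gain, the quantities ID3 uses
    # to pick which attribute to split on. Toy data invented for illustration.
    from collections import Counter
    from math import log2

    def entropy(labels):
        """Shannon entropy of a list of class labels."""
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def information_gain(examples, labels, attribute):
        """Entropy reduction obtained by splitting on one attribute.
        Each example is a dict mapping attribute names to values."""
        n = len(examples)
        remainder = 0.0
        for value in {ex[attribute] for ex in examples}:
            subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
            remainder += len(subset) / n * entropy(subset)
        return entropy(labels) - remainder

    # ID3 greedily picks the attribute with the largest information gain,
    # splits the examples accordingly, and recurses on each subset.
    examples = [{"outlook": "sunny", "windy": False},
                {"outlook": "rain",  "windy": True},
                {"outlook": "sunny", "windy": True},
                {"outlook": "rain",  "windy": False}]
    labels = ["no", "yes", "no", "yes"]
    print(information_gain(examples, labels, "outlook"))  # 1.0 here: a perfect split
    print(information_gain(examples, labels, "windy"))    # 0.0 here: an uninformative split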

Lecture 5. Ensemble methods. Probability and learning: entropy again, Bayesian learning, prior and posterior probabilities, MAP and ML hypotheses.

Lecture 6. Bayesian learning as an extension of concept learning, small examples, regression and ML hypotheses, about learning probabilistic concepts.

Lecture 7. Bayesian classifiers: Optimal, Gibbs, and Naive; the space of linear separators. Figures.

Lecture 8. Training perceptrons, inductive bias and training of sigmoid neural networks. Figures.
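
For orientation, a minimal Python sketch of the classical perceptron training rule mentioned here. It is an illustration only, not course code; it assumes linearly separable data with labels +1/-1, and the toy data (the Boolean OR function) is invented for the example.

    # Minimal sketch of the perceptron training rule: cycle through the
    # examples and nudge the weights whenever an example is misclassified.
    def train_perceptron(xs, ys, eta=0.1, epochs=100):
        """xs: list of feature vectors, ys: labels in {+1, -1}."""
        w = [0.0] * len(xs[0])
        b = 0.0
        for _ in range(epochs):
            mistakes = 0
            for x, y in zip(xs, ys):
                activation = sum(wi * xi for wi, xi in zip(w, x)) + b
                if y * activation <= 0:          # misclassified (or on the boundary)
                    w = [wi + eta * y * xi for wi, xi in zip(w, x)]
                    b += eta * y
                    mistakes += 1
            if mistakes == 0:                    # converged: all examples correct
                break
        return w, b

    # Toy example: learning the Boolean OR function (linearly separable).
    xs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    ys = [-1, 1, 1, 1]
    print(train_perceptron(xs, ys))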

Lecture 9. Support vector machines. Sample size estimates, probably approximately correct (PAC) learning. Figures.

Lecture 10. VC dimension, mistake bounds, boosting again. Figures.

Lecture 11. Perceptron mistake bound, nearest neighbor learning, discretization of a numerical attribute. Figures.
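
A minimal Python sketch of nearest neighbor classification (illustration only; the points and labels below are invented): classify a query point by the label of its closest training example under Euclidean distance.

    # Minimal 1-nearest-neighbor classifier: return the label of the
    # training point closest to the query under Euclidean distance.
    from math import dist  # Python 3.8+

    def nearest_neighbor(train_points, train_labels, query):
        best = min(range(len(train_points)),
                   key=lambda i: dist(train_points[i], query))
        return train_labels[best]

    train_points = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (0.9, 0.8)]
    train_labels = ["A", "B", "A", "B"]
    print(nearest_neighbor(train_points, train_labels, (0.1, 0.2)))  # -> "A"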

Lecture 12. Learning relevant attributes.