Information-Theoretic Learning (ITL)
Leiden University, Spring Semester 2020
General Information
The URL of this webpage is www.cwi.nl/~pdg/teaching/inflearn. Visit this page regularly for changes, updates, etc. This course is on an interesting but complicated subject. It is given at the master's or advanced bachelor's level. Although the only background knowledge required is elementary probability theory, the course does require serious work by the student. The course load is 6 ECTS. Click here (studiegids) for a general course description.
Many thanks are due to Steven de Rooij (Leiden University) who prepared a significant proportion of the exercises.
Lectures and Exercise Sessions
Because of the Corona emergency, lectures take place remotely, via Blackboard and Kaltura Capture, each Tuesday from 14:15 to 16:00. The lectures are immediately followed by a virtual mini exercise session held by Tyron Lardy. Presumably, there will be no lecture on April 14 and May 5, but due to the Corona emergency, these plans may change. The last official lecture is scheduled for May 19, and the final exam will be held on Friday, June 5th, in the morning from 10:15 to 13:15. The exam will be sent to you by email and/or be made available on Blackboard. It is an Open Book/Internet Exam, so you are free to look things up on the internet. However, I will ask you to subscribe to an honour code stating that you will not communicate with each other during the exam.
Homework Assignments
Weekly Homework: at every lecture. The assignment is made available on this webpage. Homework is obligatory and must be turned in 24 hours before the beginning of the next lecture, i.e. six days after the assignment was handed out: on Mondays at 14:15. You can turn in your homework digitally via Blackboard or by e-mailing it to Tyron (photos of handwritten homework are o.k.). Include your name on the first page of the pdf you hand in. Include both the number of the homework set and your name in the name of the pdf file, for example: ITL_HW4_Lardy.pdf. After the lecture, there is an (approximately) 30-minute homework session, during which the homework will be explained and discussed by Tyron. Turning in homework on time is required; see below. Homework solutions will be made available the evening or morning before the homework is discussed.
Credit
6 ECTS points.
Examination form
In order to pass the course, one must obtain a sufficient grade (5.5 or higher) on both of the following two:
- An open-book written examination (to be held June 5th).
- Homework. Each student must hand in solutions to the homework assignments before the deadline stated above (24 hours before the next lecture). Discussing the problems in the group is encouraged, but every participant must write down her or his own answers. The final homework grade will be determined as the average of the weekly grades.
The final grade will be determined as the weighted average of the two grades, with the homework counting for 40% and the final exam for 60%.
Literature
We will mainly use various chapters of the following source: P. Grünwald, The Minimum Description Length Principle, MIT Press, 2007. Some additional hand-outs will be made available free of charge as we go. For the third week, this is Luckiness and Regret in Minimum Description Length Inference, by Steven de Rooij and Peter Grünwald, Handbook of the Philosophy of Science, Volume 7: Philosophy of Statistics, 2011. This paper gives an overview of the part of this course concerned with the relation between statistics, machine learning and data compression, as embodied in MDL learning.
Course Schedule
Lecture contents are subject to change at any time for any reason. A more precise schedule, with links to all exercises, will be determined as we go.
- February 4: Introduction
- General introduction: learning, regularity, data compression. Kolmogorov Complexity; deterministic vs. purely random vs. "stochastic" sequences.
- Literature: Chapter 1 up to Section 1.5.1.
- February 11: Preparatory Probability Theory and Statistics.
- Basics of Probability Theory
- Maximum Likelihood and Bayesian Inference; Bayes Predictive Distribution (a small worked sketch follows this block).
- Literature: Chapter 2, Section 2.2, 2.5.2, Section 4.4, Example 8.1.
- First Set of Homework Exercises.
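The contrast between maximum likelihood and Bayesian prediction can already be seen in the simplest Bernoulli setting. Below is a minimal Python sketch of my own (not part of the course materials; the Beta(1,1) prior and the toy counts are illustrative choices):

    # Bernoulli data: k ones among n observations.
    n, k = 10, 7

    # Maximum likelihood estimate of the success probability.
    p_ml = k / n

    # Bayes predictive probability of a 1 under a Beta(a, b) prior:
    # P(x_{n+1} = 1 | data) = (k + a) / (n + a + b).
    a, b = 1.0, 1.0  # uniform prior, chosen purely for illustration
    p_bayes = (k + a) / (n + a + b)

    print(f"ML estimate:      {p_ml:.3f}")     # 0.700
    print(f"Bayes predictive: {p_bayes:.3f}")  # 0.667, pulled towards 1/2 by the prior

Note how the Bayes predictive distribution shrinks the estimate towards the prior mean; for small samples this makes it more cautious than plain maximum likelihood.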
- February 18: Data compression without probability
- Learning of context-free grammars from example sentences.
- Basics of Lossless Coding. Prefix Codes.
- Bernoulli distributions, maximum likelihood.
- Literature: Chapter 2, Section 2.1. Chapter 3, Section 3.1, Handout, Section 1.
- Second Set of Homework Exercises
- February 25: Codes and Probabilities (the most important lecture!)
- The Kraft inequality. The most important insight of the class: the correspondence between probability distributions and code length functions (see the code sketch after this block). The information inequality, entropy, relative entropy (Kullback-Leibler divergence). Shannon's coding theorem.
- Coding integers: probability vs. two-stage coding view.
- Literature: Chapter 3 (3.2,3.3,3.4)
- Third Set of Homework Exercises
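To make the probabilities-codes correspondence concrete, here is a small Python sketch (my own illustration, not taken from the book): it verifies the Kraft inequality for a set of prefix code lengths and shows how lengths and probabilities translate into each other via L(x) = -log2 p(x).

    import math

    # Lengths of a prefix code over a 4-symbol alphabet,
    # e.g. the code 0, 10, 110, 111.
    lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

    # Kraft inequality: sum over x of 2^(-L(x)) <= 1.
    kraft_sum = sum(2.0 ** -L for L in lengths.values())
    assert kraft_sum <= 1.0, "not realizable as a prefix code"
    print(f"Kraft sum: {kraft_sum}")  # 1.0: the code is 'complete'

    # The code lengths implicitly define a probability distribution ...
    p = {x: 2.0 ** -L for x, L in lengths.items()}
    print(p)  # {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

    # ... and conversely, any distribution q defines (idealized) code
    # lengths -log2 q(x), achievable up to rounding by Shannon-Fano coding.
    q = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    print({x: -math.log2(qx) for x, qx in q.items()})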
- March 3rd:
- Coding with the help of the Bernoulli model, using index codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano two-part codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano Bayes mixture codes (a sketch comparing these code lengths follows after this block).
- Markov Models (Chains): Definition, Maximum Likelihood.
- Literature: Chapter 5 until 5.6.
- Fourth Set of Homework Exercises
- Solutions Fourth Set of Homework Exercises
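To get a rough feeling for how these codes compare, the following Python sketch (illustrative only; it glosses over the discretization details that Chapter 5 treats carefully) computes three code lengths for a binary sequence under the Bernoulli model:

    import math

    def log2_binom(n, k):
        # log2 of the binomial coefficient C(n, k).
        return (math.lgamma(n + 1) - math.lgamma(k + 1)
                - math.lgamma(n - k + 1)) / math.log(2)

    x = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # toy data
    n, k = len(x), sum(x)

    # 1. Idealized maximum likelihood code length -log2 P(x | theta_hat).
    #    Not a real code: theta_hat = k/n is only known after seeing the data.
    theta = k / n
    L_ml = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))

    # 2. Two-stage code: first encode the count k (log2(n+1) bits), then the
    #    index of x among all C(n, k) sequences with exactly k ones.
    L_two_stage = math.log2(n + 1) + log2_binom(n, k)

    # 3. Bayes mixture code with a uniform prior on theta:
    #    P(x^n) = integral of theta^k (1-theta)^(n-k) d theta = 1/((n+1) C(n,k)),
    #    so for this particular prior it coincides with the two-stage code above.
    L_bayes = math.log2(n + 1) + log2_binom(n, k)

    print(f"n={n}, k={k}")
    print(f"ML (idealized): {L_ml:.2f} bits")
    print(f"two-stage:      {L_two_stage:.2f} bits")
    print(f"Bayes mixture:  {L_bayes:.2f} bits")

The gap between the real codes and the idealized ML length is the regret; for the Bernoulli model it grows like (1/2) log2 n.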
- March 10th and 17th: no lectures.
- March 24th: Universal Coding
- March 31st:
- NML vs. Bayes universal code for parametric models. Jeffreys prior.
- Jeffreys prior as a uniform prior on the space of distributions equipped with the KL divergence (see the sketch after this block).
- Literature: Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 until 7.3.1; Chapter 8, 8.1 and 8.2
- Slides for Online Lecture
- Part 1 and Part 2 of the online video.
- Sixth Set of Homework Exercises
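The following Python sketch (my own illustration; the sample size and counts are arbitrary choices) computes the NML (Shtarkov) code length and the Jeffreys-prior Bayes code length for the Bernoulli model, showing that they are already close at modest sample sizes:

    import math

    def log2_beta(a, b):
        # log2 of the Beta function B(a, b).
        return (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)) / math.log(2)

    n, k = 20, 14  # toy data: 14 ones among 20 binary outcomes

    # NML: P_nml(x^n) = P(x^n | theta_hat) / COMP(n), where the model
    # complexity COMP(n) sums the maximized likelihood over all 2^n sequences.
    comp = sum(math.comb(n, j) * (j / n) ** j * ((n - j) / n) ** (n - j)
               for j in range(n + 1))  # Python evaluates 0**0 as 1, as needed here
    theta = k / n
    L_ml = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))
    L_nml = L_ml + math.log2(comp)

    # Bayes universal code with the Jeffreys prior Beta(1/2, 1/2):
    # P(x^n) = B(k + 1/2, n - k + 1/2) / B(1/2, 1/2).
    L_jeffreys = log2_beta(0.5, 0.5) - log2_beta(k + 0.5, n - k + 0.5)

    print(f"log2 COMP(n):   {math.log2(comp):.3f}")
    print(f"NML code:       {L_nml:.2f} bits")
    print(f"Jeffreys-Bayes: {L_jeffreys:.2f} bits")  # differs from NML by O(1)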
- April 7: Simple Refined MDL, Prequential Plug-in Codes
- Simple Refined MDL with its many interpretations
- Prequential Interpretation of Simple Refined MDL
- Prequential Plug-in Code (a small sketch follows after this block)
- NML regret, complexity as number of distinguishable distributions
- Literature: Chapter 9; Chapter 14, Section 14.1 and 14.2, esp. the box on page 426.
- Part 1 and Part 2 of the video-recorded lecture.
- Slides for Online Lecture
- Seventh Set of Homework Exercises
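Here is a minimal Python sketch of the prequential plug-in idea (illustrative only; I use the Laplace rule (k+1)/(i+2) as the plug-in estimator, other smoothed estimators behave similarly): predict each outcome from the past, pay -log2 of the probability assigned to what actually happened, and sum.

    import math

    x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # toy binary sequence

    L_plugin = 0.0
    k = 0  # number of ones seen so far
    for i, xi in enumerate(x):
        p_one = (k + 1) / (i + 2)          # Laplace rule of succession
        p = p_one if xi == 1 else 1 - p_one
        L_plugin += -math.log2(p)          # code length for outcome x_{i+1}
        k += xi

    # The Laplace rule is exactly the Bayes predictive distribution under a
    # uniform prior, so by the chain rule the total plug-in code length equals
    # the uniform-prior Bayes mixture code length log2((n+1) * C(n, k)).
    n = len(x)
    L_mixture = math.log2((n + 1) * math.comb(n, k))

    print(f"prequential plug-in:   {L_plugin:.4f} bits")
    print(f"uniform Bayes mixture: {L_mixture:.4f} bits")  # identical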
- April 14: General Refined MDL, Prediction with MDL, Issues with Universal Codes/MDL
- April 21: Excursion: Safe Testing I.
- April 28: Excursion: Safe Testing II.
- May 5th: No Lecture! (Liberation Day)
- May 12th: Maximum Entropy
- May 19th: MaxEnt and MDL, Overview, Wrap Up
- Robustness Property of Exponential Families
- Maximum Entropy and Minimum Description Length. The zero-sum coding game (an illustrative maximum entropy computation follows after this block).
- Back to Safe Testing: General Existence of Non-Trivial S-Values. The uniformly most powerful S-Value at a given sample size.
- Overview of whole course. What is it all good for?
- Literature: Chapter 19, 19.1-19.3, 19.5.
- Slides for Online Lecture
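As a small taste of the maximum entropy part (my own illustration, not taken from the course hand-outs), consider the classic die example: among all distributions on {1,...,6} with a given mean, the entropy maximizer is a member of an exponential family, with p_i proportional to exp(lambda * i). The Python sketch below finds lambda by bisection.

    import math

    mu = 4.5  # mean constraint; a fair die would give 3.5

    def mean_for(lam):
        # Mean of the tilted distribution p_i ~ exp(lam * i) on {1,...,6}.
        w = [math.exp(lam * i) for i in range(1, 7)]
        return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)

    # The mean is strictly increasing in lam, so bisection applies.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) < mu:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2

    w = [math.exp(lam * i) for i in range(1, 7)]
    p = [wi / sum(w) for wi in w]
    print("maxent distribution:", [round(pi, 4) for pi in p])
    print("achieved mean:", round(sum(i * pi for i, pi in zip(range(1, 7), p)), 4))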
- Friday June 5th, 10:15-13:15: Exam (to be taken at home, handed in via email). Example examination.