Information-Theoretic Learning (ITL)
Leiden University, Spring Semester 2020
General Information
The URL of this webpage is www.cwi.nl/~pdg/teaching/inflearn. Visit this page regularly for changes, updates, etc. This course is on an interesting but complicated subject. It is given at the master's or advanced bachelor's level. Although the only background knowledge required is elementary probability theory, the course does require serious work by the student. The course load is 6 ECTS. Click here (studiegids) for a general course description.
Many thanks are due to Steven de Rooij (Leiden University) who prepared a significant proportion of the exercises.
Lectures and Exercise Sessions
Because of the Corona emergency, lectures take place remotely, via Blackboard and Kaltura Capture, each Tuesday from 14:15 to 16:00. The lectures are immediately followed by a virtual mini exercise session held by Tyron Lardy. Presumably, there will be no lecture on April 14 and May 5, but due to the Corona emergency, these plans may change. The last official lecture is scheduled for May 19, and the final exam will be held on Friday, June 5th, in the morning from 10:15 to 13:15. The exam will be sent to you by email and/or be made available on Blackboard. It is an Open Book/Internet Exam, so you are free to look things up on the internet. However, I will ask you to subscribe to an honour code stating that you will not communicate with each other during the exam.
Homework Assignments
Weekly Homework: at every lecture. The assignment is made available on this webpage. Homework is obligatory and must be turned in 24 hours before the beginning of the next lecture, i.e. six days after the assignment was handed out: on Mondays at 14:15. You can turn in your homework digitally via Blackboard or by e-mailing it to Tyron (photos of handwritten homework are o.k.). Include your name on the first page of the pdf you hand in. Include both the number of the homework set and your name in the name of the pdf file, for example: ITL_HW4_Lardy.pdf. After the lecture, there is an (approximately) 30-minute homework session, during which the homework will be explained and discussed by Tyron. Turning in homework on time is required; see below. Homework solutions will be made available the evening or morning before the homework is discussed.
Credit
6 ECTS points.
Examination form
In order to pass the course, one must obtain a sufficient grade (5.5 or higher) on both of the following two:
- An open-book written examination (to be held June 5th).
- Homework. Each student must hand in solutions to the homework assignments before the deadline stated above (24 hours before the next lecture). Discussing the problems in the group is encouraged, but every participant must write down her or his own answers. The final homework grade will be determined as the average of the weekly grades.
The final grade will be determined as the weighted average of the two grades, with the homework counting for 40% and the final exam for 60%.
Literature
We will mainly use various chapters of the following source: P. Grünwald, The Minimum Description Length Principle, MIT Press, 2007. Some additional hand-outs will be made available free of charge as we go. For the third week, this is Luckiness and Regret in Minimum Description Length Inference, by Steven de Rooij and Peter Grünwald, Handbook of the Philosophy of Science, Volume 7: Philosophy of Statistics, 2011. This paper gives an overview of the part of this course concerned with the relation between statistics, machine learning and data compression, as embodied in MDL learning.
Course Schedule
Lecture contents are subject to change at any time for any reason. A more precise schedule, with links to all exercises, will be determined as we go.
- February 4: Introduction
- General introduction: learning, regularity, data compression. Kolmogorov Complexity; deterministic vs. purely random vs. "stochastic" sequences.
- Literature: Chapter 1 up to Section 1.5.1.
- February 11: Preparatory Probability Theory and Statistics.
- Basics of Probability Theory
- Maximum Likelihood and Bayesian Inference; Bayes Predictive Distribution (a small worked sketch follows this block).
- Literature: Chapter 2, Section 2.2, 2.5.2, Section 4.4, Example 8.1.
- First Set of Homework Exercises.
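The contrast between maximum likelihood and Bayesian prediction can already be seen in the simplest Bernoulli setting. Below is a minimal Python sketch of my own (not part of the course materials; the Beta(1,1) prior and the toy counts are illustrative choices):

    # Bernoulli data: k ones among n observations.
    n, k = 10, 7

    # Maximum likelihood estimate of the success probability.
    p_ml = k / n

    # Bayes predictive probability of a 1 under a Beta(a, b) prior:
    # P(x_{n+1} = 1 | data) = (k + a) / (n + a + b).
    a, b = 1.0, 1.0  # uniform prior, chosen purely for illustration
    p_bayes = (k + a) / (n + a + b)

    print(f"ML estimate:      {p_ml:.3f}")     # 0.700
    print(f"Bayes predictive: {p_bayes:.3f}")  # 0.667, pulled towards 1/2 by the prior

Note how the Bayes predictive distribution shrinks the estimate towards the prior mean; for small samples this makes it more cautious than plain maximum likelihood.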
- February 18: Data compression without probability
- Learning of context-free grammars from example sentences.
- Basics of Lossless Coding. Prefix Codes.
- Bernoulli distributions, maximum likelihood.
- Literature: Chapter 2, Section 2.1. Chapter 3, Section 3.1, Handout, Section 1.
- Second Set of Homework Exercises
- February 25: Codes and Probabilities (the most important lecture!)
- The Kraft inequality. The most important insight of the class: the correspondence between probability distributions and code length functions (see the code sketch after this block). The information inequality, entropy, relative entropy (Kullback-Leibler divergence). Shannon's coding theorem.
- Coding integers: probability vs. two-stage coding view.
- Literature: Chapter 3 (3.2,3.3,3.4)
- Third Set of Homework Exercises
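To make the probabilities-codes correspondence concrete, here is a small Python sketch (my own illustration, not taken from the book): it verifies the Kraft inequality for a set of prefix code lengths and shows how lengths and probabilities translate into each other via L(x) = -log2 p(x).

    import math

    # Lengths of a prefix code over a 4-symbol alphabet,
    # e.g. the code 0, 10, 110, 111.
    lengths = {"a": 1, "b": 2, "c": 3, "d": 3}

    # Kraft inequality: sum over x of 2^(-L(x)) <= 1.
    kraft_sum = sum(2.0 ** -L for L in lengths.values())
    assert kraft_sum <= 1.0, "not realizable as a prefix code"
    print(f"Kraft sum: {kraft_sum}")  # 1.0: the code is 'complete'

    # The code lengths implicitly define a probability distribution ...
    p = {x: 2.0 ** -L for x, L in lengths.items()}
    print(p)  # {'a': 0.5, 'b': 0.25, 'c': 0.125, 'd': 0.125}

    # ... and conversely, any distribution q defines (idealized) code
    # lengths -log2 q(x), achievable up to rounding by Shannon-Fano coding.
    q = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
    print({x: -math.log2(qx) for x, qx in q.items()})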
- March 3rd:
- Coding with the help of the Bernoulli model, using index codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano two-part codes.
- Coding with the help of the Bernoulli model, using Shannon-Fano Bayes mixture codes (a sketch comparing these code lengths follows after this block).
- Markov Models (Chains): Definition, Maximum Likelihood.
- Literature: Chapter 5 until 5.6.
- Fourth Set of Homework Exercises
- Solutions Fourth Set of Homework Exercises
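To get a rough feeling for how these codes compare, the following Python sketch (illustrative only; it glosses over the discretization details that Chapter 5 treats carefully) computes three code lengths for a binary sequence under the Bernoulli model:

    import math

    def log2_binom(n, k):
        # log2 of the binomial coefficient C(n, k).
        return (math.lgamma(n + 1) - math.lgamma(k + 1)
                - math.lgamma(n - k + 1)) / math.log(2)

    x = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0]  # toy data
    n, k = len(x), sum(x)

    # 1. Idealized maximum likelihood code length -log2 P(x | theta_hat).
    #    Not a real code: theta_hat = k/n is only known after seeing the data.
    theta = k / n
    L_ml = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))

    # 2. Two-stage code: first encode the count k (log2(n+1) bits), then the
    #    index of x among all C(n, k) sequences with exactly k ones.
    L_two_stage = math.log2(n + 1) + log2_binom(n, k)

    # 3. Bayes mixture code with a uniform prior on theta:
    #    P(x^n) = integral of theta^k (1-theta)^(n-k) d theta = 1/((n+1) C(n,k)),
    #    so for this particular prior it coincides with the two-stage code above.
    L_bayes = math.log2(n + 1) + log2_binom(n, k)

    print(f"n={n}, k={k}")
    print(f"ML (idealized): {L_ml:.2f} bits")
    print(f"two-stage:      {L_two_stage:.2f} bits")
    print(f"Bayes mixture:  {L_bayes:.2f} bits")

The gap between the real codes and the idealized ML length is the regret; for the Bernoulli model it grows like (1/2) log2 n.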
- March 10th and 17th: no lectures.
- March 24th: Universal Coding
- March 31st:
- NML vs. Bayes universal code for parametric models. Jeffreys prior.
- Jeffreys prior as a uniform prior on the space of distributions equipped with the KL divergence (see the sketch after this block).
- Literature: Chapter 6, Section 6.1 and 6.2; Chapter 7, 7.1 until 7.3.1; Chapter 8, 8.1 and 8.2
- Slides for Online Lecture
- Part 1 and Part 2 of the online video.
- Sixth Set of Homework Exercises
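The following Python sketch (my own illustration; the sample size and counts are arbitrary choices) computes the NML (Shtarkov) code length and the Jeffreys-prior Bayes code length for the Bernoulli model, showing that they are already close at modest sample sizes:

    import math

    def log2_beta(a, b):
        # log2 of the Beta function B(a, b).
        return (math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)) / math.log(2)

    n, k = 20, 14  # toy data: 14 ones among 20 binary outcomes

    # NML: P_nml(x^n) = P(x^n | theta_hat) / COMP(n), where the model
    # complexity COMP(n) sums the maximized likelihood over all 2^n sequences.
    comp = sum(math.comb(n, j) * (j / n) ** j * ((n - j) / n) ** (n - j)
               for j in range(n + 1))  # Python evaluates 0**0 as 1, as needed here
    theta = k / n
    L_ml = -(k * math.log2(theta) + (n - k) * math.log2(1 - theta))
    L_nml = L_ml + math.log2(comp)

    # Bayes universal code with the Jeffreys prior Beta(1/2, 1/2):
    # P(x^n) = B(k + 1/2, n - k + 1/2) / B(1/2, 1/2).
    L_jeffreys = log2_beta(0.5, 0.5) - log2_beta(k + 0.5, n - k + 0.5)

    print(f"log2 COMP(n):   {math.log2(comp):.3f}")
    print(f"NML code:       {L_nml:.2f} bits")
    print(f"Jeffreys-Bayes: {L_jeffreys:.2f} bits")  # differs from NML by O(1)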
- April 7: Simple Refined MDL, Prequential Plug-in Codes
- Simple Refined MDL with its many interpretations
- Prequential Interpretation of Simple Refined MDL
- Prequential Plug-in Code (a small sketch follows after this block)
- NML regret, complexity as number of distinguishable distributions
- Literature: Chapter 9; Chapter 14, Section 14.1 and 14.2, esp. the box on page 426.
- Part 1 and Part 2 of the video-recorded lecture.
- Slides for Online Lecture
- Seventh Set of Homework Exercises
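Here is a minimal Python sketch of the prequential plug-in idea (illustrative only; I use the Laplace rule (k+1)/(i+2) as the plug-in estimator, other smoothed estimators behave similarly): predict each outcome from the past, pay -log2 of the probability assigned to what actually happened, and sum.

    import math

    x = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1]  # toy binary sequence

    L_plugin = 0.0
    k = 0  # number of ones seen so far
    for i, xi in enumerate(x):
        p_one = (k + 1) / (i + 2)          # Laplace rule of succession
        p = p_one if xi == 1 else 1 - p_one
        L_plugin += -math.log2(p)          # code length for outcome x_{i+1}
        k += xi

    # The Laplace rule is exactly the Bayes predictive distribution under a
    # uniform prior, so by the chain rule the total plug-in code length equals
    # the uniform-prior Bayes mixture code length log2((n+1) * C(n, k)).
    n = len(x)
    L_mixture = math.log2((n + 1) * math.comb(n, k))

    print(f"prequential plug-in:   {L_plugin:.4f} bits")
    print(f"uniform Bayes mixture: {L_mixture:.4f} bits")  # identical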
- April 14: General Refined MDL, Prediction with MDL, Issues with Universal Codes/MDL
- April 21: Excursion: Safe Testing I.
- April 28: Excursion: Safe Testing II.
- May 5th: No Lecture! (Liberation Day)
- May 12th: Maximum Entropy
- May 19th: MaxEnt and MDL, Overview, Wrap Up
- Robustness Property of Exponential Families
- Maximum Entropy and Minimum Description Length. The zero-sum coding game (an illustrative maximum entropy computation follows after this block).
- Back to Safe Testing: General Existence of Non-Trivial S-Values. The uniformly most powerful S-Value at a given sample size.
- Overview of whole course. What is it all good for?
- Literature: Chapter 19, 19.1-19.3, 19.5.
- Slides for Online Lecture
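As a small taste of the maximum entropy part (my own illustration, not taken from the course hand-outs), consider the classic die example: among all distributions on {1,...,6} with a given mean, the entropy maximizer is a member of an exponential family, with p_i proportional to exp(lambda * i). The Python sketch below finds lambda by bisection.

    import math

    mu = 4.5  # mean constraint; a fair die would give 3.5

    def mean_for(lam):
        # Mean of the tilted distribution p_i ~ exp(lam * i) on {1,...,6}.
        w = [math.exp(lam * i) for i in range(1, 7)]
        return sum(i * wi for i, wi in zip(range(1, 7), w)) / sum(w)

    # The mean is strictly increasing in lam, so bisection applies.
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if mean_for(mid) < mu:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2

    w = [math.exp(lam * i) for i in range(1, 7)]
    p = [wi / sum(w) for wi in w]
    print("maxent distribution:", [round(pi, 4) for pi in p])
    print("achieved mean:", round(sum(i * pi for i, pi in zip(range(1, 7), p)), 4))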
- Friday June 5th, 10:15-13:15: Exam (to be taken at home, handed in via email). Example examination.