Derive yourself a Kalman filter
If you have already tried to understand Kalman filters, you may well have found yourself dumbfounded, as I was when I first tried. Yet Kalman filters are actually quite simple. In this post, I will try to explain them as I would have liked them to be explained to me. What follows is a rather personal take on them, and may hence fail to use the usual vocabulary of the field, or lack exhaustiveness; sorry about that.
Introduction
The classic example of the use of a Kalman filter is the following. Say you want to program a remote piloting interface for a small robot. This robot is moving around, and we want to track its position. To track the position of this robot, we have two possible sources of information:
- We have access to some continuous measurement of the position of the robot (say GPS)
- We also know the starting position of the robot and the movements that should have been made so far ("We have commanded the wheels to move x centimeters in such or such direction."). From these two things, we can compute the position where the robot should currently stand.
Now these two sources of information may disagree, and we are left with the question of how to merge them into one. One may wish to simply average all the estimators of the position we have access to, but a more rigorous analysis is possible.
Formal setting
Let's consider a dynamical system whose state evolves in discrete steps, and assume that at each step, we can get some measurement of the state. Kalman filters are an algorithm allowing you to estimate the state $X_n$ of your system at step $n$, given the measurements $y_1, \dots, y_n$, provided that your system satisfies the following constraints.
- Affine system evolution + Gaussian noise: $X_{n+1} = A X_n + u_n + W_n$, where $A$ is a matrix, $u_n$ is a vector representing user input and $W_n \sim \mathcal{N}(0, Q)$ is a centered Gaussian random variable (of appropriate dimension).
- Linear measurements + Gaussian noise: $Y_n = C X_n + V_n$, where $C$ is a matrix, and $V_n \sim \mathcal{N}(0, R)$ a centered Gaussian random variable.
- The initial state $X_0$ is either normally distributed or fixed (if we abuse notation, we will say that a fixed value is distributed normally, but with a zero variance). A small simulation of such a system is sketched just after this list.
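To make the setting concrete, here is a minimal simulation of such a system. This sketch is mine, not part of the original derivation: the matrices `A`, `C`, `Q`, `R` and the inputs `u` are made-up illustrative values.

```python
import numpy as np

rng = np.random.default_rng(0)

# A made-up 2-dimensional system: state = (position, velocity).
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])   # state evolution matrix
C = np.array([[1.0, 0.0]])   # we only measure the position
Q = 0.01 * np.eye(2)         # covariance of the evolution noise W_n
R = np.array([[0.25]])       # covariance of the measurement noise V_n

x = np.array([0.0, 1.0])     # fixed initial state (zero-variance Gaussian)
xs, ys = [], []
for n in range(50):
    u = np.zeros(2)          # user input u_n (none in this toy run)
    x = A @ x + u + rng.multivariate_normal(np.zeros(2), Q)  # X_{n+1}
    y = C @ x + rng.multivariate_normal(np.zeros(1), R)      # Y_{n+1}
    xs.append(x)
    ys.append(y)
```

The trajectory `xs` is what we would like to recover; the filter will only get to see the noisy measurements `ys`.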
The Great and Central Theorem of Kalman Filters
To compute the estimate of the system's state at each step, Kalman filters rely on the following simple theorem:
Theorem
Provided that the assumptions from the previous section hold, then for any $n$, $X_n$ is normally distributed given $Y_1 = y_1, \dots, Y_n = y_n$. And there exist formulas to compute both the variance and the mean of this distribution at each step.
A remark here is that it is fairly easy to conclude that $X_n$ and $Y_n$ are normally distributed (by induction on $n$, and the fact that all operations involved preserve Gaussianness, see below); what is less obvious is the conditional distribution of $X_n$ knowing $Y_1, \dots, Y_n$. This conditional distribution however stems directly from the joint Gaussianness of $(X_n, Y_1, \dots, Y_n)$.
From now on, upper-case symbols will denote random variables and their lower-case counterparts will denote realizations of these random variables.
This section will be dedicated to the proof of the previous theorem and to outlining the formulas of the Kalman filter. We will first list several "Gaussianness-preserving" operations, and their effect on the mean and variance, and we will then show that the posterior distribution of $X_n$ knowing $Y_1 = y_1, \dots, Y_n = y_n$ is Gaussian and give its moments by successively applying the operations listed before.
It is worth recalling here that a Gaussian distribution has a probability density only if its covariance matrix is positive definite (i.e. invertible).
Some syntactic sugar
To be able to better reason about these Gaussianness-preserving operations in what follows, let me first introduce some syntactic sugar.
Let $X$ and $Y$ be two random variables. We can define $X \mid Y$ as being a function which, to a value $y$ of $Y$, associates the random variable $X \mid Y = y$, i.e. the random variable which takes values in the same domain as $X$ but with probability: $$\mathbb{P}\left((X \mid Y = y) \in \mathcal{A}\right) = \mathbb{P}(X \in \mathcal{A} \mid Y = y).$$
For any function $f$ we can define $f(X \mid Y)$ by: $f(X \mid Y)(y) = f(X \mid Y = y)$. Indeed, as $X \mid Y = y$ is a random variable, most properties of random variables can be lifted to these "conditioned random variables". For example, we can straightforwardly define characteristic functions, covariance, and expectation. These quantities will also be functions of $y$.
As well, we say that $X \mid Y$ is multinormal if for all $y$, $X \mid Y = y$ is.
Gaussianness-preserving operations
- Affine function: Let $X$, $Y$ be random variables, $X$ of dimension $d$, so that $X \mid Y$ is multinormally distributed with mean $\mu(y)$ and covariance matrix $\Sigma(y)$. Let $M$ be an $m \times d$ matrix, and $b$ be an $m$-dimensional vector. Then $M (X \mid Y) + b$ is normally distributed, with mean $M \mu(y) + b$ and covariance $M \Sigma(y) M^\top$.
- Sum of two independent multinormal variables: Let $X_1$, $X_2$, $Y$ be random variables, $X_1$ and $X_2$ of dimension $d$, so that $X_1 \mid Y$ and $X_2 \mid Y$ are normally distributed and independent (for every value $y$ of $Y$), with means $\mu_1(y)$ and $\mu_2(y)$ and covariances $\Sigma_1(y)$ and $\Sigma_2(y)$. Then $(X_1 + X_2) \mid Y$ is normally distributed, with mean $\mu_1(y) + \mu_2(y)$ and covariance $\Sigma_1(y) + \Sigma_2(y)$.
- Conditional distribution of two jointly Gaussian variables: Let $X$, $Y$ be random variables, so that $X$ and $Y$ are jointly Gaussian with mean $\mu$ and covariance $\Sigma$: $$\mu = \begin{pmatrix} \mu_X \\ \mu_Y \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{XX} & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_{YY} \end{pmatrix}.$$ Then $X \mid Y = y$ is also Gaussian, with mean $\mu_{X \mid y}$ and covariance $\Sigma_{X \mid y}$: $$\mu_{X \mid y} = \mu_X + \Sigma_{XY} \Sigma_{YY}^{+} (y - \mu_Y), \qquad \Sigma_{X \mid y} = \Sigma_{XX} - \Sigma_{XY} \Sigma_{YY}^{+} \Sigma_{YX},$$ where "$\cdot^{+}$" denotes the generalized inverse. A small numerical sanity check of this last property follows the list.
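As a sanity check of this last property, here is a small sketch (with made-up numbers) comparing the conditional moments given by the formula to empirical estimates obtained by sampling; `np.linalg.pinv` plays the role of the generalized inverse.

```python
import numpy as np

rng = np.random.default_rng(1)

# A made-up jointly Gaussian pair (X, Y), both 1-dimensional.
mu = np.array([1.0, 2.0])          # (mu_X, mu_Y)
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])     # [[S_XX, S_XY], [S_YX, S_YY]]
y = 2.5                            # observed value of Y

# Conditional moments from the formula.
pinv_YY = np.linalg.pinv(Sigma[1:, 1:])
cond_mean = mu[0] + Sigma[0, 1:] @ pinv_YY @ (np.array([y]) - mu[1:])
cond_cov = Sigma[0, 0] - Sigma[0, 1:] @ pinv_YY @ Sigma[1:, 0]

# Empirical check: sample (X, Y) and keep the samples where Y is close to y.
samples = rng.multivariate_normal(mu, Sigma, size=500_000)
near = samples[np.abs(samples[:, 1] - y) < 0.01]
print(cond_mean, near[:, 0].mean())  # the two values should be close
print(cond_cov, near[:, 0].var())    # same here
```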
Building the recursive estimate
Let $n$ be an integer. Let's assume that $X_n \mid Y_1 = y_1, \dots, Y_n = y_n$ is Gaussian, with mean $\hat{x}_n$ and covariance $P_n$.
We know that $X_{n+1} = A X_n + u_n + W_n$, hence: $$X_{n+1} \mid y_1, \dots, y_n = A \, (X_n \mid y_1, \dots, y_n) + u_n + W_n.$$ As this quantity is only made by successively applying affine functions and summing independent Gaussian variables, we have that $X_{n+1} \mid y_1, \dots, y_n$ is normally distributed with moments: $$\hat{x}_{n+1 \mid n} = A \hat{x}_n + u_n, \qquad P_{n+1 \mid n} = A P_n A^\top + Q.$$
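In code, this prediction step is a direct transcription of the two formulas. A minimal sketch (the variable names are mine):

```python
def predict(x_hat, P, A, Q, u):
    """Moments of X_{n+1} | y_1..y_n from those of X_n | y_1..y_n."""
    x_pred = A @ x_hat + u      # predicted mean: A x̂_n + u_n
    P_pred = A @ P @ A.T + Q    # predicted covariance: A P_n Aᵀ + Q
    return x_pred, P_pred
```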
Now, let's find out the joint distribution of $(X_{n+1}, Y_{n+1})$ given $y_1, \dots, y_n$. We have:

$$Y_{n+1} = C X_{n+1} + V_{n+1},$$

hence:

$$\begin{pmatrix} X_{n+1} \\ Y_{n+1} \end{pmatrix} \Big|\, y_1, \dots, y_n = \begin{pmatrix} I \\ C \end{pmatrix} (X_{n+1} \mid y_1, \dots, y_n) + \begin{pmatrix} 0 \\ V_{n+1} \end{pmatrix}.$$

This expression is, as previously, the result of successively applying an affine function and summing independent Gaussian variables. Hence $X_{n+1}$ and $Y_{n+1}$ are jointly Gaussian given $y_1, \dots, y_n$, and the mean and covariance of this joint distribution are:

$$\mu = \begin{pmatrix} I \\ C \end{pmatrix} \hat{x}_{n+1 \mid n}, \qquad \Sigma = \begin{pmatrix} I \\ C \end{pmatrix} P_{n+1 \mid n} \begin{pmatrix} I \\ C \end{pmatrix}^\top + \begin{pmatrix} 0 & 0 \\ 0 & R \end{pmatrix}.$$
They simplify to:

$$\mu = \begin{pmatrix} \hat{x}_{n+1 \mid n} \\ C \hat{x}_{n+1 \mid n} \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} P_{n+1 \mid n} & P_{n+1 \mid n} C^\top \\ C P_{n+1 \mid n} & C P_{n+1 \mid n} C^\top + R \end{pmatrix}.$$
Given that $X_{n+1}$ and $Y_{n+1}$ are jointly Gaussian given $y_1, \dots, y_n$, per our third property, the posterior $X_{n+1} \mid y_1, \dots, y_n, y_{n+1}$ is Gaussian as well, and when $Y_{n+1} = y_{n+1}$, its moments are:

$$\hat{x}_{n+1} = \hat{x}_{n+1 \mid n} + P_{n+1 \mid n} C^\top \left(C P_{n+1 \mid n} C^\top + R\right)^{+} \left(y_{n+1} - C \hat{x}_{n+1 \mid n}\right),$$

$$P_{n+1} = P_{n+1 \mid n} - P_{n+1 \mid n} C^\top \left(C P_{n+1 \mid n} C^\top + R\right)^{+} C P_{n+1 \mid n}.$$
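The corresponding update step, again as a sketch, with `np.linalg.pinv` standing in for the generalized inverse:

```python
import numpy as np

def update(x_pred, P_pred, C, R, y):
    """Moments of X_{n+1} | y_1..y_{n+1} from those of X_{n+1} | y_1..y_n."""
    S = C @ P_pred @ C.T + R               # covariance of Y_{n+1} | y_1..y_n
    K = P_pred @ C.T @ np.linalg.pinv(S)   # the "Kalman gain"
    x_hat = x_pred + K @ (y - C @ x_pred)  # posterior mean
    P = P_pred - K @ C @ P_pred            # posterior covariance
    return x_hat, P
```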
Recap
Here we have obtained the following recursive formulas for the moments of the posterior at step $n+1$ (I leave intermediate variables in for readability):

$$\hat{x}_{n+1 \mid n} = A \hat{x}_n + u_n$$
$$P_{n+1 \mid n} = A P_n A^\top + Q$$
$$K_{n+1} = P_{n+1 \mid n} C^\top \left(C P_{n+1 \mid n} C^\top + R\right)^{+}$$
$$\hat{x}_{n+1} = \hat{x}_{n+1 \mid n} + K_{n+1} \left(y_{n+1} - C \hat{x}_{n+1 \mid n}\right)$$
$$P_{n+1} = P_{n+1 \mid n} - K_{n+1} C P_{n+1 \mid n}$$
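Putting the two steps together gives the complete filter. Here is a minimal sketch of the resulting loop, reusing the hypothetical `predict` and `update` functions and the simulated system from earlier:

```python
def kalman_step(x_hat, P, y, u, A, C, Q, R):
    """One full Kalman filter step: predict, then condition on y_{n+1}."""
    x_pred, P_pred = predict(x_hat, P, A, Q, u)
    return update(x_pred, P_pred, C, R, y)

# Run the filter on the simulated measurements ys from the first sketch.
x_hat, P = np.array([0.0, 1.0]), np.zeros((2, 2))  # fixed initial state
for y in ys:
    x_hat, P = kalman_step(x_hat, P, y, np.zeros(2), A, C, Q, R)
    # x_hat now estimates the true state, P quantifies the uncertainty.
```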
These formulas are the usual ones of Kalman filters and, going back to our robot example, give a rigorous way of estimating the position of our robot. If the dynamics of your system are not linear, then you can use what is called an "extended Kalman filter", which is just a fancy name for linearizing the dynamics of your system (i.e. taking $A$ equal to the Jacobian of the transition function at the current estimate) and applying the usual Kalman filter. A sketch of the corresponding predict step is given below.
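As an illustration of that last remark, here is a hedged sketch of the predict step of an extended Kalman filter, for a hypothetical nonlinear transition function `f` with Jacobian `jac_f` (both names are mine); the update step is unchanged:

```python
def ekf_predict(x_hat, P, f, jac_f, Q):
    """EKF predict step: propagate the mean through f, linearize for P."""
    A = jac_f(x_hat)          # Jacobian of f at the current estimate
    x_pred = f(x_hat)         # the mean goes through the true dynamics
    P_pred = A @ P @ A.T + Q  # the covariance uses the linearization
    return x_pred, P_pred
```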