The location of a robot is subject to uncertainty due to wheel-slip and encoder noise. We learned in the past how the variance in position can be derived from the variance of the robot’s drive train using the error propagation law and the forward kinematics of the robot. One can see that this error increases continuously unless the robot makes additional observations, e.g., of a static object with known location. This update can be formally done using Bayes’ rule, which relates the likelihood of being at a certain position given that the robot sees a certain feature to the likelihood of seeing this feature at the hypothetical location. For example, a robot that drives towards a wall will become less and less certain of its position (action update) until it encounters the wall (perception update). It can then use its sensor model, which relates its observation to possible positions. Its real location must therefore be somewhere between its original belief and where the sensor tells it to be. Bayes’ rule allows us to perform this localization for discrete locations and discrete sensor error distributions. This is inconvenient, as we are used to representing our robot’s position with a 2D Gaussian distribution. Also, it seems much easier to just change the mean and variance of this Gaussian instead of updating hundreds of variables. The goals of this lecture are

  • to introduce a technique known as the Kalman filter to perform action and perception updates exclusively using Gaussian distributions.
  • to formally introduce the notion of a feature map.
  • to develop an example that puts everything we learned so far together: forward kinematics, error propagation and feature estimation.

Probabilistic Map based localization

In order to localize a robot using a map, we need to perform the following steps:

  1. Calculate an estimate of our new position using the forward kinematics and knowledge of the wheel-speeds that we sent to the robot until the robot encounters some uniquely identifiable feature.
  2. Calculate the relative position of the feature (a wall, a landmark or beacon) to the robot.
  3. Use knowledge of where the feature is located in global coordinates to predict what the robot should see.
  4. Calculate the difference between what the robot actually sees and what it believes it should see.
  5. Use the result from (4) to update its belief by weighing each observation with its variance.
Steps 1-2 are based on the lectures on “Forward Kinematics” and “Line Detection”. Step 3 again uses simple forward kinematics to express the position of a feature, which is stored in a map in global coordinates, in robot coordinates. Step 4 is a simple subtraction of what the sensor sees and what the map says. Step 5 introduces the Kalman filter. Its derivation is involved, but its intuition is simple: why just average between where I think I am and what my sensors tell me, if my sensors are much more reliable and should carry much higher weight?

Sensor Fusion using the Kalman filter

The Kalman filter is an optimal way to fuse observations that follow a Gaussian distribution. The Kalman filter has a prediction and an update step. The prediction step uses a dynamical model of the system (such as the forward kinematics of your robot) and the update step uses a sensor model (such as the error distribution calibrated from its sensors). The Kalman filter not only updates the state of the system (the robot’s position) but also its variance. For this, it requires knowledge of all the variances involved in the system (e.g., wheel-slip and sensor error) and uses them to weight each measurement accordingly. Before providing the equations for the Kalman filter, we will make use of a simple example that explains what “optimal” means in this context.

Optimal Sensor Fusion

Let \hat{q_1} and \hat{q_2} be two different estimates of a random variable and \sigma^2_1 and \sigma^2_2 their variances, respectively. Let q be the true value; this could, for example, be the robot’s position. The observations have different variances when they are obtained by different means, say using odometry for \hat{q_1} and the location of a known feature for \hat{q_2}. We can now define the weighted mean-square error
S=\displaystyle\sum_{i=1}^{n}\frac{1}{\sigma^2_i} (q-\hat{q_i})^2
that is, S is the sum of the errors of each observation \hat{q_i}, weighted by its variance \sigma^2_i. Each error is weighted with the inverse of its variance to put more emphasis on observations whose variance is low. Minimizing S yields the following optimal expression for q:
\hat{q}=\hat{q_1}+\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2}(\hat{q_2}-\hat{q_1})
We have now derived an expression for fusing two observations with different errors that provably minimizes the error between our estimate and the real value!
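This result is easy to check numerically. The following is a minimal sketch (the function name and the numbers are illustrative, not from the lecture) that fuses two scalar estimates by weighting them with their variances:

```python
def fuse(q1, var1, q2, var2):
    """Variance-weighted fusion of two scalar estimates.

    Minimizes S = (q - q1)^2/var1 + (q - q2)^2/var2; the minimizer is
    q = q1 + var1/(var1 + var2) * (q2 - q1)."""
    k = var1 / (var1 + var2)   # weight on the second estimate
    q = q1 + k * (q2 - q1)     # fused estimate
    var = (1 - k) * var1       # fused variance, never larger than min(var1, var2)
    return q, var

# Odometry says 2.0 m (variance 0.4); a landmark observation says 2.3 m (variance 0.1).
q, var = fuse(2.0, 0.4, 2.3, 0.1)
print(q, var)  # the fused estimate lies closer to the more certain observation
```

Note that the fused variance is smaller than either input variance: combining two noisy observations always reduces uncertainty.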

The Kalman Filter

Although we have introduced the problem above as fusing two observations of the same quantity and weighing them by their variance, we can also interpret the equation above as an update step that calculates a new estimate of an observation based on its old estimate and a measurement. Remember step (4) from above: \hat{q_2}-\hat{q_1} is nothing else than the difference between what the robot actually sees and what it thinks it should see. This term is known as the innovation in Kalman lingo. We can now rewrite our result from above into
\hat{x}_{k+1}=\hat{x}_{k}+K_{k+1}\tilde{y}_{k+1}
Here, \hat{x}_k is the state we are interested in at time k, K_{k+1}=\frac{\sigma_1^2}{\sigma_1^2+\sigma_2^2} the Kalman gain, and \tilde{y}_{k+1}=\hat{q_2}-\hat{q_1}  the innovation. Unfortunately, there are few systems that allow us to directly measure the information we are interested in. Rather, we obtain a sensor measurement z_k that we need to convert into our state somehow. You can think about this the other way round too and predict your measurement z_k from your state x_k. This is done using the observation model H_k, so that
\tilde{y}_{k}=z_k-H_k x_k
In our example H_k was just the identity matrix; in a robot position estimation problem H_k is a function that would predict how a robot would see a certain feature. As you can see, all the weighing based on variances is done in the Kalman gain K.
The perception update step shown above, also known as the correction step, is only half of what the Kalman filter does. The first step is the prediction step, which corresponds to the action update we already know. In fact, the variance update in the Kalman filter is exactly the same as the one we learned during error propagation. Before going into any more details on the Kalman filter, it is time for a brief disclaimer: the Kalman filter only works for linear systems. The forward kinematics of even the simplest robots are mostly non-linear, and so are the observation models that relate sensor observations to the robot position. Non-linear systems can be dealt with using the Extended Kalman Filter.
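The interplay of the two steps can be made concrete with a one-dimensional sketch of the wall-driving example from the introduction. All names and noise values below are illustrative assumptions, not part of the lecture:

```python
def predict(x, p, u, q):
    """Action update: move by u; motion-noise variance q inflates uncertainty."""
    return x + u, p + q

def update(x, p, z, r):
    """Perception update: fuse measurement z (variance r) into the estimate."""
    k = p / (p + r)                      # Kalman gain
    return x + k * (z - x), (1 - k) * p

# Robot driving toward a wall: position uncertainty grows with every motion step...
x, p = 0.0, 0.01
for _ in range(5):
    x, p = predict(x, p, u=0.5, q=0.05)
# ...until a range measurement to the wall shrinks it again.
x, p = update(x, p, z=2.4, r=0.02)
```

After the update, the estimate lies between the odometry prediction (2.5) and the measurement (2.4), closer to the measurement because its variance is lower, and the variance drops sharply.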

Extended Kalman Filter

In the extended Kalman filter, the state transition and observation models need not be linear functions of the state but may instead be differentiable functions. The action update step looks as follows:

\hat{\boldsymbol{x}}_{k'|k-1} = f(\hat{\boldsymbol{x}}_{k-1|k-1}, \boldsymbol{u}_{k-1})

Here f() is a function of the old state \boldsymbol{x}_{k-1} and the control input \boldsymbol{u}_{k-1}. This is nothing else than the odometry update we are used to, where f() is a function describing the forward kinematics of the robot, \boldsymbol{x}_k its position, and \boldsymbol{u}_k the wheel-speed we set.

We can also calculate the covariance matrix of the robot position

\boldsymbol{P}_{k'|k-1} = \nabla_{x,y,\theta}f \, \boldsymbol{P}_{k-1|k-1} \, \nabla_{x,y,\theta}f^T + \nabla_{\Delta_{r,l}}f \, \boldsymbol{Q}_{k-1} \, \nabla_{\Delta_{r,l}}f^T

This is nothing else than the error propagation law applied to the odometry of the robot, with \boldsymbol{Q}_k the covariance matrix of the wheel-slip and the Jacobian matrices of the forward kinematic equations f() taken with respect to the robot’s position (indicated by the index x,y,\theta) and with respect to the wheel-slip of the left and right wheel.
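Putting the prediction step into code, a sketch for a differential-drive robot might look as follows. The midpoint-heading kinematic model, the wheel-base parameter b, and the function name are assumptions here; check them against your own forward kinematics:

```python
import numpy as np

def ekf_predict(x, P, ds_r, ds_l, b, k_r, k_l):
    """EKF action update for a differential-drive robot with wheel base b.

    ds_r, ds_l: measured right/left wheel travel;
    k_r, k_l: slip constants (assumed calibrated, e.g. in the odometry lab)."""
    ds = (ds_r + ds_l) / 2.0
    dtheta = (ds_r - ds_l) / b
    th = x[2] + dtheta / 2.0

    # Predicted pose via the forward kinematics f()
    x_new = x + np.array([ds * np.cos(th), ds * np.sin(th), dtheta])

    # Jacobian of f with respect to the state (x, y, theta): 3x3
    Fx = np.array([[1.0, 0.0, -ds * np.sin(th)],
                   [0.0, 1.0,  ds * np.cos(th)],
                   [0.0, 0.0,  1.0]])
    # Jacobian of f with respect to the wheel travel (ds_r, ds_l): 3x2
    Fu = np.array([
        [0.5 * np.cos(th) - ds / (2 * b) * np.sin(th),
         0.5 * np.cos(th) + ds / (2 * b) * np.sin(th)],
        [0.5 * np.sin(th) + ds / (2 * b) * np.cos(th),
         0.5 * np.sin(th) - ds / (2 * b) * np.cos(th)],
        [1.0 / b, -1.0 / b]])
    # Wheel-slip covariance Q grows with the distance traveled
    Q = np.diag([k_r * abs(ds_r), k_l * abs(ds_l)])

    # Error propagation law: P' = Fx P Fx^T + Fu Q Fu^T
    P_new = Fx @ P @ Fx.T + Fu @ Q @ Fu.T
    return x_new, P_new
```

Even when starting with zero covariance, P grows with every call: uncertainty accumulates as long as the robot only uses odometry.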

The perception update (or correction) step looks as follows:

\hat{\boldsymbol{x}}_{k|k'} = \hat{\boldsymbol{x}}_{k'|k-1} + \boldsymbol{K}_{k'}\tilde{\boldsymbol{y}}_{k'}
\boldsymbol{P}_{k|k'} = (I - \boldsymbol{K}_{k'} {\boldsymbol{H}_{k'}}) \boldsymbol{P}_{k'|k-1}

At this point the indices k should start making sense. We are calculating everything twice: once we update from k-1 to an intermediate result k' during the action update, and obtain the final result after the perception update where we go from k' to k.

We need to calculate three additional variables:

  1. The innovation \tilde{\boldsymbol{y}}_{k}=\boldsymbol{z}_{k}-h(\hat{\boldsymbol{x}}_{k|k-1})
  2. The covariance of the innovation \boldsymbol{S}_{k}={\boldsymbol{H}_{k}}\boldsymbol{P}_{k|k-1}{\boldsymbol{H}_{k}^\top}+\boldsymbol{R}_{k}
  3. The (near-optimal)  Kalman gain \boldsymbol{K}_{k}=\boldsymbol{P}_{k|k-1}{\boldsymbol{H}_{k}^\top}\boldsymbol{S}_{k}^{-1}
Here h() is the observation model and \boldsymbol{H} its Jacobian. How these equations are derived is involved (and is one of the fundamental results in control theory), but the idea is the same as introduced above: we wish to minimize the error of the prediction.
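The three quantities above combine into a compact perception update. Here is a generic sketch, assuming numpy and that the observation model supplies h and its Jacobian H; the function name is illustrative:

```python
import numpy as np

def ekf_update(x, P, z, h, H, R):
    """EKF perception update: fuse the predicted state (x, P) with measurement z.

    h: observation model h(x); H: its Jacobian evaluated at x;
    R: measurement covariance."""
    y = z - h(x)                      # innovation
    S = H @ P @ H.T + R               # covariance of the innovation
    K = P @ H.T @ np.linalg.inv(S)    # (near-optimal) Kalman gain
    x_new = x + K @ y                 # state update
    P_new = (np.eye(len(x)) - K @ H) @ P  # covariance update
    return x_new, P_new
```

With H the identity and equal state dimensions, this reduces exactly to the variance-weighted fusion derived at the beginning of the lecture.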


Whereas the prediction step is equivalent to the forward kinematics and error propagation we have seen before, the observation model and the “innovation” require an example. We will show how a mobile robot equipped with a laser scanner can correct its position estimate, relying on unreliable odometry, unreliable sensing, but a correct map, in an optimal way.

1. Prediction Update

We assume for now that the reader is familiar with calculating \hat{\boldsymbol{x}}_{k'|k-1}=f(x,y,\theta)^T and its variance \boldsymbol{P}_{k'|k-1}. Here, \boldsymbol{Q}_{k-1}, the covariance matrix of the wheel-slip error,  is given by

\boldsymbol{Q}_{k-1}=\left[\begin{array}{cc}k_r|\Delta s_r| & 0\\0 & k_l|\Delta s_l|\end{array}\right]

where \Delta s_r and \Delta s_l are the movements of the right and left wheel and k_r and k_l are constants. See also the odometry lab for detailed derivations of these calculations and how to estimate k_r and k_l. The state vector \boldsymbol{\hat{x}_{k'|k-1}} is a 3×1 vector, the covariance matrix \boldsymbol{P_{k'|k-1}} is a 3×3 matrix, and \nabla_{\Delta_{r,l}}f that is used during error propagation is a 3×2 matrix. See the error propagation lecture for details on how to calculate \nabla_{\Delta_{r,l}}f.


2. Observation

Let us now assume that we can detect line features \boldsymbol{z}_{k,i}=(\alpha_i,r_i)^T, where \alpha and r are the angle and distance of the line from the coordinate system of the robot. These line features are subject to variances \sigma^2_{\alpha,i} and \sigma^2_{r,i}, which make up the diagonal of \boldsymbol{R}_{k}. See the line detection lecture for a derivation of how angle and distance as well as their variance can be calculated from a laser scanner. The observation is a 2×1 matrix.

3. Measurement Prediction

We assume that we can uniquely identify the lines we are seeing and retrieve their real position from a map. This is much easier for unique features, but can be done also for lines by assuming that our error is small enough so that we can search through our map and pick the closest lines. As features are stored in global coordinates, we need to transform them into how the robot would see them. In practice the map is nothing but a list of lines, each with an angle and a distance, but this time with respect to the origin of the global coordinate system. Transforming them into robot coordinates is straightforward. With \hat{\boldsymbol{x}}_{k}=(x_{k},y_{k},\theta_k)^T and m_i=(\alpha_i,r_i) the corresponding entry from the map, we can write

h(\hat{\boldsymbol{x}}_{k|k-1})=\left[\begin{array}{c}\alpha_{k,i}\\r_{k,i}\end{array}\right]=h(\boldsymbol{x},m_i)=\left[\begin{array}{c}\alpha_i-\theta\\r_i-(x\cos(\alpha_i)+y\sin(\alpha_i))\end{array}\right]

and calculate its Jacobian \boldsymbol{H}_{k} as the partial derivatives of \alpha with respect to x,y,\theta in the first row, and the partial derivatives of r in the second. How h() predicts the distance at which the robot should see the feature with distance r_i from the map is illustrated in the figure below.
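This observation model and its Jacobian can be written down directly. A sketch, with illustrative function names and assuming numpy:

```python
import numpy as np

def h_line(x, m):
    """Predict how a robot at pose x = (x, y, theta) sees map line m = (alpha_i, r_i)."""
    alpha_i, r_i = m
    return np.array([alpha_i - x[2],
                     r_i - (x[0] * np.cos(alpha_i) + x[1] * np.sin(alpha_i))])

def H_line(x, m):
    """Jacobian of h_line with respect to (x, y, theta): a 2x3 matrix.

    Row 1: partial derivatives of alpha; row 2: partial derivatives of r."""
    alpha_i, _ = m
    return np.array([[0.0, 0.0, -1.0],
                     [-np.cos(alpha_i), -np.sin(alpha_i), 0.0]])
```

For a robot sitting at the global origin with zero heading, the predicted observation is simply the map entry itself, which is a quick sanity check for the model.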

Figure: Example of how to predict the distance to a feature the robot would see, given its estimated position and the feature’s known location from a map.

4. Matching

We are now equipped with a measurement \boldsymbol{z}_k and a prediction h(\hat{\boldsymbol{x}}_{k|k-1}) based on all features stored in our map. We can now calculate the innovation

\tilde{\boldsymbol{y}}_{k}=\boldsymbol{z}_{k}-h(\hat{\boldsymbol{x}}_{k|k-1})

which is simply the difference between each feature that we can see and those that we predict from the map. The innovation is again a 2×1 matrix.

5. Estimation

We now have all the ingredients to perform the perception update step of the Kalman filter:

\hat{\boldsymbol{x}}_{k|k'} = \hat{\boldsymbol{x}}_{k'|k-1} + \boldsymbol{K}_{k'}\tilde{\boldsymbol{y}}_{k'}
\boldsymbol{P}_{k|k'} = (I - \boldsymbol{K}_{k'} {\boldsymbol{H}_{k'}}) \boldsymbol{P}_{k'|k-1}

It will provide us with an update of our position that fuses our odometry input and information  that we can extract from features in the environment in a way that takes into account their variances. That is, if the variance of your previous position is high (because you have no idea where you are), but the variance of your measurement is low (maybe from a GPS or a symbol on the Ratslife wall), the Kalman filter will put more emphasis on your sensor. If your sensors are poor (maybe because you cannot tell different lines/walls apart), more emphasis will be on the odometry.

As the state vector is a 3×1 vector and the innovation a 2×1 matrix, the Kalman gain must be a 3×2 matrix. This can also be seen from the covariance matrix, which must come out as a 3×3 matrix, knowing that the Jacobian of the observation function is a 2×3 matrix. We can now calculate the covariance of the innovation and the Kalman gain using

\boldsymbol{S}_{k}=\boldsymbol{H}_{k}\boldsymbol{P}_{k|k-1}\boldsymbol{H}_{k}^\top+\boldsymbol{R}_{k}
\boldsymbol{K}_{k}=\boldsymbol{P}_{k|k-1}\boldsymbol{H}_{k}^\top\boldsymbol{S}_{k}^{-1}
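These dimension claims can be verified in a few lines. The matrix entries below are arbitrary placeholder numbers, not values from the lecture:

```python
import numpy as np

# Shape check for the 2D line-feature case: state is 3x1, observation is 2x1
P = np.diag([0.1, 0.1, 0.05])       # 3x3 pose covariance
H = np.array([[0.0, 0.0, -1.0],
              [-1.0, 0.0, 0.0]])    # 2x3 observation Jacobian
R = np.diag([0.01, 0.04])           # 2x2 measurement covariance

S = H @ P @ H.T + R                 # covariance of the innovation: 2x2
K = P @ H.T @ np.linalg.inv(S)      # Kalman gain: 3x2
print(S.shape, K.shape)
```

Working through such a shape check before coding the full filter catches most transposition and ordering mistakes early.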



Take home lessons

  • The Kalman filter is the optimal way to fuse Gaussian-distributed observations of a linear system; the Extended Kalman Filter extends it to non-linear but differentiable models, at the price of only approximate optimality.
  • The derivation of the Kalman filter equations is involved, but is based on minimizing the least-square error between prediction and real value.
  • Possible random variables could be the estimate of your robot position from odometry and observations of static beacons with known location (but uncertain sensing) in the environment.
  • In order to take advantage of the approach, you will need differentiable functions that relate measurements to state variables as well as an estimate of the covariance matrix of your sensors.


