Kalman exercise

Intro

Quick explanation of how a Kalman filter works just to make sure I understood it and I don't forget it, and also for others to learn.

Combining 2 sensor measurements

Before we understand how it works let's look at a simpler problem. Suppose we have 2 sensors, each one with its own average $\mu$ and variance $\sigma^2$.

sensor a: $(\mu_a,\sigma_a^2)$
sensor b: $(\mu_b,\sigma_b^2)$

Could we combine those 2 measurements in any way so we get a much more precise measurement? Yes indeed, those readings represent Gaussians and combining them means multiplying them, the result is given by the following formula: $$(\mu_{new},\sigma_{new}^2)=\Bigg(\mu_a + \frac{\sigma_a^2 (\mu_b – \mu_a)} {\sigma_a^2 + \sigma_b^2}, \sigma_a^2 – \frac{\sigma_a^4} {\sigma_a^2 + \sigma_b^2}\Bigg)$$ In the literature we usually find the above formula written this way: $$(\mu_{new},\sigma_{new}^2)=(\mu_a + \color{fuchsia}k(\mu_b – \mu_a), \sigma_a^2 – \color{fuchsia}k\sigma_a^2), \quad \textrm{where} \quad \color{fuchsia}k=\frac{\sigma_a^2}{\sigma_a^2 + \sigma_b^2} $$ and $\color{fuchsia}k$ is called the Kalman gain. $\mu_{new}$ is more clear if we express it this way: $$\mu_{new}= (1-\color{fuchsia}k)\mu_a + \color{fuchsia}k(\mu_b) $$ This reminds us of a linear interpolation $(1-t)a+(t)b$, hence k will determine the weight of each sensor. Note that if our two sensors were IMUs (Inertial measurements Units) they would return a vector with 6 components $(x,y,z,w_x,w_y,w_z)$. In this case the average $\mu$ would be a 6 component vector and the variance $\sigma$ would be a 6x6 matrix.

In a rocket we'd need to read and combine these measurements every instant $t$ : $(\mu(t),\sigma^2(t))$

What if we can only carry one sensor?

That is what happened exactly in the Apollo rocket, back in the 60s the IMUs were veeery heavy and they could only carry one.

Kalman thought that despite only having one IMU you could still get 2 different readings. One reading would come from IMU itself $$(\mu_{a}(t),\sigma_{a}^2(t))$$ and the second reading could just be extrapolated from the previous reading $$(\mu_{a}(t-1),\sigma_{a}^2(t-1))$$ In the case of a rocket, and thanks to Newtown, our prediction formula $F$ could be as simple as $s=v\dot t$. The most important thing is that $F$ is linear, otherwise the math becomes impossible, and in the practice that means that $F$ could be expressed as a matrix $M$. With those requirements the reading of our virtual sensor $v$ would be: $$(\mu_{v},\sigma_{v}^2(t)) = (M\mu_a(t-1),M\sigma_{a}^{2}(t-1)M^{-1})$$

Putting it all together

We have then 2 sensors, the real one and the virtual one (that extrapolates data from the previous instant):

sensor a: $(\mu_a,\sigma_a^2)$
sensor v: $(\mu_{v},\sigma_{v}^2(t)) = (M\mu_a(t-1),M\sigma_{a}^{2}(t-1)M^{-1})$

now substitute here: $$(\mu_{new},\sigma_{new}^2)=(\mu_a + \color{fuchsia}k(\mu_v – \mu_a), \sigma_a^2 – \color{fuchsia}k\sigma_a^2), \quad \textrm{where} \quad \color{fuchsia}k=\frac{\sigma_a^2}{\sigma_a^2 + \sigma_v^2} $$ And you should obtain the same equations Wikipedia gives you

Extended Kalman

Remember that we mentioned that the Kalman filter requires the prediction formula to be linear, that meant this function had to be expressed as a matrix $M$. What happens if our prediction formula is not linear? Well, we use Taylor's to hammer it down until it becomes a matrix, and that is called linearization: $$F(x) \approx F(p) + \nabla F|_p \dot (x-p)$$ And then plug that into the regular Kalman filter, rinse and repeat. This has the added cost that we need to compute and evaluate the Jacobian every instant of time.

SLAM (or how self-driving cars work become aware of their surroundings)

So far the state of my system is based only on what I measure from my IMU but, could we do better?

Probably yes! We humans besides using our internal IMUs we use many visual clues/features that we make part of our $state$ to correct our IMU estimates. We also do the opposite, and we use our IMU to better estimate the position of those features/clues. This is exactly what SLAM does, at its core it is just a Kalman filter with the twist that each time it finds a reliable visual clue it treats it as a sensor and makes it part of its state, a thing that allows us to tell our position much better and build a map.

I'll be very glad if you ever build a self-driving car based on this explanation so please let me know :)

Contact/Questions:

<my_github_account_username>$@gmail.com$.