The Univariate Kalman Filter

January 13, 2025

Consider a linear dynamical system observed in Gaussian noise:

$s_{t+1} = as_t + w_t\\ o_{t} = cs_t + v_t\\ w_t \sim N(0,\sigma_w^2)\\ v_t \sim N(0,\sigma_v^2)\\ \mathbf{b}_0 = N(\hat{s}_0, \Sigma_0)\\ s_t \sim \mathbf{b}_t$

with state and observation spaces $S = O = \mathbb{R}$, discrete time $t=0,1,...$, and coefficients $a,c \in \mathbb{R}$. We assume that: (i) $v_t$ and $v_k$ are independent for any $t \neq k$; (ii) $w_t$ and $w_k$ are independent for any $t \neq k$; (iii) $(v_t)_{t \geq 0}$ and $(w_t)_{t \geq 0}$ are independent processes; and (iv) $s_0$ is independent of $v_t$ and $w_t$ for any $t$.
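To make the model concrete, here is a minimal simulation sketch of the system above. The function name `simulate`, its parameters, and the numeric values are illustrative assumptions, not part of the original text; it uses only Python's standard `random` module.

```python
import random

def simulate(a, c, sigma_w, sigma_v, s0_mean, Sigma0, T, seed=0):
    """Sample s_0..s_{T-1} and o_0..o_{T-1} from the linear Gaussian system."""
    rng = random.Random(seed)
    # s_0 ~ N(s0_mean, Sigma0); Sigma0 is a variance, gauss() expects a std
    s = rng.gauss(s0_mean, Sigma0 ** 0.5)
    states, observations = [], []
    for _ in range(T):
        states.append(s)
        observations.append(c * s + rng.gauss(0.0, sigma_v))  # o_t = c s_t + v_t
        s = a * s + rng.gauss(0.0, sigma_w)                    # s_{t+1} = a s_t + w_t
    return states, observations

# Illustrative parameter values (assumptions)
states, observations = simulate(a=0.9, c=1.0, sigma_w=0.5, sigma_v=0.5,
                                s0_mean=0.0, Sigma0=1.0, T=5)
```

Setting all noise scales to zero reduces the simulation to the deterministic recursion $s_t = a^ts_0$, which is a convenient sanity check.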

The optimal filtering recursion is:

$\mathbf{b}_{t+1}(s_{t+1}) = \frac{z(o_{t+1} \mid s_{t+1}) \int_{S}p(s_{t+1} \mid s_t)\mathbf{b}_t(s_t)ds_t}{\int_{S}z(o_{t+1} \mid s_{t+1})\int_{S}p(s_{t+1} \mid s_t)\mathbf{b}_t(s_t)ds_tds_{t+1}}$

and the optimal filtered state estimate at time $t+1$ is

$\hat{s}_{t+1|t+1} = \mathbb{E}[s_{t+1} \mid o_{1:t+1}] = \int_{S}s\mathbf{b}_{t+1}(s)ds$
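The recursion can be approximated numerically by replacing the integrals with Riemann sums on a uniform grid. The sketch below is only an illustration under assumptions: the grid range, the step size, the helper names (`gaussian_pdf`, `bayes_step`, `belief_mean`), and the parameter values are all chosen here and do not come from the original text.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))

def bayes_step(grid, belief, o_next, a, c, sigma_w, sigma_v):
    """One step of the filtering recursion on a uniform grid (Riemann sums)."""
    ds = grid[1] - grid[0]
    # Prediction: integrate p(s_{t+1} | s_t) b_t(s_t) over s_t
    predicted = [
        sum(gaussian_pdf(s_next, a * s, sigma_w) * b for s, b in zip(grid, belief)) * ds
        for s_next in grid
    ]
    # Update: multiply by z(o_{t+1} | s_{t+1}), then normalise
    unnormalised = [gaussian_pdf(o_next, c * s_next, sigma_v) * p
                    for s_next, p in zip(grid, predicted)]
    normaliser = sum(unnormalised) * ds
    return [u / normaliser for u in unnormalised]

def belief_mean(grid, belief):
    """Filtered state estimate: the integral of s * b(s) over the grid."""
    ds = grid[1] - grid[0]
    return sum(s * b for s, b in zip(grid, belief)) * ds

# Illustrative setup (assumptions): grid on [-10, 10], prior b_0 = N(0, 1)
grid = [-10.0 + 0.05 * i for i in range(401)]
b0 = [gaussian_pdf(s, 0.0, 1.0) for s in grid]
b1 = bayes_step(grid, b0, o_next=1.0, a=0.9, c=1.0, sigma_w=0.5, sigma_v=0.5)
print(belief_mean(grid, b1))
```

For these values the grid estimate should land close to $1.06/1.31 \approx 0.809$, which is what the closed-form Gaussian update gives for the same inputs.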

The univariate Kalman filter is defined as

$\mathbf{b}_{t} = N(\hat{s}_{t|t}, \Sigma_{t|t})\\ \hat{s}_{t|t} = \hat{s}_{t|t-1} + b_t(o_t-c\hat{s}_{t|t-1})\\ \Sigma_{t|t} = \Sigma_{t|t-1} - cb_t\Sigma_{t|t-1}\\ b_t = \frac{c\Sigma_{t|t-1}}{c^2\Sigma_{t|t-1} + \sigma_v^2}\\ \hat{s}_{t|t-1} = a\hat{s}_{t-1|t-1}\\ \Sigma_{t|t-1} = a^2\Sigma_{t-1|t-1} + \sigma^2_w$

where $\mathbf{b}_{t}$ is the posterior at time $t$, $\hat{s}_{t|t}$ is the state estimate at time $t$, and $b_t$ is the Kalman gain.
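The equations above translate almost line by line into code. The following is a minimal sketch; the function name `kalman_step`, the argument order, and the numeric values in the usage example are assumptions made for illustration.

```python
def kalman_step(s_prev, Sigma_prev, o, a, c, sigma_w, sigma_v):
    """One predict-update step of the univariate Kalman filter.

    s_prev, Sigma_prev: filtered mean and variance at time t-1
    o: observation at time t
    Returns the filtered mean and variance at time t.
    """
    # Prediction
    s_pred = a * s_prev
    Sigma_pred = a * a * Sigma_prev + sigma_w ** 2
    # Gain b_t
    gain = c * Sigma_pred / (c * c * Sigma_pred + sigma_v ** 2)
    # Update
    s_filt = s_pred + gain * (o - c * s_pred)
    Sigma_filt = Sigma_pred - c * gain * Sigma_pred
    return s_filt, Sigma_filt

# Illustrative values (assumptions): filter a short observation sequence
s_hat, Sigma = 0.0, 1.0
for o in [1.0, 0.7, 1.2]:
    s_hat, Sigma = kalman_step(s_hat, Sigma, o, a=0.9, c=1.0, sigma_w=0.5, sigma_v=0.5)
```

Note that the variance recursion does not depend on the observations, so the sequence of $\Sigma_{t|t}$ could be computed ahead of time.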

To show that the above equations are correct, I derive the Kalman filter from the Bayesian recursion above. I start by noting some properties of the system.

Lemma 1: At any time $t \geq 0$, $s_t$ and $o_t$ are Gaussian random variables.

Proof:

By definition,

$s_1 = as_0 + w_0\\ s_2 = as_1 + w_1 = a(as_0 + w_0) + w_1 = a^2s_0 + aw_0 + w_1\\ s_3 = as_2 + w_2 = a(a^2s_0 + aw_0 + w_1) + w_2 = a^3s_0 + a^2w_0 + aw_1 + w_2\\ \vdots\\ s_t = a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k$

Hence, $s_t$ is a linear combination of the random variables $s_0, w_0, w_1,...,w_{t-1}$. Since all of these variables are individually Gaussian and independent, they are jointly Gaussian. As Gaussians are closed under linear transformations, it follows that $s_t$ is Gaussian.
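The unrolled expression for $s_t$ can be checked numerically against the recursion for a fixed noise sequence. This is a small sketch; the function names and the sample values are assumptions chosen for illustration.

```python
def state_by_recursion(a, s0, w):
    """Apply s_{k+1} = a s_k + w_k for each noise value in w."""
    s = s0
    for w_k in w:
        s = a * s + w_k
    return s

def state_closed_form(a, s0, w):
    """Evaluate s_t = a^t s_0 + sum_{k=0}^{t-1} a^(t-k-1) w_k with t = len(w)."""
    t = len(w)
    return a ** t * s0 + sum(a ** (t - k - 1) * w_k for k, w_k in enumerate(w))

# Illustrative values (assumptions)
a, s0, w = 0.5, 2.0, [1.0, -1.0, 0.25]
print(state_by_recursion(a, s0, w), state_closed_form(a, s0, w))
```

Both functions should agree up to floating-point rounding for any choice of `a`, `s0`, and `w`.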

Now consider the measurement $o_t$. By definition,

$o_t = cs_t + v_t$

Since $s_t$ and $v_t$ are independent Gaussians, $o_t$ is a linear combination of two independent Gaussians and is thus also Gaussian. Q.E.D.

Using the above lemma, the system can also be written in terms of transition and observation kernels rather than a difference equation:

$s_{t+1} \sim p(s_{t+1} \mid s_t) = \frac{1}{\sigma_w\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{s_{t+1} - as_t}{\sigma_w}\right)^2\right)\\ o_{t} \sim z(o_{t} \mid s_t) = \frac{1}{\sigma_v\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{o_t - cs_t}{\sigma_v}\right)^2\right)$

Another important property of the system is the following.

Lemma 2: $s_t$ and $o_t$ at any time $t \geq 0$ are jointly Gaussian.

Proof:

By Lemma 1 we know that

$s_t = a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k\\ o_t = c\left(a^ts_0 + \sum_{k=0}^{t-1}a^{t-k-1}w_k\right) + v_t$

Since the random variables $(s_0, w_0,w_1,...,w_t,v_0,v_1,...,v_t)$ are independent Gaussians, they are jointly Gaussian (see the proof of Lemma 1), and $s_t$ and $o_t$ are both linear combinations of them. Let $\mathbf{x}=(s_0, w_0,w_1,...,w_t,v_0,v_1,...,v_t)$ and let $\mathbf{n},\mathbf{m}$ denote the coefficient vectors of the linear combinations for $s_t$ and $o_t$ respectively, so that $s_t = \mathbf{n}^T\mathbf{x}$ and $o_t = \mathbf{m}^T\mathbf{x}$. Any linear combination $\alpha\mathbf{n}^T\mathbf{x} + \beta\mathbf{m}^T\mathbf{x} = (\alpha\mathbf{n} + \beta\mathbf{m})^T\mathbf{x}$ with $\alpha,\beta \in \mathbb{R}$ is itself a linear combination of the jointly Gaussian vector $\mathbf{x}$ and is therefore Gaussian. Now consider the vector $(\mathbf{n}^T\mathbf{x}, \mathbf{m}^T\mathbf{x})$. Since every linear combination of its components is Gaussian, $(\mathbf{n}^T\mathbf{x}, \mathbf{m}^T\mathbf{x})$ is by definition a multivariate Gaussian. Q.E.D.
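As a concrete instance of Lemma 2, the joint distribution of $(s_t, o_t)$ can be written out explicitly. The symbols $\mu_t$ and $V_t$ are shorthand introduced here for the marginal mean and variance of $s_t$ (they are not part of the original notation); the expressions follow from the closed form of $s_t$ and the independence assumptions:

$\mu_t = a^t\hat{s}_0\\ V_t = a^{2t}\Sigma_0 + \sigma_w^2\sum_{k=0}^{t-1}a^{2(t-k-1)}\\ \begin{bmatrix} s_t\\ o_t \end{bmatrix} \sim N\left( \begin{bmatrix} \mu_t\\ c\mu_t \end{bmatrix}, \begin{bmatrix} V_t & cV_t\\ cV_t & c^2V_t + \sigma_v^2 \end{bmatrix} \right)$

The off-diagonal term is $cV_t$ because $v_t$ is independent of $s_t$, and for $t=0$ the sum is empty so that $V_0 = \Sigma_0$.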

Derivation of the univariate Kalman filter

We derive the Kalman filter equations using mathematical induction. We start with $t=1$. Using Lemmas 1–2 we obtain

$\begin{bmatrix} s_0\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} \hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} \Sigma_0 & c\Sigma_0\\ c\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)\\ \begin{bmatrix} s_1\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} a\hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} a^2\Sigma_0 + \sigma_w^2 & ac\Sigma_0\\ ac\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)\\ \begin{bmatrix} o_1\\ o_0 \end{bmatrix} \sim N\left( \begin{bmatrix} ac\hat{s}_0\\ c\hat{s}_0 \end{bmatrix}, \begin{bmatrix} c^2(a^2\Sigma_0 + \sigma_w^2) + \sigma_v^2 & ac^2\Sigma_0\\ ac^2\Sigma_0 & c^2\Sigma_0 + \sigma_v^2 \end{bmatrix} \right)$

We know from probability theory that the conditionals of a multivariate Gaussian are also Gaussian. In particular, applying the standard formulas for conditioning a jointly Gaussian vector, we obtain

$p(s_1 \mid o_0) = N(\hat{s}_{1|0}, \Sigma_{1|0})\\ p(o_1 \mid o_0) = N(c\hat{s}_{1|0}, c^2\Sigma_{1|0} + \sigma_v^2)$

where

$\hat{s}_{1|0} = a\hat{s}_0 + \frac{ac\Sigma_0}{c^2\Sigma_0 + \sigma^2_v}(o_0 - c\hat{s}_0) = a\hat{s}_{0|0}\\ \Sigma_{1|0} = a^2\Sigma_0 + \sigma^2_w - \frac{(ac\Sigma_0)^2}{c^2\Sigma_0 + \sigma_v^2} = a^2\Sigma_{0|0} + \sigma_w^2$

and the conditional expectation and variance of $o_1$ are

$\mathbb{E}[o_1 \mid o_0] = ac\hat{s}_0 + \frac{ac^2\Sigma_0}{c^2\Sigma_0 + \sigma^2_v}(o_0 - c\hat{s}_0) = c\hat{s}_{1|0}\\ Var(o_1 \mid o_0) = c^2(a^2\Sigma_0 + \sigma^2_w) + \sigma_v^2 - \frac{(ac^2\Sigma_0)^2}{c^2\Sigma_0 + \sigma_v^2} = c^2\Sigma_{1|0} + \sigma_v^2$
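The identities above can be checked numerically with the standard conditioning formula for a bivariate Gaussian. The sketch below is illustrative only: the helper `condition_gaussian` and all numeric values are assumptions introduced here, and the joint moments are those of $(s_0, o_0)$, $(s_1, o_0)$ and $(o_1, o_0)$.

```python
import math

def condition_gaussian(mean_x, mean_y, var_x, cov_xy, var_y, y):
    """Mean and variance of x | y for jointly Gaussian scalars (x, y)."""
    gain = cov_xy / var_y
    return mean_x + gain * (y - mean_y), var_x - gain * cov_xy

# Illustrative values (assumptions, not from the text)
a, c, sigma_w, sigma_v = 0.9, 2.0, 0.5, 0.3
s0_hat, Sigma0, o0 = 1.0, 1.5, 2.4

var_o0 = c ** 2 * Sigma0 + sigma_v ** 2   # Var(o_0)
var_s1 = a ** 2 * Sigma0 + sigma_w ** 2   # Var(s_1)
var_o1 = c ** 2 * var_s1 + sigma_v ** 2   # Var(o_1)

# s_0 | o_0, s_1 | o_0 and o_1 | o_0, each from its joint with o_0
s00, Sigma00 = condition_gaussian(s0_hat, c * s0_hat, Sigma0, c * Sigma0, var_o0, o0)
s10, Sigma10 = condition_gaussian(a * s0_hat, c * s0_hat, var_s1, a * c * Sigma0, var_o0, o0)
o10_mean, o10_var = condition_gaussian(a * c * s0_hat, c * s0_hat, var_o1,
                                       a * c ** 2 * Sigma0, var_o0, o0)

# The conditioned moments match the prediction equations
assert math.isclose(s10, a * s00)
assert math.isclose(Sigma10, a ** 2 * Sigma00 + sigma_w ** 2)
assert math.isclose(o10_mean, c * s10)
assert math.isclose(o10_var, c ** 2 * Sigma10 + sigma_v ** 2)
```

The same helper applied to the joint of $(s_1, o_1)$ given $o_0$ gives the update step, so the whole base case reduces to repeated use of one formula.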

Furthermore

$\begin{bmatrix} s_1\\ o_1 \end{bmatrix} \mid o_0 \sim N\left( \begin{bmatrix} \hat{s}_{1|0}\\ c\hat{s}_{1|0} \end{bmatrix}, \begin{bmatrix} \Sigma_{1|0} & c\Sigma_{1|0}\\ c\Sigma_{1|0} & c^2\Sigma_{1|0} + \sigma_v^2 \end{bmatrix} \right)$

This means that

$p(s_1 \mid o_0, o_1) = N(\hat{s}_{1|1}, \Sigma_{1|1})$

where

$\hat{s}_{1|1} = \hat{s}_{1|0} + \frac{c\Sigma_{1|0}}{c^2\Sigma_{1|0} + \sigma^2_v}(o_1 - c\hat{s}_{1|0})\\ \Sigma_{1|1} = \Sigma_{1|0} - \frac{(c\Sigma_{1|0})^2}{c^2\Sigma_{1|0} + \sigma_v^2}$

This proves the inductive base case. Now assume by induction that the Kalman filter equations hold for $t=k-1$. Applying the same calculations as above, we obtain

$p(s_k \mid \mathbf{o}_{0:k-1}) = N(\hat{s}_{k|k-1}, \Sigma_{k|k-1})\\ p(o_k \mid \mathbf{o}_{0:k-1}) = N(c\hat{s}_{k|k-1}, c^2\Sigma_{k|k-1} + \sigma_v^2)$

where

$\hat{s}_{k|k-1} = a\hat{s}_{k-1|k-1}$

and

$\Sigma_{k|k-1} = a^2\Sigma_{k-1|k-1} + \sigma_w^2.$

Further

$\begin{bmatrix} s_k\\ o_k \end{bmatrix} \mid \mathbf{o}_{0:k-1} \sim N\left( \begin{bmatrix} \hat{s}_{k|k-1}\\ c\hat{s}_{k|k-1} \end{bmatrix}, \begin{bmatrix} \Sigma_{k|k-1} & c\Sigma_{k|k-1}\\ c\Sigma_{k|k-1} & c^2\Sigma_{k|k-1} + \sigma_v^2 \end{bmatrix} \right)$

which means that

$p(s_k \mid \mathbf{o}_{0:k}) = N(\hat{s}_{k|k}, \Sigma_{k|k}),$

where

$\hat{s}_{k|k} = \hat{s}_{k|k-1} + b_k(o_k-c\hat{s}_{k|k-1})\\ \Sigma_{k|k} = \Sigma_{k|k-1} - cb_k\Sigma_{k|k-1}\\ b_k = \frac{c\Sigma_{k|k-1}}{c^2\Sigma_{k|k-1} + \sigma_v^2}$

These are exactly the Kalman filter equations, which completes the induction step. Q.E.D.
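For completeness, the step from the conditional Gaussian to the gain form can be spelled out. Conditioning the joint distribution of $(s_k, o_k)$ given $\mathbf{o}_{0:k-1}$ on $o_k$ gives a mean correction of covariance divided by variance, and a variance reduction of squared covariance divided by variance. Written in terms of $b_k$, this reads:

$\hat{s}_{k|k} = \hat{s}_{k|k-1} + \frac{c\Sigma_{k|k-1}}{c^2\Sigma_{k|k-1} + \sigma_v^2}(o_k - c\hat{s}_{k|k-1}) = \hat{s}_{k|k-1} + b_k(o_k - c\hat{s}_{k|k-1})\\ \Sigma_{k|k} = \Sigma_{k|k-1} - \frac{(c\Sigma_{k|k-1})^2}{c^2\Sigma_{k|k-1} + \sigma_v^2} = \Sigma_{k|k-1} - c\cdot\frac{c\Sigma_{k|k-1}}{c^2\Sigma_{k|k-1} + \sigma_v^2}\cdot\Sigma_{k|k-1} = \Sigma_{k|k-1} - cb_k\Sigma_{k|k-1}$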