HMM Intro#

This is an introduction to the Gaussian Hidden Markov Model.

Lemmas#

We restate two basic but useful lemmas here.

Lemma 1 Posterior update#

\[\begin{align*} x &\sim \mathcal{N}(m_0, P_0) \\ y \mid x &\sim \mathcal{N}(H x, R) \end{align*}\]

Then

\[\begin{align*} p(x|y) &\propto p(x) \cdot p(y|x) \\ &\propto \mathcal{N}(x; m_0, P_0) \cdot \mathcal{N}(y; H x, R) \\ &= \mathcal{N}(x; \tilde{m}, \tilde{P}) \end{align*}\]

Where

\[ \tilde{P} = (P_0^{-1} + H^\top R^{-1} H)^{-1} \]
\[ \tilde{m} = \tilde{P} \left( P_0^{-1} m_0 + H^\top R^{-1} y \right). \]

Equivalently

\[ \tilde{P} = P_0 - P_0 H^\top (R + H P_0 H^\top)^{-1} H P_0 \]
\[ \tilde{m} = m_0 + P_0 H^\top (R + H P_0 H^\top)^{-1} (y - H m_0) \]
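The two forms are related by the Woodbury matrix identity. As a quick sanity check, here is a minimal sketch (with hypothetical random parameters) verifying numerically that they agree:

import numpy as np

rng = np.random.default_rng(0)
d, k = 3, 2
m0 = rng.standard_normal(d)
L = rng.standard_normal((d, d)); P0 = L @ L.T + d * np.eye(d)  # random SPD prior covariance
H = rng.standard_normal((k, d))
M = rng.standard_normal((k, k)); R = M @ M.T + k * np.eye(k)   # random SPD noise covariance
y = rng.standard_normal(k)

# Information form
P0_inv, R_inv = np.linalg.inv(P0), np.linalg.inv(R)
P_tilde_a = np.linalg.inv(P0_inv + H.T @ R_inv @ H)
m_tilde_a = P_tilde_a @ (P0_inv @ m0 + H.T @ R_inv @ y)

# Kalman-gain form
K = P0 @ H.T @ np.linalg.inv(R + H @ P0 @ H.T)
P_tilde_b = P0 - K @ H @ P0
m_tilde_b = m0 + K @ (y - H @ m0)

print(np.allclose(P_tilde_a, P_tilde_b), np.allclose(m_tilde_a, m_tilde_b))  # expect: True True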

Lemma 1 under HMM setup#

Consider the HMM setup:

\[\begin{align*} x_0 &\sim \mathcal{N}(m_0, P_0) \\ x_t \mid x_{t-1} &\sim \mathcal{N}(A x_{t-1}, Q) \\ y_t \mid x_t &\sim \mathcal{N}(H x_t, R) \end{align*}\]
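As a concrete illustration, here is a minimal sketch (interface and names are illustrative) that simulates a trajectory from this generative model:

import numpy as np

def simulate_hmm(m0, P0, A, Q, H, R, T, seed=0):
    """Sample a latent trajectory x_{0:T} and observations y_{1:T} from the HMM."""
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(m0, P0)               # x_0 ~ N(m0, P0)
    xs, ys = [x], []
    for _ in range(T):
        x = rng.multivariate_normal(A @ x, Q)         # x_t | x_{t-1} ~ N(A x_{t-1}, Q)
        ys.append(rng.multivariate_normal(H @ x, R))  # y_t | x_t ~ N(H x_t, R)
        xs.append(x)
    return np.stack(xs), np.stack(ys)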

Say we are interested in \(p(x_t | x_{t-1}, y_t)\), which is the optimal proposal distribution for the particle filter:

\[ p(x_t | x_{t-1}, y_t) \propto p(x_t | x_{t-1}) p(y_t | x_t) \]
\[ \propto \mathcal{N}(x_t; A x_{t-1}, Q) \cdot \mathcal{N}(y_t; H x_t, R) \]
\[ = \mathcal{N}(x_t; \tilde{m}_t, \tilde{P}). \]

First way (shorter):

\[ \tilde{P} = (Q^{-1} + H^\top R^{-1} H)^{-1} \]
\[ \tilde{m}_t = \tilde{P} \left( Q^{-1} A x_{t-1} + H^\top R^{-1} y_t \right). \]

Second way:

\[ \tilde{P} = Q - Q H^\top (R + H Q H^\top)^{-1} H Q \]
\[ \tilde{m}_t = A x_{t-1} + Q H^\top (R + H Q H^\top)^{-1} (y_t - H A x_{t-1}) \]

Here is an example implementation, vectorized over a batch of \(N\) particles (the rows of X_t_minus_1):

import numpy as np

def method1(A, Q, H, R, X_t_minus_1, Y):
    """
    Computes tilde_P and tilde_m_t using Lemma 3.1:
    tilde_P = Q - Q H^T (R + HQH^T)^{-1} HQ
    tilde_m_t = AX_{t-1} + Q H^T (R + HQH^T)^{-1}(Y - HAX_{t-1})
    """
    HQ = H @ Q
    S = R + HQ @ H.T
    S_inv = np.linalg.inv(S)
    
    tilde_P = Q - Q @ H.T @ S_inv @ HQ

    AX = X_t_minus_1 @ A.T  # (N, d)
    HAX = AX @ H.T          # (N, m)
    tilde_m = AX + (Y - HAX) @ (Q @ H.T @ S_inv).T  # (N, d)
    
    return tilde_P, tilde_m

def method2(A, Q, H, R, X_t_minus_1, Y):
    """
    tilde_P = (Q^{-1} + H^T R^{-1} H)^{-1}
    tilde_m_t = tilde_P (Q^{-1} A x_{t-1} + H^T R^{-1} y_t)
    """
    Q_inv = np.linalg.inv(Q)
    R_inv = np.linalg.inv(R)
    tilde_P_inv = Q_inv + H.T @ R_inv @ H
    tilde_P = np.linalg.inv(tilde_P_inv)

    AX = X_t_minus_1 @ A.T                # (N, d)
    # Row-vector form of tilde_P (Q^{-1} A x + H^T R^{-1} y); relies on Q, R symmetric.
    tilde_m = (AX @ Q_inv.T + Y @ R_inv @ H) @ tilde_P.T  # (N, d)
    
    return tilde_P, tilde_m

The two methods should agree up to floating-point error; for ill-conditioned Q or R, the explicit inverses in method2 can lose accuracy.

We check if they agree on this setup:

\[\begin{align*} A &= \begin{bmatrix} I_2 & \kappa I_2 \\ 0_{2\times 2} & 0.99 I_2 \end{bmatrix}, \quad H = I_4, \quad Q = \begin{bmatrix} \frac{\kappa^3}{3} I_2 & \frac{\kappa^2}{2} I_2 \\ \frac{\kappa^2}{2} I_2 & \kappa I_2 \end{bmatrix}, \quad R = r I_4 \end{align*}\]
print('checking if two methods agree on random input')

I2 = np.eye(2)
I4 = np.eye(4)

kappa = 0.1
r = 0.1
A = np.block([
    [I2, kappa * I2],
    [np.zeros((2, 2)), 0.99 * I2]
])

Q = np.block([
    [(kappa**3 / 3) * I2, (kappa**2 / 2) * I2],
    [(kappa**2 / 2) * I2, kappa * I2]
])

H = I4
R = r * I4
X_t_minus_1, Y = np.random.randn(1, 4), np.random.randn(4)  # one particle, one observation
output1 = method1(A, Q, H, R, X_t_minus_1, Y)
output2 = method2(A, Q, H, R, X_t_minus_1, Y)

print(output1)
print(output2)
checking if two methods agree on random input
(array([[0.0002079 , 0.        , 0.0024948 , 0.        ],
       [0.        , 0.0002079 , 0.        , 0.0024948 ],
       [0.0024948 , 0.        , 0.04993763, 0.        ],
       [0.        , 0.0024948 , 0.        , 0.04993763]]), array([[ 0.93561215, -2.2784819 , -0.85038407, -0.315009  ]]))
(array([[0.0002079 , 0.        , 0.0024948 , 0.        ],
       [0.        , 0.0002079 , 0.        , 0.0024948 ],
       [0.0024948 , 0.        , 0.04993763, 0.        ],
       [0.        , 0.0024948 , 0.        , 0.04993763]]), array([[ 0.93561215, -2.2784819 , -0.85038407, -0.315009  ]]))
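Rather than comparing the printed arrays by eye, agreement can also be asserted programmatically:

print(np.allclose(output1[0], output2[0]))  # covariances match
print(np.allclose(output1[1], output2[1]))  # means match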

Lemma 2 Marginal Likelihood#

Under a general linear-Gaussian setup, we are interested in the marginal \(p(y)\):

\[\begin{align*} x &\sim \mathcal{N}(m_0, P_0) \\ y \mid x &\sim \mathcal{N}(A x + b, Q) \end{align*}\]

Then

\[\begin{align*} p(y) &= \int p(x) \; p(y|x) \, dx \\ &= \mathcal{N}(y; A m_0 + b, Q + A P_0 A^\top) \end{align*}\]

This is useful for the prediction step and for the marginal likelihood computation.
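Here is a minimal sketch (with hypothetical parameters) checking the closed form against a Monte Carlo estimate of the integral:

import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
m0 = np.array([1.0, -1.0])
P0 = np.array([[1.0, 0.3], [0.3, 2.0]])
A = np.array([[0.9, 0.1], [0.0, 0.8]])
b = np.array([0.5, 0.0])
Q = 0.2 * np.eye(2)
y = np.array([0.7, -0.2])

# Closed form: p(y) = N(y; A m0 + b, Q + A P0 A^T)
closed = multivariate_normal(A @ m0 + b, Q + A @ P0 @ A.T).pdf(y)

# Monte Carlo: p(y) = E_{x ~ N(m0, P0)}[ N(y; A x + b, Q) ]
xs = rng.multivariate_normal(m0, P0, size=200_000)
mc = multivariate_normal(np.zeros(2), Q).pdf(y - xs @ A.T - b).mean()

print(closed, mc)  # should agree up to Monte Carlo error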

Conclusion: deriving the Kalman filter#

With these two lemmas, we are equipped to derive the Kalman filter. Again, consider the HMM setup

\[\begin{align*} \pi_0(x_0) &= \mathcal{N}(x_0; m_0, P_0) \\ \tau(x_t | x_{t-1}) &= \mathcal{N}(x_t; A x_{t-1}, Q) \\ g(y_t | x_t) &= \mathcal{N}(y_t; H x_t, R) \end{align*}\]

At each time step \(t\), the Kalman filter computes two distributions.

Given \(y_{1:t-1}\), one obtains the predictive distribution \(p(x_t \mid y_{1:t-1}) = \mathcal{N}(x_t; \hat{m}_t, \hat{P}_t)\); then, using the new observation \(y_t\), one performs the update and obtains the posterior \(p(x_t \mid y_{1:t}) = \mathcal{N}(x_t; m_t, P_t)\).

Assume \(x_{t-1} \mid y_{1:t-1} \sim \mathcal{N}(m_{t-1}, P_{t-1})\). Then, using Lemma 2, we have

\[\begin{align*} p(x_t \mid y_{1:t-1}) &= \int p(x_t \mid x_{t-1}) \; p(x_{t-1} | y_{1:t-1}) dx_{t-1} \\ &= \int \mathcal{N}(x_t; A x_{t-1}, Q) \; \mathcal{N}(x_{t-1}; m_{t-1}, P_{t-1}) dx_{t-1}\\ &= \mathcal{N}(x_t; A m_{t-1}, A P_{t-1} A^\top + Q) \end{align*}\]

So we set

\[\begin{align*} \hat{m}_t &= A m_{t-1} \\ \hat{P}_t &= A P_{t-1} A^\top + Q \end{align*}\]

This is the prediction step.

Now, given \(x_{t} \mid y_{1:t-1} \sim \mathcal{N}(\hat{m}_t, \hat{P}_t)\), we update with \(y_t\) to obtain \(p(x_t \mid y_{1:t}) = \mathcal{N}(x_t; m_t, P_t)\).

Using Lemma 1 we have

\[\begin{align*} p(x_t \mid y_{1:t}) &\propto p(x_{t} \mid y_{1:t-1}) \; p(y_t \mid x_t) \\ &\propto \mathcal{N}(x_t; \hat{m}_t, \hat{P}_t) \; \mathcal{N}(y_t; H x_t, R) \\ &= \mathcal{N}(x_t; m_t, P_t) \end{align*}\]

Where

\[\begin{align*} P_t &= \hat{P}_t - \hat{P}_t H^\top (H \hat{P}_t H^\top + R)^{-1} H \hat{P}_t \\ m_t &= \hat{m}_t + \hat{P}_t H^\top (H \hat{P}_t H^\top + R)^{-1} (y_t - H \hat{m}_t) \end{align*}\]

Or equivalently, in terms of the Kalman gain \(K_t\):

\[\begin{align*} K_t &= \hat{P}_t H^\top (H \hat{P}_t H^\top + R)^{-1} \\ m_t &= \hat{m}_t + K_t (y_t - H \hat{m}_t) \\ P_t &= (I - K_t H) \hat{P}_t \end{align*}\]
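Putting the two steps together, here is a minimal, self-contained Kalman filter sketch (the function name and the dummy parameters are illustrative):

import numpy as np

def kalman_filter(ys, m0, P0, A, Q, H, R):
    """Run the Kalman filter over observations ys of shape (T, obs_dim)."""
    m, P = m0, P0
    I = np.eye(len(m0))
    means, covs = [], []
    for y in ys:
        # Prediction step (Lemma 2)
        m_hat = A @ m
        P_hat = A @ P @ A.T + Q
        # Update step (Lemma 1)
        K = P_hat @ H.T @ np.linalg.inv(H @ P_hat @ H.T + R)
        m = m_hat + K @ (y - H @ m_hat)
        P = (I - K @ H) @ P_hat
        means.append(m)
        covs.append(P)
    return np.stack(means), np.stack(covs)

# Example usage on dummy data
A = 0.9 * np.eye(2); Q = 0.1 * np.eye(2)
H = np.eye(2); R = 0.5 * np.eye(2)
ys = np.random.default_rng(0).standard_normal((10, 2))
ms, Ps = kalman_filter(ys, np.zeros(2), np.eye(2), A, Q, H, R)
print(ms.shape, Ps.shape)  # (10, 2) (10, 2, 2)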