Marginal Likelihood Computation#
Recall Lemma 2#
Under the same HMM setup, suppose \(p(x) = \mathcal{N}(x; m, P)\) and \(p(y \mid x) = \mathcal{N}(y; Hx, R)\), and we are interested in the marginal

\[p(y) = \int p(y \mid x) \, p(x) \, \mathrm{d}x.\]

Then

\[p(y) = \mathcal{N}(y; Hm, HPH^\top + R).\]
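As a quick sanity check, the lemma can be verified by Monte Carlo; the sketch below uses arbitrary scalar values for \( m, P, H, R \) (not from the original) and compares the empirical mean and variance of the sampled \( y \) against \( Hm \) and \( HPH^\top + R \).

# Monte Carlo check of Lemma 2 (scalar case, illustrative values)
import numpy as np

rng = np.random.default_rng(0)
m, P, H, R = 1.0, 2.0, 0.5, 0.3   # arbitrary scalar parameters

x = rng.normal(m, np.sqrt(P), size=100_000)   # x ~ N(m, P)
y = rng.normal(H * x, np.sqrt(R))             # y | x ~ N(Hx, R)

print(y.mean(), H * m)            # empirical mean vs. H m
print(y.var(), H * P * H + R)     # empirical variance vs. H P H^T + R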
Marginal Likelihood form#
Consider the HMM

\[\begin{align*} x_0 &\sim \mathcal{N}(m_0, P_0), \\ x_t &= \theta x_{t-1} + q_t, \quad q_t \sim \mathcal{N}(0, Q), \\ y_t &= H x_t + r_t, \quad r_t \sim \mathcal{N}(0, R). \end{align*}\]

We know the exact marginal likelihood \(p(y_{1:T})\) is given by

\[p(y_{1:T}) = \int p(x_{0:T}, y_{1:T}) \, \mathrm{d}x_{0:T}.\]

We also note that, equivalently,

\[p(y_{1:T}) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}),\]

where each conditional is given by

\[p(y_t \mid y_{1:t-1}) = \int p(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, \mathrm{d}x_t,\]

with the convention \(p(y_1 \mid y_{1:0}) = p(y_1)\).
Since we have a fully Gaussian HMM setup, we can use the Kalman filter to compute \(p(x_t \mid y_{1:t-1}) = \mathcal{N}(x_t; \hat{m}_t, \hat{P}_t)\), where

\[\begin{align*} \hat{m}_t &= \theta m_{t-1} \\ \hat{P}_t &= \theta P_{t-1} \theta^\top + Q \end{align*}\]

and \(m_{t-1}, P_{t-1}\) are the filtering mean and covariance at time \(t-1\). So we have

\[p(y_t \mid y_{1:t-1}) = \int \mathcal{N}(y_t; H x_t, R) \, \mathcal{N}(x_t; \hat{m}_t, \hat{P}_t) \, \mathrm{d}x_t.\]

By Lemma 2, this is

\[p(y_t \mid y_{1:t-1}) = \mathcal{N}(y_t; H \hat{m}_t, H \hat{P}_t H^\top + R) = \mathcal{N}(y_t; H \hat{m}_t, S_t),\]

where we let \(S_t = H \hat{P}_t H^\top + R\) for simplicity.
Kalman for marginal likelihood#
The full algorithm for computing \(\log p(y_{1:T}) = \sum_{t=1}^T \log p(y_t \mid y_{1:t-1})\) is given below.
Input: initial mean and covariance \( m_0, P_0 \), and the sequence of observations \( y_{1:T} \) for a given \( T \).
Set the initial filtering mean and covariance to \( m_0, P_0 \).
Filtering:
For \( t = 1, \dots, T \) do
Prediction step:
\[\begin{align*} \hat{m}_t &= \theta m_{t-1} \\ \hat{P}_t &= \theta P_{t-1} \theta^\top + Q \end{align*}\]
Update step:
\[\begin{align*} S_t &= H \hat{P}_t H^\top + R \\ K_t &= \hat{P}_t H^\top (S_t)^{-1} \\ m_t &= \hat{m}_t + K_t (y_t - H \hat{m}_t) \\ P_t &= (I - K_t H) \hat{P}_t \end{align*}\]
End for
Return \( \hat{m}_{1:T}, S_{1:T} \)
And we output

\[\log p(y_{1:T}) = \sum_{t=1}^{T} \log \mathcal{N}(y_t; H \hat{m}_t, S_t).\]
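A minimal sketch of this recursion in code is given below, assuming a scalar state and observation so that \( \theta, Q, H, R, m_0, P_0 \) are plain floats; the function name kalman_log_marginal_likelihood and its exact signature are illustrative rather than taken from the original.

# Kalman-filter log marginal likelihood (scalar model; illustrative sketch)
import numpy as np

def kalman_log_marginal_likelihood(Y, theta, Q, H, R, m0, P0):
    '''Y: (T,) observations; all other arguments are scalars.'''
    m, P = m0, P0          # filtering mean and variance at the previous step
    log_ml = 0.0
    for y in Y:
        # Prediction step
        m_pred = theta * m
        P_pred = theta * P * theta + Q
        # Predictive likelihood p(y_t | y_{1:t-1}) = N(y_t; H m_pred, S)
        S = H * P_pred * H + R
        log_ml += -0.5 * (np.log(2 * np.pi * S) + (y - H * m_pred) ** 2 / S)
        # Update step
        K = P_pred * H / S
        m = m_pred + K * (y - H * m_pred)
        P = (1 - K * H) * P_pred
    return log_ml

Accumulating \( \log \mathcal{N}(y_t; H \hat{m}_t, S_t) \) inside the filtering loop avoids having to store \( \hat{m}_{1:T}, S_{1:T} \) explicitly.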
BPF for Marginal Likelihood#
At each \(t\), the (unnormalised) weights are given by \(W_t^{(i)} = g_\theta(y_t \mid \bar{x}_t^{(i)})\), where \(\bar{x}_t^{(i)}\) are the propagated particles at time \(t\).

The average of the weights gives an unbiased estimate of the marginal likelihood:

\[p^N(y_{1:T}) = \prod_{t=1}^{T} \frac{1}{N} \sum_{i=1}^{N} W_t^{(i)}.\]

This follows since we have

\[p(y_{1:T}) = \prod_{t=1}^{T} p(y_t \mid y_{1:t-1}),\]

where the conditional likelihood

\[p(y_t \mid y_{1:t-1}) = \int g_\theta(y_t \mid x_t) \, p(x_t \mid y_{1:t-1}) \, \mathrm{d}x_t\]

can be estimated by the BPF:

\[p^N(y_t \mid y_{1:t-1}) = \frac{1}{N} \sum_{i=1}^{N} g_\theta(y_t \mid \bar{x}_t^{(i)}), \qquad \bar{x}_t^{(i)} \sim \tau_\theta(\cdot \mid \tilde{x}_{t-1}^{(i)}),\]

where \(\tilde{x}_{t-1}^{(i)}\) are the resampled particles output by the BPF at the previous step. So the full marginal likelihood estimate is given by

\[p^N(y_{1:T}) = \prod_{t=1}^{T} p^N(y_t \mid y_{1:t-1}).\]

As we are working in the log domain, we have

\[\log p^N(y_{1:T}) = \sum_{t=1}^{T} \log p^N(y_t \mid y_{1:t-1}) = \sum_{t=1}^{T} \left( \log \sum_{i=1}^{N} W_t^{(i)} - \log N \right).\]
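In code, each term \( \log \sum_{i} W_t^{(i)} - \log N \) is best computed from the log-weights with the log-sum-exp trick to avoid underflow; a minimal helper along these lines (the name log_mean_weight is illustrative) is:

# log of the average weight, computed stably from log-weights
import numpy as np

def log_mean_weight(log_w):
    '''log_w: (N,) array of log-weights; returns log((1/N) * sum(exp(log_w))).'''
    max_log = np.max(log_w)
    return max_log + np.log(np.mean(np.exp(log_w - max_log)))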
Pseudocode#
Recall the setup: \( x_0 \sim \pi_0 = \mathcal{N}(m_0, P_0) \), \( \tau_\theta(x_t \mid x_{t-1}) = \mathcal{N}(x_t; \theta x_{t-1}, Q) \), and \( g_\theta(y_t \mid x_t) = \mathcal{N}(y_t; H x_t, R) \).
The full BPF pseudocode is given below.
Sample:
\( x_0^{(i)} \sim \pi_0 \) for \( i = 1, \ldots, N. \)
for \( t = 1, \ldots, T \) do
Sample: \( \bar{x}_t^{(i)} \sim \tau_{\theta}(\cdot|x_{t-1}^{(i)}) \quad \text{for} \quad i = 1, \ldots, N. \)
Weight: \( W_t^{(i)} = g_{\theta}(y_t|\bar{x}_t^{(i)}), \) for \( i = 1, \ldots, N \).
Store: \( p^N(y_t|y_{1:t-1}) = \frac{1}{N} \sum_{i=1}^{N} W_t^{(i)} \)
Normalize: \( w_t^{(i)} = \frac{W_t^{(i)}}{\sum_{j=1}^{N} W_t^{(j)}} \) for \( i = 1, \ldots, N \).
Resample: Sample \( o_t(1), \dots, o_t(N) \sim \) Multinomial( \(w_t^{(1)}, \dots, w_t^{(N)}\) ), and set \( x_t^{(i)} = \bar{x}_t^{(o_t(i))} \) for \( i = 1, \ldots, N \).
end for
Return \( p^N(y_{1:T}) = \prod_{t=1}^{T} p^N(y_t|y_{1:t-1}) \)
# example setup
import numpy as np

def log_p(y, y_pred, R):
    '''Gaussian observation log-density: y scalar, y_pred (N,), R scalar variance.'''
    diff = y_pred - y                 # (N,)
    mahalanobis = (diff ** 2) / R     # (N,)
    log_det_R = np.log(R)
    log_likelihoods = -0.5 * (mahalanobis + np.log(2 * np.pi) + log_det_R)
    return log_likelihoods

def bpf_q2(Y, A, Q, H, R, m0, P0, N=500):
    '''BPF estimate of log p(y_{1:T}); A is the transition coefficient (theta above).'''
    T = len(Y)
    W = np.zeros((T, N))
    log_marginal_likelihood = 0.0

    # Step 1: initialise particles from the prior N(m0, P0)
    x = np.random.normal(loc=m0, scale=np.sqrt(P0), size=N)  # (N,)
    particle_history = []

    for t in range(T):
        # Step 2: propagation, x_t ~ N(A x_{t-1}, Q)
        noise = np.random.normal(loc=0.0, scale=np.sqrt(Q), size=N)
        x = A * x + noise  # (N,)

        # Step 3: weight update, W_t^(i) = N(y_t; H x_t^(i), R)
        y = Y[t]
        y_pred = H * x  # (N,)
        log_w = log_p(y, y_pred, R)

        # Accumulate log p(y_t | y_{1:t-1}) = log sum(W_t) - log N
        # using the log-sum-exp trick
        max_log = np.max(log_w)
        log_sum = np.log(np.sum(np.exp(log_w - max_log))) + max_log
        log_marginal_likelihood += log_sum - np.log(N)

        # Normalise the weights in the log domain
        # (same as weights /= np.sum(weights))
        log_w -= log_sum
        weights = np.exp(log_w)
        W[t] = weights

        # Step 4: multinomial resampling
        indices = np.random.choice(N, size=N, p=weights)
        x = x[indices]
        particle_history.append(x.copy())

    return log_marginal_likelihood
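As a quick check, one can simulate data from the model and compare the BPF estimate with the exact Kalman value (here using the kalman_log_marginal_likelihood sketch from earlier in this section); the parameter values below are arbitrary and purely illustrative.

# Illustrative comparison on simulated data (arbitrary parameter values)
import numpy as np

np.random.seed(1)
theta, Q, H, R, m0, P0 = 0.9, 0.1, 1.0, 0.5, 0.0, 1.0
T = 100

# Simulate x_{0:T} and y_{1:T} from the model
x = np.random.normal(m0, np.sqrt(P0))
Y = np.zeros(T)
for t in range(T):
    x = theta * x + np.random.normal(0.0, np.sqrt(Q))
    Y[t] = H * x + np.random.normal(0.0, np.sqrt(R))

exact = kalman_log_marginal_likelihood(Y, theta, Q, H, R, m0, P0)
estimate = bpf_q2(Y, theta, Q, H, R, m0, P0, N=1000)
print(exact, estimate)  # the BPF estimate should be close to the exact value

Since the BPF estimate is random, it fluctuates around the exact value; increasing \(N\) reduces the variance of the estimator.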