Variational Filtering, Rebuilt From the Linear Case

A scalar linear-Gaussian benchmark made the VBF edge-factor implementation auditable before moving to nonlinear filtering.

Series: VBF Experiments, April 2026

This week started by rebuilding the variational Bayesian filtering experiments around a scalar linear-Gaussian state-space model. The goal was not to win on a toy problem. The goal was to make the mechanics testable before asking nonlinear questions:

\[ z_t = z_{t-1} + w_t,\quad w_t \sim \mathcal{N}(0, Q) \]

\[ y_t = x_t z_t + v_t,\quad v_t \sim \mathcal{N}(0, R) \]
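As a concrete reference for the sweeps below, here is a minimal simulation sketch of this model in JAX. The function name, defaults, and the sinusoidal regressor frequency are illustrative, not the repo's API:

```python
# Minimal sketch (assumed helper, not from the repo): simulate the scalar
# linear-Gaussian SSM above with a sinusoidal regressor x_t.
import jax
import jax.numpy as jnp

def simulate(key, T=200, Q=0.1, R=0.5):
    """Return regressors x, latents z, observations y for t = 1..T."""
    k_w, k_v = jax.random.split(key)
    x = jnp.sin(0.3 * jnp.arange(T))                # nominal sinusoidal x_t
    w = jnp.sqrt(Q) * jax.random.normal(k_w, (T,))  # process noise
    v = jnp.sqrt(R) * jax.random.normal(k_v, (T,))  # observation noise
    z = jnp.cumsum(w)                               # z_t = z_{t-1} + w_t, z_0 = 0
    y = x * z + v                                   # y_t = x_t z_t + v_t
    return x, z, y

x, z, y = simulate(jax.random.PRNGKey(0))
```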

The filter carried a strict online marginal \(q^F_t(z_t)\) plus an edge/backward conditional \(q^B_t(z_{t-1} \mid z_t)\). That made the posterior edge factor explicit:

\[ q^E_t(z_t,z_{t-1}) = q^F_t(z_t)q^B_t(z_{t-1} \mid z_t) \]
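When \(q^F_t = \mathcal{N}(m, P)\) and the backward conditional is linear-Gaussian, \(q^B_t(z_{t-1} \mid z_t) = \mathcal{N}(a z_t + b, S)\), the edge factor is itself a bivariate Gaussian. A minimal sketch of its joint moments, assuming that parameterization (not necessarily the repo's):

```python
# Minimal sketch (names illustrative): joint moments of the edge factor
# q^E_t(z_t, z_{t-1}) = N(z_t; m, P) * N(z_{t-1}; a z_t + b, S).
import jax.numpy as jnp

def edge_factor_moments(m, P, a, b, S):
    mean = jnp.array([m, a * m + b])           # E[z_t], E[z_{t-1}]
    cov = jnp.array([[P,          a * P],
                     [a * P, a**2 * P + S]])   # Cov of (z_t, z_{t-1})
    return mean, cov
```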

The immediate implementation work created JAX training and evaluation paths, analytic Kalman references, posterior diagnostics, predictive metrics, and plotting under src/vbf/, scripts/, and experiments/linear_gaussian/.

What Had To Be True

The linear-Gaussian benchmark came with closed-form answers, which allowed checks that are hard to get in the nonlinear case.

The frozen marginal control was especially important. It showed that the edge/backward conditional could be learned while the filtering marginal remained exact. In the five-seed diagnostic baseline, the frozen marginal row had state NLL 0.401983, coverage 0.900220, and variance ratio 1.000006, matching the exact Kalman marginal.
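The exact reference that the frozen-marginal control is compared against is the scalar Kalman recursion. A minimal sketch, assuming the model above; this is not the repo's implementation, and the prior moments in the usage comment are illustrative:

```python
# Minimal sketch of the exact scalar Kalman step for z_t = z_{t-1} + w_t,
# y_t = x_t z_t + v_t. Q, R defaults are placeholders.
import jax
import jax.numpy as jnp

def kalman_step(carry, inputs, Q=0.1, R=0.5):
    m, P = carry
    x_t, y_t = inputs
    m_pred, P_pred = m, P + Q                  # predict through the random walk
    S = x_t**2 * P_pred + R                    # innovation variance
    K = P_pred * x_t / S                       # Kalman gain
    m_new = m_pred + K * (y_t - x_t * m_pred)  # posterior mean
    P_new = (1.0 - K * x_t) * P_pred           # posterior variance
    return (m_new, P_new), (m_new, P_new)

# Usage over a whole sequence (x, y from the simulation sketch above,
# prior belief N(0, 1) assumed):
# _, (means, variances) = jax.lax.scan(kalman_step, (0.0, 1.0), (x, y))
```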

Figure: Linear-Gaussian diagnostics over training.

What The Sweeps Found

The scalar report split results into weak-observability and randomized \(Q/R\) regimes.

| Regime | Strong reference | Learned row | Main result |
| --- | --- | --- | --- |
| nominal sinusoidal \(x_t\) | exact Kalman state NLL 0.402 | self-fed supervised state NLL 0.415 | supervised residualized filter was close |
| weak sinusoidal | exact Kalman state NLL 1.175 | MC ELBO state NLL 1.291 | unsupervised MC ELBO was under-dispersed |
| zero unobservable | exact Kalman state NLL 2.740 | MC ELBO state NLL 7.010 | local ELBO collapsed variance badly |
| randomized \(Q/R\) | frozen marginal matched Kalman | regime-local supervised was near reference | generalization needed explicit regime calibration |

The most useful negative result was that direct non-residualized ELBO training was weak. The strong rows depended on preserving analytic structure: residualized updates, frozen exact marginal controls, or oracle/reference calibration. That shaped the later nonlinear work: when results were good, the report had to say whether they were fully unsupervised or assisted by reference information.
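To make "residualized updates" concrete, here is a minimal sketch of one way to parameterize them: the network proposes small corrections to the analytic Kalman moments instead of predicting the posterior from scratch. The parameter names, feature input, and correction form are assumptions, not the repo's code:

```python
# Minimal sketch (illustrative parameterization) of a residualized update:
# start from the analytic Kalman moments and apply learned corrections.
import jax.numpy as jnp

def residualized_update(params, m_kalman, P_kalman, features):
    # `features` stands in for whatever per-step summary the filter conditions on;
    # params["W"], params["b"], params["w_mean"], params["w_var"] are illustrative shapes.
    h = jnp.tanh(params["W"] @ features + params["b"])
    dm = params["w_mean"] @ h          # additive mean correction
    log_scale = params["w_var"] @ h    # multiplicative variance correction
    m = m_kalman + dm
    P = P_kalman * jnp.exp(log_scale)  # keeps the variance positive
    return m, P
```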

ELBO And Predictive Checks

The MC ELBO ablation behaved smoothly in the scalar case. At 1000 training steps, increasing the sample count improved state NLL from 0.575477 with one sample to 0.542554 with 32 samples. It still did not match the exact Kalman state NLL of 0.401983.
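For orientation, a minimal sketch of a K-sample Monte Carlo estimate of the per-step ELBO built from the edge factor, assuming Gaussian \(q^F\) and \(q^B\) as above. The objective form, argument names, and defaults are assumptions, not the repo's code:

```python
# Minimal sketch: K-sample MC estimate of a per-step ELBO over the edge factor
# q^E_t(z_t, z_{t-1}) = q^F_t(z_t) q^B_t(z_{t-1} | z_t).
import jax
import jax.numpy as jnp
from jax.scipy.stats import norm

def mc_elbo_step(key, y_t, x_t, prev, curr, back, Q=0.1, R=0.5, K=32):
    m_prev, P_prev = prev   # q^F_{t-1} = N(m_prev, P_prev)
    m, P = curr             # q^F_t     = N(m, P)
    a, b, S = back          # q^B_t(z_{t-1} | z_t) = N(a z_t + b, S)
    k1, k2 = jax.random.split(key)
    z_t = m + jnp.sqrt(P) * jax.random.normal(k1, (K,))
    z_tm1 = a * z_t + b + jnp.sqrt(S) * jax.random.normal(k2, (K,))
    log_w = (norm.logpdf(y_t, x_t * z_t, jnp.sqrt(R))         # likelihood
             + norm.logpdf(z_t, z_tm1, jnp.sqrt(Q))           # transition
             + norm.logpdf(z_tm1, m_prev, jnp.sqrt(P_prev))   # previous marginal
             - norm.logpdf(z_t, m, jnp.sqrt(P))               # minus q^F_t
             - norm.logpdf(z_tm1, a * z_t + b, jnp.sqrt(S)))  # minus q^B_t
    return jnp.mean(log_w)
```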

The predictive head experiment was a separate caution. With only short training it was poor, but at 3000 steps a learned predictive head on the oracle belief reached predictive NLL 0.625870, close to the exact Kalman predictive NLL of 0.600858. A learned head on the ELBO belief reached 0.670554, while the analytic predictive from the same ELBO belief was 0.640331.
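The analytic predictive in that comparison is just the belief pushed through the random walk and the observation model. A minimal sketch of the resulting one-step predictive NLL, assuming a Gaussian belief \(\mathcal{N}(m, P)\) on \(z_{t-1}\) (the metric form is assumed, not the repo's code):

```python
# Minimal sketch: analytic one-step predictive NLL for y_t from a Gaussian
# belief N(m, P) on z_{t-1}, under z_t = z_{t-1} + w_t and y_t = x_t z_t + v_t.
import jax.numpy as jnp
from jax.scipy.stats import norm

def predictive_nll(y_t, x_t, m, P, Q=0.1, R=0.5):
    mean = x_t * m                 # E[y_t | belief]
    var = x_t**2 * (P + Q) + R     # Var[y_t | belief]
    return -norm.logpdf(y_t, mean, jnp.sqrt(var))
```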

The interpretation was simple: the predictive machinery could be learned, but it was not a free substitute for a calibrated belief.

Why This Mattered

The linear-Gaussian work made the later nonlinear reports stricter. The important labels were:

| Label | Meaning |
| --- | --- |
| unsupervised | training used observations, the known transition, the known observation model, and the prior |
| reference_distilled | training used grid/reference moments, densities, or rollout targets |
| oracle_calibrated | training used reference variance targets or oracle posterior statistics |

That labeling became the backbone of the nonlinear series. The scalar benchmark was ready once the report could say: self-fed supervised and frozen-marginal controls are useful diagnostics, but vanilla unsupervised ELBO remains under-dispersed in the regimes that matter.

Source artifacts: