A parameterization of the reverse-process variance in DDPMs in which the model outputs a vector v with one component per dimension, and the variance is computed as Σ_θ(x_t, t) = exp(v · log β_t + (1 − v) · log β̃_t) — an interpolation in log-space between the theoretical upper bound β_t (the forward-process variance) and lower bound β̃_t = ((1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · β_t (the posterior variance). This is more stable than predicting variances directly, since the valid range between the two bounds is very narrow, and working in log-space keeps the interpolation within the permissible interval regardless of the network output.
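The interpolation can be sketched as follows (a minimal sketch; the function name and the assumption that v has already been mapped into [0, 1] — the paper rescales a raw [−1, 1] network output — are illustrative):

```python
import numpy as np

def learned_variance(v, beta_t, beta_tilde_t):
    """Log-space interpolation between the two variance bounds.

    v            : per-dimension network output, assumed already in [0, 1]
    beta_t       : forward-process variance (upper bound)
    beta_tilde_t : posterior variance (lower bound)
    """
    log_var = v * np.log(beta_t) + (1.0 - v) * np.log(beta_tilde_t)
    return np.exp(log_var)
```

Because the interpolation is linear in log-space, an intermediate v yields a geometric mean of the two bounds, so the output can never leave the interval [β̃_t, β_t].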

The model is trained with the hybrid objective L_hybrid = L_simple + λ · L_vlb, where λ = 0.001, applying a stop-gradient to μ_θ inside the L_vlb term so that the variational lower bound loss guides only the variance prediction while L_simple continues to drive the mean prediction. This decomposition keeps the relatively noisy VLB gradients from destabilizing the mean estimate.
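The effect of the stop-gradient can be shown with a scalar toy model (the losses below are hypothetical stand-ins; the real L_simple is an MSE on predicted noise and L_vlb is a per-timestep KL). Holding a frozen copy of the mean inside the VLB term makes the hybrid gradient with respect to the mean identical to the gradient of L_simple alone:

```python
LAMBDA = 0.001  # weight on the VLB term in the hybrid objective

def l_simple(mu, target_mu):
    # Stand-in for the simple (noise-prediction) loss: depends on the mean only.
    return (mu - target_mu) ** 2

def l_vlb(mu_sg, logvar, target_mu, target_logvar):
    # Stand-in for the VLB term: depends on both mean and variance,
    # but receives a stop-gradient (frozen) copy of the mean.
    return (mu_sg - target_mu) ** 2 + (logvar - target_logvar) ** 2

def grad_hybrid_wrt_mu(mu, logvar, target_mu, target_logvar, eps=1e-6):
    # Central finite difference w.r.t. mu. The copy mu_sg is held fixed,
    # which is exactly what stop_gradient / .detach() means on the backward pass.
    mu_sg = mu
    def f(m):
        return l_simple(m, target_mu) + LAMBDA * l_vlb(mu_sg, logvar,
                                                       target_mu, target_logvar)
    return (f(mu + eps) - f(mu - eps)) / (2 * eps)
```

The gradient comes out as 2 · (μ − target), i.e. the VLB term contributes nothing to the mean update, while it alone still trains the variance head.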

A key secondary benefit of learned variances is that they automatically rescale when using fewer diffusion steps, enabling fast sampling with 10-40x fewer steps without the need for hand-tuned variance schedules. This makes learned variances competitive with or superior to DDIM for accelerated generation. Introduced by Nichol & Dhariwal (2021).
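Why the rescaling is automatic: when sampling on a strided subsequence of timesteps, the per-step betas are recomputed from the preserved cumulative products ᾱ, which widens the gap between β̃_t and β_t; since the network interpolates between whatever bounds the current schedule defines, its output stays valid without retraining. A sketch of the schedule subsampling (the function name and the linear schedule in the test are illustrative, not the paper's exact code):

```python
import numpy as np

def strided_betas(betas, stride):
    """Recompute per-step betas for a subsampled timestep sequence so that
    the cumulative signal decay (the product of alphas) is unchanged."""
    alphas_cumprod = np.cumprod(1.0 - betas)
    keep = np.arange(stride - 1, len(betas), stride)  # every stride-th step
    new_betas, last = [], 1.0
    for i in keep:
        # 1 - alpha_bar_i / alpha_bar_prev telescopes back to the full decay
        new_betas.append(1.0 - alphas_cumprod[i] / last)
        last = alphas_cumprod[i]
    return np.array(new_betas)
```

Each subsampled beta absorbs `stride` original steps, so the new betas are larger, and the learned interpolation simply targets the new, wider interval.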

Key Details

  • Interpolation in log-space between β_t and β̃_t
  • Trained with hybrid loss using λ=0.001
  • Stop-gradient on mean in VLB term
  • Enables fast sampling with automatic variance rescaling
  • Outperforms DDIM at 50+ steps

concept