Summary

This paper proposes a unified framework for score-based generative modeling grounded in stochastic differential equations. Rather than perturbing data with a finite set of discrete noise scales, the authors consider a continuum of distributions evolving according to a forward SDE that transforms data into a simple prior. Drawing on Anderson (1982), they reverse this process via a reverse-time SDE whose drift depends on the score function ∇_x log p_t(x), estimated by a neural network trained with a continuous generalization of denoising score matching.
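The score matching target is concrete for the VP-SDE, whose transition kernel is Gaussian in closed form. A minimal numpy sketch of the forward perturbation and the analytic conditional score it regresses against, assuming a linear β(t) schedule with illustrative endpoints (not necessarily the paper's exact configuration):

```python
import numpy as np

def vp_perturb(x0, t, beta_min=0.1, beta_max=20.0, rng=None):
    """Sample x_t ~ p_t(x_t | x_0) under the VP-SDE with a linear beta(t)
    schedule, and return the analytic conditional score used as the
    denoising score matching regression target."""
    if rng is None:
        rng = np.random.default_rng(0)
    # ∫_0^t beta(s) ds for beta(s) = beta_min + s (beta_max - beta_min)
    int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    alpha = np.exp(-int_beta)            # signal retention e^{-∫beta}
    mean = np.sqrt(alpha) * x0
    var = 1.0 - alpha
    xt = mean + np.sqrt(var) * rng.standard_normal(x0.shape)
    # ∇_x log p(x_t | x_0) for a Gaussian kernel: -(x_t - mean) / var
    score_target = -(xt - mean) / var
    return xt, score_target
```

Because the conditional score of the Gaussian kernel is available in closed form, the network s_θ(x_t, t) can simply be regressed onto `score_target` under a time-dependent weighting.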

The framework unifies two previously distinct model families: SMLD corresponds to a Variance Exploding (VE) SDE, and DDPM corresponds to a Variance Preserving (VP) SDE. A newly introduced sub-VP SDE achieves superior likelihoods. Beyond unification, the continuous perspective unlocks two new tools: a predictor-corrector (PC) sampling algorithm combining numerical SDE solvers with score-based MCMC, and the probability flow ODE, a deterministic ODE sharing the same marginals as the SDE, which enables exact likelihood computation, uniquely identifiable encodings, and efficient adaptive-stepsize sampling.
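The probability flow ODE can be demonstrated end to end on a toy problem where the marginal score is known in closed form. A sketch under the VP-SDE with 1-D Gaussian data N(0, σ0²), integrated backward from the prior with plain Euler steps (the schedule values and step count are illustrative assumptions):

```python
import numpy as np

def probability_flow_sample(n=20000, steps=1000, sigma0=0.5,
                            beta_min=0.1, beta_max=20.0, seed=0):
    """Deterministic sampling via the probability flow ODE
    dx = [f(x,t) - 0.5 g(t)^2 * score] dt for a 1-D Gaussian toy problem,
    using the closed-form score of the marginal N(0, var_t)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)          # start from the prior at t = 1
    dt = -1.0 / steps                   # integrate backward in time
    for i in range(steps):
        t = 1.0 + i * dt
        beta = beta_min + t * (beta_max - beta_min)
        int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
        alpha = np.exp(-int_beta)
        var_t = alpha * sigma0**2 + (1.0 - alpha)   # marginal variance
        score = -x / var_t              # analytic score of N(0, var_t)
        drift = -0.5 * beta * x         # f(x,t) for the VP-SDE
        x = x + (drift - 0.5 * beta * score) * dt   # Euler step, no noise
    return x
```

Integrating backward deterministically transports prior samples to (approximately) the data distribution; running the same trajectory forward yields the uniquely identifiable encoding the paper describes.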

The method also enables controllable generation without retraining (class-conditional generation, inpainting, colorization) by modifying the reverse-time SDE with a conditional score. Combined with architecture improvements (NCSN++, DDPM++), it achieves FID 2.20 and IS 9.89 on CIFAR-10 and the first 1024×1024 samples from a score-based model.
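The conditional score behind this controllable generation is a Bayes-rule sum, ∇_x log p_t(x|y) = ∇_x log p_t(x) + ∇_x log p(y|x). A toy 1-D sketch with a Gaussian prior standing in for the learned score and a Gaussian measurement model (`sigma_y` is an illustrative parameter, not from the paper):

```python
import numpy as np

def conditional_score(x, y, sigma_y=0.5):
    """Score composition for controllable generation: the unconditional
    score plus the gradient of a measurement log-likelihood.  Here the
    'model' is a toy N(0, 1) prior and p(y|x) = N(y; x, sigma_y^2)."""
    prior_score = -x                          # ∇_x log N(x; 0, 1)
    likelihood_score = (y - x) / sigma_y**2   # ∇_x log N(y; x, sigma_y^2)
    return prior_score + likelihood_score
```

Plugging this sum into the reverse-time SDE in place of the unconditional score draws samples from p(x|y); in this Gaussian toy the combined score vanishes exactly at the posterior mean y/(1 + σ_y²).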

Key Contributions

  • Unified SDE framework encapsulating SMLD and DDPM as VE-SDE and VP-SDE discretizations
  • Reverse-time SDE formulation requiring only the learned score function
  • Predictor-corrector sampling combining SDE solvers with Langevin MCMC correction
  • Probability flow ODE enabling exact likelihood, unique encodings, and efficient sampling
  • Sub-VP SDE achieving state-of-the-art likelihoods (2.99 bits/dim on CIFAR-10)
  • Controllable generation without retraining via conditional reverse-time SDE
  • Record FID 2.20 and IS 9.89 on unconditional CIFAR-10

Methodology

  • VE-SDE: dx = √(d[σ²(t)]/dt) dw with a geometric σ(t) schedule
  • VP-SDE: dx = -½β(t)x dt + √(β(t)) dw with a linear β(t) schedule
  • Score estimation: continuous weighted denoising score matching objective (Eq. 7)
  • PC sampling: predictor step (Euler-Maruyama or reverse-diffusion sampler) + corrector step (Langevin MCMC)
  • Probability flow ODE: dx = [f(x,t) - ½g(t)²∇_x log p_t(x)] dt
  • Likelihood: instantaneous change of variables with Hutchinson trace estimation
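The PC sampling loop can be sketched on a 1-D Gaussian toy problem where the marginal score is known exactly: Euler-Maruyama on the reverse-time VP-SDE as the predictor, one Langevin update per step as the corrector. All step sizes here are illustrative choices, not the paper's tuned signal-to-noise settings:

```python
import numpy as np

def vp_var(t, sigma0, beta_min=0.1, beta_max=20.0):
    """Marginal variance of the VP-SDE at time t for N(0, sigma0^2) data."""
    int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
    alpha = np.exp(-int_beta)
    return alpha * sigma0**2 + (1.0 - alpha)

def pc_sample(n=20000, steps=500, sigma0=0.5,
              beta_min=0.1, beta_max=20.0, seed=0):
    """Predictor-corrector sampling with the closed-form score -x / var_t."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)            # prior sample at t = 1
    dt = -1.0 / steps
    for i in range(steps):
        t = 1.0 + i * dt
        beta = beta_min + t * (beta_max - beta_min)
        score = -x / vp_var(t, sigma0, beta_min, beta_max)
        # predictor: Euler-Maruyama on dx = [f - g^2 score] dt + g dw
        x = (x + (-0.5 * beta * x - beta * score) * dt
               + np.sqrt(-beta * dt) * rng.standard_normal(n))
        # corrector: one Langevin step toward the marginal at t + dt
        var_next = vp_var(t + dt, sigma0, beta_min, beta_max)
        eps = 0.05 * var_next             # illustrative Langevin step size
        x = x + eps * (-x / var_next) + np.sqrt(2 * eps) * rng.standard_normal(n)
    return x
```

With a learned model, `score` would be the network s_θ(x, t) and the Langevin step size would be set from a target signal-to-noise ratio, as in the paper's PC algorithm.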

Key Findings

  • PC sampling consistently improves FID over predictor-only or corrector-only methods
  • VE-SDEs produce better sample quality; VP and sub-VP SDEs produce better likelihoods
  • NCSN++ cont. (deep, VE) achieves FID 2.20 on CIFAR-10
  • Probability flow ODE gives better likelihood bounds than ELBO
  • ODE encoding is uniquely identifiable across different trained models
  • Method scales to 1024×1024 CelebA-HQ images
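The exact likelihood computation behind these numbers uses the instantaneous change-of-variables formula, whose divergence term tr(∂f/∂x) is estimated with the Hutchinson trick tr(A) = E[vᵀAv] over random probe vectors, avoiding the full Jacobian. A minimal sketch (the probe count is an illustrative choice):

```python
import numpy as np

def hutchinson_trace(matvec, dim, n_probes=2000, rng=None):
    """Hutchinson estimator tr(A) ~ E[v^T A v] with Rademacher probes,
    given only a matrix-vector product, as used to estimate the
    divergence term in the probability flow ODE's likelihood formula."""
    if rng is None:
        rng = np.random.default_rng(0)
    est = 0.0
    for _ in range(n_probes):
        v = rng.choice([-1.0, 1.0], size=dim)   # Rademacher probe
        est += v @ matvec(v)
    return est / n_probes
```

In the actual likelihood computation, `matvec` is a Jacobian-vector product of the ODE drift, obtained by automatic differentiation rather than an explicit matrix.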

Important References

  1. Denoising Diffusion Probabilistic Models — DDPM, unified as VP-SDE discretization
  2. Improved Denoising Diffusion Probabilistic Models — Concurrent improvements to DDPM
  3. Flow Straight and Fast — Shows PF-ODE is a special case of rectified flow

Atomic Notes


paper