Abstract
Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise. Crucially, the reverse-time SDE depends only on the time-dependent gradient field (aka, score) of the perturbed data distribution. By leveraging advances in score-based generative modeling, we can accurately estimate these scores with neural networks, and use numerical SDE solvers to generate samples.
Summary
This paper proposes a unified framework for score-based generative modeling grounded in stochastic differential equations. Rather than perturbing data with discrete noise scales, the authors consider a continuum of distributions evolving according to a forward SDE that transforms data into a simple prior. Drawing on Anderson (1982), this forward process is reversed via a reverse-time SDE whose drift depends on the score function ∇_x log p_t(x) — estimated by a neural network via continuous denoising score matching.
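A minimal numerical sketch of this reversal (not from the paper's code): for 1-D Gaussian data under the VP-SDE, the perturbed marginals and their score are available in closed form, so the reverse-time SDE can be integrated with Euler-Maruyama and no learned network. All constants (m0, s0, the linear β schedule endpoints) are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data distribution: x_0 ~ N(m0, s0^2), so the perturbed marginal
# p_t and its score are available in closed form (no neural net needed).
m0, s0 = 2.0, 0.5
beta_min, beta_max = 0.1, 20.0          # linear beta(t) schedule (illustrative)

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha(t):
    # alpha(t) = exp(-0.5 * integral_0^t beta(s) ds) for the VP-SDE
    return np.exp(-0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min)

def score(x, t):
    # grad_x log p_t(x) for Gaussian data under the VP-SDE
    a = alpha(t)
    v = (s0 * a) ** 2 + 1.0 - a**2
    return -(x - m0 * a) / v

# Reverse-time SDE, Euler-Maruyama from t = 1 (prior ~ N(0, 1)) down to t = 0:
#   x <- x - [f(x,t) - g(t)^2 * score(x,t)] * dt + g(t) * sqrt(dt) * z
n_steps, n_samples = 1000, 20000
dt = 1.0 / n_steps
x = rng.standard_normal(n_samples)
for i in range(n_steps, 0, -1):
    t = i * dt
    f = -0.5 * beta(t) * x
    g = np.sqrt(beta(t))
    z = rng.standard_normal(n_samples)
    x = x - (f - g**2 * score(x, t)) * dt + g * np.sqrt(dt) * z

print(x.mean(), x.std())   # should recover m0 = 2.0 and s0 = 0.5
```

Because the toy score is exact, any residual mismatch in the recovered mean and standard deviation comes from time discretization and Monte Carlo error alone.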
The framework unifies two previously distinct model families: SMLD corresponds to discretizing a Variance Exploding (VE) SDE, while DDPM corresponds to a Variance Preserving (VP) SDE. A newly introduced sub-VP SDE, whose variance is upper-bounded by the VP SDE's at every time, achieves superior likelihoods. Beyond unification, the continuous perspective unlocks two new tools: a predictor-corrector (PC) sampling algorithm that combines numerical SDE solvers with score-based MCMC, and the probability flow ODE, a deterministic ODE sharing the same marginals as the SDE, which enables exact likelihood computation, uniquely identifiable encodings, and efficient adaptive-stepsize sampling.
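The exact-likelihood property of the probability flow ODE can be checked on an analytic 1-D toy problem: integrate the ODE from the prior at t = 1 down to t = 0 while accumulating the instantaneous change-of-variables term (the negative divergence of the drift), then compare against the known data log-density. A sketch using SciPy's `solve_ivp`; all distribution and schedule parameters are illustrative:

```python
import numpy as np
from scipy.integrate import solve_ivp

# 1-D Gaussian toy: data x_0 ~ N(m0, s0^2); all constants are illustrative.
m0, s0 = 2.0, 0.5
beta_min, beta_max = 0.1, 20.0

def beta(t):
    return beta_min + t * (beta_max - beta_min)

def alpha(t):
    # exp(-0.5 * integral_0^t beta(s) ds) for the VP-SDE
    return np.exp(-0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min)

def var(t):
    a = alpha(t)
    return (s0 * a) ** 2 + 1.0 - a**2

def ode(t, state):
    # Probability flow drift h = f - 0.5 * g^2 * score, plus the
    # change-of-variables term d(log p)/dt = -dh/dx (1-D divergence).
    x = state[0]
    b, a, v = beta(t), alpha(t), var(t)
    s = -(x - m0 * a) / v
    h = -0.5 * b * x - 0.5 * b * s
    dh_dx = -0.5 * b + 0.5 * b / v
    return [h, -dh_dx]

x1 = 0.3                                  # starting point at t = 1
a1, v1 = alpha(1.0), var(1.0)             # exact marginal at t = 1 (close to N(0, 1))
logp1 = -0.5 * (x1 - m0 * a1) ** 2 / v1 - 0.5 * np.log(2 * np.pi * v1)
sol = solve_ivp(ode, (1.0, 0.0), [x1, logp1], rtol=1e-8, atol=1e-8)
x0, logp0 = sol.y[0, -1], sol.y[1, -1]

# Exact data log-density at the decoded point, for comparison
analytic = -0.5 * ((x0 - m0) / s0) ** 2 - 0.5 * np.log(2 * np.pi * s0**2)
print(x0, logp0, analytic)   # logp0 should match the analytic log-density
```

In higher dimensions the divergence is no longer a scalar derivative, which is where the Hutchinson trace estimator used in the paper comes in.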
The method also enables controllable generation without retraining (class-conditional generation, inpainting, colorization) by modifying the reverse-time SDE with a conditional score. Combined with architecture improvements (NCSN++, DDPM++), it achieves FID 2.20 and IS 9.89 on CIFAR-10 and the first 1024×1024 samples from a score-based model.
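The identity behind controllable generation, ∇_x log p_t(x|y) = ∇_x log p_t(x) + ∇_x log p_t(y|x), follows from Bayes' rule and lets a separately trained classifier steer an unconditional score model. A quick numerical check on a two-class Gaussian toy (all parameters illustrative), with the classifier gradient taken by finite differences:

```python
import numpy as np

# Two-class toy: p(x_0 | y) = N(mu_y, s0^2) with equal class priors, so every
# term in grad log p_t(x|y) = grad log p_t(x) + grad log p_t(y|x) is in
# closed form under the VP perturbation (parameters are illustrative).
mus, s0, t = np.array([-2.0, 2.0]), 0.5, 0.4
beta_min, beta_max = 0.1, 20.0
a = np.exp(-0.25 * t**2 * (beta_max - beta_min) - 0.5 * t * beta_min)
v = (s0 * a) ** 2 + 1.0 - a**2               # VP perturbation variance at time t

def log_components(x):
    # log N(x; mu_y * a, v) for both classes
    return -0.5 * (x - mus * a) ** 2 / v - 0.5 * np.log(2 * np.pi * v)

def log_posterior(x, y):
    # log p_t(y | x) via Bayes' rule (equal priors cancel)
    lg = log_components(x)
    return lg[y] - np.logaddexp(lg[0], lg[1])

x, y, eps = 0.7, 1, 1e-5
lg = log_components(x)
w = np.exp(lg - np.logaddexp(lg[0], lg[1]))  # class posteriors p_t(y | x)
uncond = float(np.sum(w * (-(x - mus * a) / v)))   # score of the mixture p_t(x)
grad_cls = (log_posterior(x + eps, y) - log_posterior(x - eps, y)) / (2 * eps)
cond = -(x - mus[y] * a) / v                 # direct conditional score
print(uncond + grad_cls, cond)               # the two sides should agree
```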
Key Contributions
- Unified SDE framework encapsulating SMLD and DDPM as VE-SDE and VP-SDE discretizations
- Reverse-time SDE formulation requiring only the learned score function
- Predictor-corrector sampling combining SDE solvers with Langevin MCMC correction
- Probability flow ODE enabling exact likelihood, unique encodings, and efficient sampling
- Sub-VP SDE achieving state-of-the-art likelihoods (2.99 bits/dim on CIFAR-10)
- Controllable generation without retraining via conditional reverse-time SDE
- Record FID 2.20 and IS 9.89 on unconditional CIFAR-10
Methodology
- VE-SDE: dx = √(d[σ²(t)]/dt) dw with a geometric σ(t) schedule (SMLD as its discretization)
- VP-SDE: dx = -½β(t)x dt + √(β(t)) dw with a linear β(t) schedule (DDPM as its discretization)
- Score estimation: continuous-time weighted denoising score matching objective (Eq. 7)
- PC sampling: predictor step (Euler-Maruyama or reverse-diffusion discretization) followed by a corrector step (Langevin MCMC)
- Probability flow ODE: dx = [f(x,t) - ½g(t)²∇_x log p_t(x)] dt
- Exact likelihood: integrate the probability flow ODE, estimating the drift divergence with the Hutchinson trace estimator
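A minimal predictor-corrector loop for the VE-SDE, again on 1-D Gaussian data so the score is exact rather than learned. The SNR-scaled Langevin step size mirrors the paper's corrector; the schedule constants and the choice r = 0.16 are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy Gaussian data so the VE-SDE score is known in closed form.
m0, s0 = 2.0, 0.5
sigma_min, sigma_max = 0.01, 10.0         # geometric sigma(t) schedule (illustrative)
n_steps, n_samples, r = 500, 10000, 0.16  # r: Langevin signal-to-noise ratio

def sigma(t):
    return sigma_min * (sigma_max / sigma_min) ** t

def score(x, t):
    v = s0**2 + sigma(t) ** 2 - sigma_min**2   # perturbed marginal variance
    return -(x - m0) / v

x = rng.normal(0.0, sigma_max, n_samples)      # VE prior ~ N(0, sigma_max^2)
ts = np.linspace(1.0, 0.0, n_steps + 1)
for i in range(n_steps):
    t, t_next = ts[i], ts[i + 1]
    # Predictor: reverse-diffusion (ancestral) step between noise levels
    dv = sigma(t) ** 2 - sigma(t_next) ** 2
    x = x + dv * score(x, t) + np.sqrt(dv) * rng.standard_normal(n_samples)
    # Corrector: one Langevin MCMC step with SNR-scaled step size
    g = score(x, t_next)
    z = rng.standard_normal(n_samples)
    eps = 2 * (r * np.linalg.norm(z) / np.linalg.norm(g)) ** 2
    x = x + eps * g + np.sqrt(2 * eps) * z

print(x.mean(), x.std())   # should approach m0 = 2.0 and s0 = 0.5
```

Swapping in a learned, time-conditional score network for `score` and a multi-dimensional state turns this sketch into the paper's PC sampler proper.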
Key Findings
- PC sampling consistently improves FID over predictor-only or corrector-only methods
- VE SDEs produce better sample quality; VP/sub-VP produce better likelihoods
- NCSN++ cont. (deep, VE) achieves FID 2.20 on CIFAR-10
- Exact likelihoods computed via the probability flow ODE improve on ELBO-based bounds for the same models
- ODE encoding is uniquely identifiable across different trained models
- Method scales to 1024×1024 CelebA-HQ images
Important References
- Denoising Diffusion Probabilistic Models — DDPM, unified as VP-SDE discretization
- Improved Denoising Diffusion Probabilistic Models — Concurrent improvements to DDPM
- Flow Straight and Fast — Shows PF-ODE is a special case of rectified flow