Summary

This paper reformulates score-based generative modeling as a Schrödinger bridge problem — an entropy-regularized optimal transport problem on path spaces. Standard SGM methods rely on running a forward noising SDE long enough for the terminal distribution to approximate a Gaussian, creating a tension between approximation quality and discretization error. The authors show that standard SGM corresponds to only the first iteration of an Iterative Proportional Fitting (IPF) procedure. Additional IPF iterations progressively reduce the mismatch between the terminal distribution and the prior, enabling accurate generation with shorter time horizons and fewer steps.
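The horizon trade-off can be made concrete with a few lines of simulation (a toy sketch with parameters of our choosing, not the paper's code): Euler–Maruyama discretization of the forward noising SDE dX_t = -αX_t dt + √2 dW_t, whose stationary law for α = 1 is N(0, 1). The terminal samples only approximate this Gaussian prior when T is large, which is exactly the mismatch further IPF iterations remove.

```python
import numpy as np

# Toy sketch (our parameters, not the paper's): simulate the forward noising
# SDE dX_t = -alpha * X_t dt + sqrt(2) dW_t by Euler-Maruyama. With alpha = 1
# the stationary law is N(0, 1); starting far from it, the terminal samples
# approach this Gaussian prior only as the horizon T grows.
rng = np.random.default_rng(0)
alpha, T, N = 1.0, 5.0, 500              # drift coefficient, horizon, steps
dt = T / N
x = rng.uniform(5.0, 6.0, size=10_000)   # toy "data" far from the prior

for _ in range(N):
    x = x - alpha * x * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)

print(x.mean(), x.var())  # near (0, 1) for this T; visibly biased if T is short
```

Rerunning with a short horizon (e.g. T = 0.5) leaves the sample mean well away from 0, illustrating the tension that DSB resolves without lengthening T.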

The core algorithmic contribution is DSB (Diffusion Schrödinger Bridge), which approximates each IPF iteration by learning neural network drift functions via L² regression losses (the mean-matching formulation). Rather than estimating score functions and potentials directly, DSB reduces each IPF half-step to a simple regression problem. The algorithm alternates between learning backward transitions (projecting onto the prior constraint) and forward transitions (projecting onto the data constraint). The paper also provides the first quantitative convergence bounds for SGM methods, showing that the total variation error decomposes into score approximation error, discretization error, and mixing error.

DSB also serves as a continuous-state-space analogue of the Sinkhorn algorithm, opening the door to computational optimal transport between empirical distributions in high dimensions, including dataset-to-dataset interpolation.
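The Sinkhorn connection is easiest to see in the discrete case, where IPF reduces to alternating marginal rescalings of a Gibbs kernel; DSB performs the analogous alternation on path measures, with each rescaling replaced by a learned drift. A minimal sketch on toy data (regularization strength and point clouds are our own choices, picked so the loop converges quickly):

```python
import numpy as np

# Discrete IPF / Sinkhorn sketch (toy data, our choice of regularization):
# alternately rescale a Gibbs kernel so its row and column marginals match
# mu and nu. Each pair of updates is one IPF iteration.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(8, 2)), rng.normal(size=(9, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
K = np.exp(-C / C.max())                            # Gibbs kernel, generous epsilon
mu, nu = np.full(8, 1 / 8), np.full(9, 1 / 9)       # uniform marginals
u, v = np.ones(8), np.ones(9)

for _ in range(200):
    u = mu / (K @ v)      # project onto the first-marginal constraint
    v = nu / (K.T @ u)    # project onto the second-marginal constraint

P = u[:, None] * K * v[None, :]   # entropic optimal transport plan
print(P.sum(1), P.sum(0))         # rows match mu, columns match nu
```

Stopping after the very first `u` update matches only one marginal, mirroring the paper's observation that standard SGM is a single IPF half-step.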

Key Contributions

  • Reformulation of SGM as a Schrödinger Bridge / entropy-regularized OT problem
  • DSB algorithm implementing IPF via neural network regression losses
  • Novel IPF iterate representation via forward/backward transition kernels suited to score-based approximation
  • First quantitative convergence bounds for SGM methods (Theorem 1)
  • Convergence theory for IPF in continuous non-compact state spaces
  • Continuous-time IPF formulation on diffusion path measures
  • Dataset-to-dataset interpolation via non-Gaussian priors

Methodology

DSB alternates IPF half-steps: (1) given the current forward dynamics, simulate trajectories forward from p_data and learn the backward drift via L² regression; (2) given the current backward dynamics, simulate backward from p_prior and learn the forward drift via L² regression. The first iteration (n=0) uses the reference drift f(x) = -αx, exactly recovering standard SGM; each subsequent iteration refines both drifts. Networks use the U-Net architecture of Nichol & Dhariwal (2021) and are warm-started from the previous iteration.
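The first half-step can be sketched end-to-end on a toy 1-D problem, with a per-step linear regression standing in for the paper's U-Net (exact here because the dynamics are linear-Gaussian). For clarity the backward noise scale is estimated from the regression residuals, whereas DSB itself fixes it to √(2γ); variable names are ours, not the paper's.

```python
import numpy as np

# Schematic DSB backward half-step (mean-matching) on a 1-D toy problem.
# A per-step linear regression stands in for the U-net; the backward noise
# scale is taken from the residuals (DSB fixes it to sqrt(2*gamma)).
rng = np.random.default_rng(0)
gamma, N = 0.05, 20                          # step size and number of steps
X = rng.normal(2.0, 0.5, size=(10_000, 1))   # toy "data" distribution
traj = [X]
for k in range(N):                           # forward (noising) pass
    X = X - gamma * X + np.sqrt(2 * gamma) * rng.standard_normal(X.shape)
    traj.append(X)

b_coef, b_std = [], []
for k in range(N):                           # backward half-step: N regressions
    xk, xk1 = traj[k], traj[k + 1]
    A = np.hstack([np.ones_like(xk1), xk1])  # features [1, x_{k+1}]
    coef, *_ = np.linalg.lstsq(A, xk, rcond=None)
    b_coef.append(coef)
    b_std.append((xk - A @ coef).std())      # residual noise scale

# Generate: start at the terminal distribution, run the learned chain back.
Z = traj[-1].mean() + traj[-1].std() * rng.standard_normal((10_000, 1))
for k in reversed(range(N)):
    mean = np.hstack([np.ones_like(Z), Z]) @ b_coef[k]
    Z = mean + b_std[k] * rng.standard_normal(Z.shape)

print(Z.mean(), Z.std())  # approximately recovers the data's mean 2.0 and std 0.5
```

A subsequent DSB iteration would simulate this learned backward chain from p_prior and fit the forward drift by the symmetric regression, and so on until the two marginal constraints are met.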

Key Findings

  • First DSB iteration exactly recovers standard SGM (Song et al. 2021)
  • Additional iterations consistently improve FID across all step counts
  • Effective generation with as few as N = 12 steps at horizon T = 0.2
  • Converges to known analytic SB solution in Gaussian settings
  • FID improves monotonically with DSB iterations on MNIST and CelebA
  • Supports non-Gaussian priors for dataset-to-dataset interpolation (EMNIST to MNIST)

Important References

  1. Score-Based Generative Modeling through Stochastic Differential Equations (Song et al., 2021) — the SGM framework that DSB generalizes
  2. Diffusion Schrödinger Bridge Matching (Shi et al., 2023) — improved SB numerics building on DSB
  3. Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal, 2021) — architecture and training techniques used

Atomic Notes


paper