Summary

This paper reformulates score-based generative modeling as a Schrödinger bridge problem — an entropy-regularized optimal transport problem on path spaces. Standard SGM methods rely on running a forward noising SDE long enough for the terminal distribution to approximate a Gaussian, creating a tension between approximation quality and discretization error. The authors show that standard SGM corresponds to only the first iteration of an Iterative Proportional Fitting (IPF) procedure. Additional IPF iterations progressively reduce the mismatch between the terminal distribution and the prior, enabling accurate generation with shorter time horizons and fewer steps.
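The horizon trade-off can be made concrete with a few lines of simulation (a toy sketch with parameters of our choosing, not the paper's code): Euler–Maruyama discretization of the forward noising SDE dX_t = -αX_t dt + √2 dW_t, whose stationary law for α = 1 is N(0, 1). The terminal samples only approximate this Gaussian prior when T is large, which is exactly the mismatch further IPF iterations remove.

```python
import numpy as np

# Toy sketch (our parameters, not the paper's): simulate the forward noising
# SDE dX_t = -alpha * X_t dt + sqrt(2) dW_t by Euler-Maruyama. With alpha = 1
# the stationary law is N(0, 1); starting far from it, the terminal samples
# approach this Gaussian prior only as the horizon T grows.
rng = np.random.default_rng(0)
alpha, T, N = 1.0, 5.0, 500              # drift coefficient, horizon, steps
dt = T / N
x = rng.uniform(5.0, 6.0, size=10_000)   # toy "data" far from the prior

for _ in range(N):
    x = x - alpha * x * dt + np.sqrt(2 * dt) * rng.standard_normal(x.shape)

print(x.mean(), x.var())  # near (0, 1) for this T; visibly biased if T is short
```

Rerunning with a short horizon (e.g. T = 0.5) leaves the sample mean well away from 0, illustrating the tension that DSB resolves without lengthening T.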

The core algorithmic contribution is DSB (Diffusion Schrödinger Bridge), which approximates each IPF iteration by learning neural network drift functions via L² regression losses (the mean-matching formulation). Rather than estimating score functions and potentials directly, DSB reduces each IPF half-step to a simple regression problem. The algorithm alternates between learning backward transitions (projecting onto the prior constraint) and forward transitions (projecting onto the data constraint). The paper also provides the first quantitative convergence bounds for SGM methods, showing that the total variation error decomposes into score approximation error, discretization error, and mixing error.

DSB also serves as a continuous-state-space analogue of the Sinkhorn algorithm, opening the door to computational optimal transport between empirical distributions in high dimensions, including dataset-to-dataset interpolation.
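The Sinkhorn connection is easiest to see in the discrete case, where IPF reduces to alternating marginal rescalings of a Gibbs kernel; DSB performs the analogous alternation on path measures, with each rescaling replaced by a learned drift. A minimal sketch on toy data (regularization strength and point clouds are our own choices, picked so the loop converges quickly):

```python
import numpy as np

# Discrete IPF / Sinkhorn sketch (toy data, our choice of regularization):
# alternately rescale a Gibbs kernel so its row and column marginals match
# mu and nu. Each pair of updates is one IPF iteration.
rng = np.random.default_rng(0)
x, y = rng.normal(size=(8, 2)), rng.normal(size=(9, 2))
C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)  # squared-distance cost
K = np.exp(-C / C.max())                            # Gibbs kernel, generous epsilon
mu, nu = np.full(8, 1 / 8), np.full(9, 1 / 9)       # uniform marginals
u, v = np.ones(8), np.ones(9)

for _ in range(200):
    u = mu / (K @ v)      # project onto the first-marginal constraint
    v = nu / (K.T @ u)    # project onto the second-marginal constraint

P = u[:, None] * K * v[None, :]   # entropic optimal transport plan
print(P.sum(1), P.sum(0))         # rows match mu, columns match nu
```

Stopping after the very first `u` update matches only one marginal, mirroring the paper's observation that standard SGM is a single IPF half-step.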

Key Contributions

  • Reformulation of SGM as a Schrödinger Bridge / entropy-regularized OT problem
  • DSB algorithm implementing IPF via neural network regression losses
  • Novel IPF iterate representation via forward/backward transition kernels suited to score-based approximation
  • First quantitative convergence bounds for SGM methods (Theorem 1)
  • Convergence theory for IPF in continuous non-compact state spaces
  • Continuous-time IPF formulation on diffusion path measures
  • Dataset-to-dataset interpolation via non-Gaussian priors

Methodology

DSB alternates IPF half-steps: (1) given the current forward dynamics, simulate trajectories forward from p_data and learn the backward drift via L² regression; (2) given the current backward dynamics, simulate backward from p_prior and learn the forward drift via L² regression. The first iteration (n=0) uses the reference drift f(x) = -αx, exactly recovering standard SGM; each subsequent iteration refines both drifts. Networks use the U-Net architecture of Nichol & Dhariwal (2021) and are warm-started from the previous iteration.
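The first half-step can be sketched end-to-end on a toy 1-D problem, with a per-step linear regression standing in for the paper's U-Net (exact here because the dynamics are linear-Gaussian). For clarity the backward noise scale is estimated from the regression residuals, whereas DSB itself fixes it to √(2γ); variable names are ours, not the paper's.

```python
import numpy as np

# Schematic DSB backward half-step (mean-matching) on a 1-D toy problem.
# A per-step linear regression stands in for the U-net; the backward noise
# scale is taken from the residuals (DSB fixes it to sqrt(2*gamma)).
rng = np.random.default_rng(0)
gamma, N = 0.05, 20                          # step size and number of steps
X = rng.normal(2.0, 0.5, size=(10_000, 1))   # toy "data" distribution
traj = [X]
for k in range(N):                           # forward (noising) pass
    X = X - gamma * X + np.sqrt(2 * gamma) * rng.standard_normal(X.shape)
    traj.append(X)

b_coef, b_std = [], []
for k in range(N):                           # backward half-step: N regressions
    xk, xk1 = traj[k], traj[k + 1]
    A = np.hstack([np.ones_like(xk1), xk1])  # features [1, x_{k+1}]
    coef, *_ = np.linalg.lstsq(A, xk, rcond=None)
    b_coef.append(coef)
    b_std.append((xk - A @ coef).std())      # residual noise scale

# Generate: start at the terminal distribution, run the learned chain back.
Z = traj[-1].mean() + traj[-1].std() * rng.standard_normal((10_000, 1))
for k in reversed(range(N)):
    mean = np.hstack([np.ones_like(Z), Z]) @ b_coef[k]
    Z = mean + b_std[k] * rng.standard_normal(Z.shape)

print(Z.mean(), Z.std())  # approximately recovers the data's mean 2.0 and std 0.5
```

A subsequent DSB iteration would simulate this learned backward chain from p_prior and fit the forward drift by the symmetric regression, and so on until the two marginal constraints are met.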

Key Findings

  • First DSB iteration exactly recovers standard SGM (Song et al. 2021)
  • Additional iterations consistently improve FID across all step counts
  • Effective generation with as few as N = 12 steps at horizon T = 0.2
  • Converges to known analytic SB solution in Gaussian settings
  • FID improves monotonically with DSB iterations on MNIST and CelebA
  • Supports non-Gaussian priors for dataset-to-dataset interpolation (EMNIST to MNIST)

Important References

  1. Score-Based Generative Modeling through Stochastic Differential Equations (Song et al., 2021) — the SGM framework that DSB generalizes
  2. Diffusion Schrödinger Bridge Matching (Shi et al., 2023) — improved SB numerics building on DSB
  3. Improved Denoising Diffusion Probabilistic Models (Nichol & Dhariwal, 2021) — architecture and training techniques used

Atomic Notes


paper