Abstract
A class of generative models that unifies flow-based and diffusion-based methods is introduced. These models extend the framework proposed in Albergo & Vanden-Eijnden (2023), enabling the use of a broad class of continuous-time stochastic processes called ‘stochastic interpolants’ to bridge any two arbitrary probability density functions exactly in finite time. These interpolants are built by combining data from the two prescribed densities with an additional latent variable that shapes the bridge in a flexible way.
Summary
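The stochastic interpolant framework provides a mathematically rigorous unification of flow-based and diffusion-based generative models. A stochastic interpolant is defined as $x_t = I(t, x_0, x_1) + \gamma(t) z$ for $t \in [0, 1]$, where $I$ interpolates between the endpoints ($I(0, x_0, x_1) = x_0$, $I(1, x_0, x_1) = x_1$), $\gamma$ controls the latent noise ($\gamma(0) = \gamma(1) = 0$, $\gamma(t) > 0$ for $t \in (0, 1)$), $(x_0, x_1) \sim \nu$ with marginals $\rho_0$, $\rho_1$, and $z \sim N(0, \mathrm{Id})$ is independent of the data. The density $\rho(t)$ of $x_t$ is proved to be absolutely continuous and strictly positive, satisfying both a transport equation and forward/backward Fokker-Planck equations with tunable diffusion coefficient $\epsilon(t)$.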
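As a minimal illustration of the interpolant definition, the sketch below draws samples of $x_t = I(t, x_0, x_1) + \gamma(t) z$ in one dimension. The linear choice $I(t, x_0, x_1) = (1 - t) x_0 + t x_1$, the schedule $\gamma(t) = \sqrt{2t(1-t)}$, the Gaussian endpoints, and the independent coupling are all assumptions made for this toy example, not the only choices the framework allows.

```python
import numpy as np

def gamma(t):
    # Assumed noise schedule: zero at t = 0 and t = 1, positive in between.
    return np.sqrt(2.0 * t * (1.0 - t))

def interpolant(t, x0, x1, z):
    # Assumed linear I(t, x0, x1) = (1 - t) x0 + t x1, plus the latent term gamma(t) z.
    return (1.0 - t) * x0 + t * x1 + gamma(t) * z

rng = np.random.default_rng(0)
n, m = 200_000, 3.0                 # sample count and target mean (toy values)
x0 = rng.normal(0.0, 1.0, n)        # samples from rho_0 = N(0, 1)
x1 = rng.normal(m, 1.0, n)          # samples from rho_1 = N(m, 1), independent coupling
z = rng.normal(0.0, 1.0, n)         # latent variable, independent of (x0, x1)

x_start = interpolant(0.0, x0, x1, z)   # equals x0 because gamma(0) = 0
x_mid = interpolant(0.5, x0, x1, z)     # intermediate sample, smoothed by z
x_end = interpolant(1.0, x0, x1, z)     # equals x1 because gamma(1) = 0
```

With these particular choices the variance of $x_t$ is $(1-t)^2 + t^2 + 2t(1-t) = 1$ at every $t$, so the intermediate density is $N(tm, 1)$; this is a convenience of the toy setup rather than a general property.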
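The velocity $b(t, x) = \mathbb{E}[\dot{x}_t \mid x_t = x]$ and the score $s(t, x) = \nabla \log \rho(t, x)$ are characterized as unique minimizers of simple quadratic objectives (Theorems 7, 8). This yields three equivalent generative models: a probability flow ODE $\dot{X}_t = b(t, X_t)$, a forward SDE $dX_t^F = b_F(t, X_t^F)\,dt + \sqrt{2\epsilon(t)}\,dW_t$ with $b_F = b + \epsilon s$, and a backward SDE with drift $b_B = b - \epsilon s$. Crucially, the density $\rho(t)$ is the same for all three: the interpolant construction is decoupled from the sampling process.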
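The equivalence of the ODE and SDE samplers can be checked on a toy problem where the velocity and score are known in closed form. The sketch below assumes $\rho_0 = N(0, 1)$, $\rho_1 = N(m, 1)$ with independent coupling, a linear interpolant, and $\gamma(t) = \sqrt{2t(1-t)}$; under these assumptions $\rho(t) = N(tm, 1)$, so $b(t, x) = m$ and $s(t, x) = -(x - tm)$. The function names, the constant $\epsilon$, and the Euler and Euler-Maruyama discretizations are illustrative choices, not the paper's implementation.

```python
import numpy as np

m = 3.0      # mean of the target rho_1 = N(m, 1) (toy value)
eps = 0.5    # assumed constant diffusion coefficient epsilon

def b(t, x):
    # Closed-form velocity for this toy problem: a constant drift m.
    return np.full_like(x, m)

def s(t, x):
    # Closed-form score of rho(t) = N(t m, 1).
    return -(x - t * m)

def sample(x_init, n_steps, eps, rng):
    # Euler (eps = 0) or Euler-Maruyama (eps > 0) integration from t = 0 to t = 1.
    dt = 1.0 / n_steps
    x = x_init.copy()
    for k in range(n_steps):
        t = k * dt
        drift = b(t, x) + eps * s(t, x)          # forward drift b_F = b + eps * s
        noise = rng.normal(size=x.shape)
        x = x + drift * dt + np.sqrt(2.0 * eps * dt) * noise
    return x

rng = np.random.default_rng(0)
x_init = rng.normal(0.0, 1.0, 100_000)           # samples from rho_0 = N(0, 1)
x_ode = sample(x_init, 200, 0.0, rng)            # probability flow ODE
x_sde = sample(x_init, 200, eps, rng)            # forward SDE with eps > 0
```

Both `x_ode` and `x_sde` should be distributed approximately as $N(m, 1)$, up to sampling and time-discretization error, even though the individual trajectories differ.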
The framework has three key advantages over your random bridge approach: (1) the latent variable $z$ smooths the intermediate density, eliminating spurious modes that appear in pure linear interpolation; (2) the coupling between $x_0$ and $x_1$ can be arbitrary (independent, OT, or data-adapted); (3) the Schrödinger bridge is recovered as the solution to a max-min problem over interpolants (Theorem 41, Section 3.4).
Key Contributions
- General stochastic interpolant with rigorous density theory (Theorem 6)
- Velocity and score as unique minimizers of quadratic objectives (Theorems 7, 8)
- Simultaneous derivation of ODE and SDE generative models with tunable diffusion coefficient $\epsilon(t)$
- Likelihood control: SDE models bound KL divergence via velocity+score loss (Theorem 23); ODE models additionally require Fisher divergence control
- Recovery of score-based diffusion as one-sided interpolant with time reparameterization (Section 5.1)
- Recovery of rectified flow as linear interpolant (Section 5.3)
- Recovery of Schrödinger bridge via optimization over interpolant function (Section 3.4, Theorem 41)
- Remark 48 explicitly states that “straight line solutions is a necessary condition for optimal transport, but it is not sufficient”, confirming that rectified flow does not by itself solve optimal transport
Methodology
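The interpolant $x_t$ bridges $\rho_0$ and $\rho_1$ by construction. The velocity is learned by minimizing $L_b[\hat{b}] = \int_0^1 \mathbb{E}\big[\tfrac{1}{2}|\hat{b}(t, x_t)|^2 - (\partial_t I(t, x_0, x_1) + \dot{\gamma}(t) z) \cdot \hat{b}(t, x_t)\big]\,dt$. The score is learned via $L_s[\hat{s}] = \int_0^1 \mathbb{E}\big[\tfrac{1}{2}|\hat{s}(t, x_t)|^2 + \gamma^{-1}(t)\, z \cdot \hat{s}(t, x_t)\big]\,dt$. For the spatially linear case $x_t = \alpha(t) x_0 + \beta(t) x_1 + \gamma(t) z$, the velocity factorizes as $b(t, x) = \dot{\alpha}(t)\,\eta_0(t, x) + \dot{\beta}(t)\,\eta_1(t, x) + \dot{\gamma}(t)\,\eta_z(t, x)$ where $\eta_0(t, x) = \mathbb{E}[x_0 \mid x_t = x]$, $\eta_1(t, x) = \mathbb{E}[x_1 \mid x_t = x]$, $\eta_z(t, x) = \mathbb{E}[z \mid x_t = x]$.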
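The two quadratic objectives can be estimated by Monte Carlo over $(t, x_0, x_1, z)$. The sketch below does this in one dimension, assuming the linear interpolant $I = (1 - t) x_0 + t x_1$, $\gamma(t) = \sqrt{2t(1-t)}$, and Gaussian endpoints $\rho_0 = N(0, 1)$, $\rho_1 = N(m, 1)$ with independent coupling, for which the minimizers are $b = m$ and $s = -(x - tm)$. The function names are hypothetical, and restricting $t$ to $[10^{-3}, 1 - 10^{-3}]$ is a numerical convenience assumed here to keep $\dot{\gamma}$ and $\gamma^{-1}$ finite.

```python
import numpy as np

def gamma(t):
    return np.sqrt(2.0 * t * (1.0 - t))

def gamma_dot(t):
    # Time derivative of gamma(t) = sqrt(2 t (1 - t)).
    return (1.0 - 2.0 * t) / gamma(t)

def velocity_loss(b_hat, t, x0, x1, z):
    # Monte Carlo estimate of E[ 0.5 |b_hat|^2 - (dI/dt + gamma_dot z) * b_hat ].
    x_t = (1.0 - t) * x0 + t * x1 + gamma(t) * z
    target = (x1 - x0) + gamma_dot(t) * z
    pred = b_hat(t, x_t)
    return np.mean(0.5 * pred**2 - target * pred)

def score_loss(s_hat, t, x0, x1, z):
    # Monte Carlo estimate of E[ 0.5 |s_hat|^2 + z * s_hat / gamma ].
    x_t = (1.0 - t) * x0 + t * x1 + gamma(t) * z
    pred = s_hat(t, x_t)
    return np.mean(0.5 * pred**2 + z * pred / gamma(t))

rng = np.random.default_rng(0)
n, m = 400_000, 2.0
t = rng.uniform(1e-3, 1.0 - 1e-3, n)   # times, kept away from the endpoints
x0 = rng.normal(0.0, 1.0, n)           # rho_0 = N(0, 1)
x1 = rng.normal(m, 1.0, n)             # rho_1 = N(m, 1)
z = rng.normal(0.0, 1.0, n)            # latent variable

def b_true(t, x):
    return np.full_like(x, m)          # exact velocity for this toy problem

def s_true(t, x):
    return -(x - t * m)                # exact score of N(t m, 1)

def zero_field(t, x):
    return np.zeros_like(x)            # baseline candidate

lb_true = velocity_loss(b_true, t, x0, x1, z)
lb_zero = velocity_loss(zero_field, t, x0, x1, z)
ls_true = score_loss(s_true, t, x0, x1, z)
ls_zero = score_loss(zero_field, t, x0, x1, z)
```

In this toy setting the exact minimum values are $-m^2/2$ for the velocity objective and $-1/2$ for the score objective, so `lb_true` and `ls_true` should land near those values and below the zero-field baselines. In practice `b_hat` and `s_hat` would be parametric models trained by stochastic gradient descent on the same estimates.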
Key Findings
- The latent variable smooths intermediate densities, suppressing spurious modes (Figure 4)
- The diffusion coefficient $\epsilon(t)$ affects sample trajectories but NOT the density $\rho(t)$, a decoupling not present in standard diffusion models
- For ODE-based generative models: learning the velocity $b$ alone is insufficient for likelihood control; one must also control the Fisher divergence (unlike SDE models)
- Rectification (Section 5.3) produces straight-line flows but does NOT guarantee optimal transport (Remark 48): “straight line solutions is a necessary condition for optimal transport, but it is not sufficient”
- Optimizing over gradient velocity fields and iterating rectification converges to Brenier’s polar decomposition (Remark 49, citing Liu 2022)
Important References
- Flow Matching for Generative Modeling — Concurrent work arriving at similar conditional objectives via a different derivation
- Score-Based Generative Modeling through Stochastic Differential Equations — Score-based diffusion recovered as one-sided interpolant
- Flow Straight and Fast — Rectified flow recovered as γ=0 linear interpolant; rectification analyzed in Section 5.3