Summary

This paper generalizes the Schrödinger bridge matching framework (specifically DSBM) to handle task-specific state costs beyond the standard kinetic-energy minimization. While the standard SB problem seeks the minimum-kinetic-energy transport between two distributions, the Generalized Schrödinger Bridge (GSB) adds a state cost V_t(X_t) to the objective, enabling the incorporation of physical constraints (obstacles), mean-field interactions (congestion/entropy costs), geometric priors (manifold adherence), and latent-space guidance for image translation.
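Concretely, the GSB objective augments the kinetic energy with the state cost; in standard notation (a sketch consistent with the setup summarized here, writing σ for the diffusion coefficient and μ, ν for the boundary distributions):

```latex
\min_{u}\ \mathbb{E}\left[\int_0^1 \tfrac{1}{2}\,\lVert u_t(X_t)\rVert^2 + V_t(X_t)\,\mathrm{d}t\right]
\quad \text{s.t.} \quad \mathrm{d}X_t = u_t(X_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t,
\qquad X_0 \sim \mu,\ X_1 \sim \nu,
```

and setting V ≡ 0 recovers the standard Schrödinger bridge.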

The key theoretical insight is the decomposition of the GSB into an alternating optimization between two stages: Stage 1 (matching) recovers exactly the same implicit/explicit matching algorithms from bridge matching, conditional flow matching, and entropic action matching; Stage 2 (path optimization) becomes a conditional stochastic optimal control (CondSOC) problem conditioned on endpoint pairs (x_0, x_1), which GSBM solves via Gaussian path approximation and optionally refines using path integral control.

This paper is highly relevant to Shelley & Mengütürk (2025) through several deep connections: (1) Table 1 of GSBM explicitly shows that setting V = 0 and σ → 0 recovers rectified flow (straight-line interpolation), while V = 0 with σ > 0 recovers DSBM (Brownian bridge) — the same limiting hierarchy that the random bridge framework establishes from a different direction; (2) Lemma 3 shows that the CondSOC solution for quadratic V is a Gaussian path whose coefficients recover the Brownian bridge as the state cost vanishes, suggesting that the GSBM Gaussian path approximation generalizes the stochastic interpolant framework; (3) the connection between CondSOC and the HJB PDE (Eq. 18), where the optimal control is the gradient of a (log-)value function, mirrors the score-based drift structure that Shelley & Mengütürk derive through Tweedie’s formula for bridge drifts.

Key Contributions

  • Introduces the GSBM algorithm: a matching-based method for solving GSB problems that maintains feasibility (transport between boundary distributions) throughout training
  • Shows that Stage 1 of the alternating optimization reduces to standard matching (implicit or explicit), unifying entropic action matching, bridge matching, and conditional flow matching as special cases
  • Formulates Stage 2 as conditional stochastic optimal control (CondSOC) and solves it via Gaussian path approximation with spline parameterization
  • Provides path integral control theory for debiasing the Gaussian approximation via importance sampling
  • Proves local convergence (monotone decrease of objective) and that the GSB optimum is a fixed point
  • Demonstrates that DSBM, rectified flow, and flow matching are all special cases when V = 0 (Table 1)
  • Achieves stable convergence and significant scalability improvements (d > 12K) over prior methods (DeepGSB)
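The path-integral debiasing above boils down to self-normalized importance resampling over candidate trajectories. A minimal generic sketch follows — the exact weight formula is the paper's Prop. 4; here the unnormalized log-weights `log_w` are assumed precomputed (e.g. accumulated from the state cost and Girsanov correction along each path):

```python
import numpy as np

def resample_paths(paths, log_w, rng=None):
    """Self-normalized importance resampling.

    paths : array of shape (K, T, d) -- K candidate trajectories
    log_w : array of shape (K,)      -- unnormalized log importance weights
    Returns K trajectories drawn with replacement with probability
    proportional to exp(log_w).
    """
    rng = np.random.default_rng(rng)
    log_w = np.asarray(log_w, dtype=float)
    w = np.exp(log_w - log_w.max())   # subtract max for numerical stability
    w /= w.sum()                      # normalize into a probability vector
    idx = rng.choice(len(paths), size=len(paths), p=w)
    return paths[idx]

# toy usage: paths with lower accumulated cost are resampled more often
K, T, d = 8, 5, 2
paths = np.random.default_rng(0).normal(size=(K, T, d))
log_w = -np.linspace(0.0, 7.0, K)     # earlier paths are "cheaper"
res = resample_paths(paths, log_w, rng=1)
```

The max-subtraction before exponentiation is the standard trick to avoid overflow when log-weights span a wide range.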

Methodology

GSBM alternates between two stages:

Stage 1 (Matching): Given fixed marginals p_t, learn the drift via either the implicit matching loss (entropic action matching, which returns the unique gradient field with minimal kinetic energy) or the explicit matching loss (bridge matching/flow matching, which regresses onto conditional drifts). Proposition 1 proves these are equivalent at optimality.
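For intuition, the explicit matching loss regresses a drift network onto the analytic conditional drift of the pinned process. A minimal sketch for the Brownian-bridge case (V = 0), with the regression target computed in closed form and a plain MSE standing in for the training loss (the toy coupling and constants here are illustrative, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma, n, d = 0.5, 4096, 2

# endpoint pairs (x0, x1) from the current coupling (toy independent coupling)
x0 = rng.normal(loc=-2.0, size=(n, d))
x1 = rng.normal(loc=+2.0, size=(n, d))

# sample points along Brownian bridges pinned at (x0, x1)
t = rng.uniform(0.05, 0.95, size=(n, 1))
xt = (1 - t) * x0 + t * x1 + sigma * np.sqrt(t * (1 - t)) * rng.normal(size=(n, d))

# explicit-matching regression target: the Brownian-bridge conditional drift
target = (x1 - xt) / (1 - t)

def matching_loss(pred):
    """MSE between a drift prediction at (xt, t) and the conditional drift."""
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))
```

In practice `pred` would be a neural drift evaluated at `(xt, t)` and minimized by gradient descent; the loss is zero exactly when the model reproduces the conditional drift on the sampled points.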

Stage 2 (Path Optimization via CondSOC): Given the coupling from Stage 1, optimize the conditional probability paths by solving a stochastic optimal control problem for each endpoint pair (x_0, x_1): minimize E[∫ ½‖u_t‖² + V_t(X_t) dt] subject to dX_t = u_t dt + σ dW_t, X_0 = x_0, X_1 = x_1.

The CondSOC is solved via Gaussian path approximation: approximate the conditional path as X_t ~ N(μ_t, γ_t² I), where μ_t and γ_t are parameterized as splines with control points. The conditional drift is analytically available: u_t(x) = ∂_t μ_t + a_t (x − μ_t), where a_t = (γ_t ∂_t γ_t − σ²/2) / γ_t². Optionally, path integral resampling (Prop. 4) uses importance weights to debias the Gaussian approximation.
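The analytic drift identity can be sketched directly: for an SDE whose marginals are N(μ_t, γ_t² I), matching the Fokker–Planck equation pins down the drift as a mean term plus a linear pull toward the mean. A small sketch (the coefficient form is a standard computation; spline-parameterized μ_t, γ_t would slot in for the closures below):

```python
import numpy as np

def gaussian_path_drift(x, t, mu, dmu, gamma, dgamma, sigma):
    """SDE drift whose time-t marginal is N(mu(t), gamma(t)^2 I):
        u_t(x) = mu'(t) + a_t * (x - mu(t)),
        a_t    = (gamma'(t) * gamma(t) - sigma**2 / 2) / gamma(t)**2
    """
    a = (dgamma(t) * gamma(t) - 0.5 * sigma**2) / gamma(t) ** 2
    return dmu(t) + a * (x - mu(t))

# sanity check: linear mu and gamma = sigma * sqrt(t(1-t)) should reproduce
# the Brownian-bridge drift (x1 - x) / (1 - t)
x0, x1, sigma = np.array([0.0]), np.array([1.0]), 0.7
mu     = lambda t: (1 - t) * x0 + t * x1
dmu    = lambda t: x1 - x0
gamma  = lambda t: sigma * np.sqrt(t * (1 - t))
dgamma = lambda t: sigma * (1 - 2 * t) / (2 * np.sqrt(t * (1 - t)))

t, x = 0.3, np.array([0.2])
u = gaussian_path_drift(x, t, mu, dmu, gamma, dgamma, sigma)
```

This recovers the DSBM/Brownian-bridge drift in the special case, consistent with the V = 0 limit in Table 1.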

Key Findings

  • GSBM maintains feasible transport maps throughout training (measured by the discrepancy between generated and target marginals), unlike DeepGSB which only achieves feasibility at convergence
  • On crowd navigation: GSBM correctly identifies multi-modal optimal paths (e.g., two viable pathways around obstacles on LiDAR surfaces), while DeepGSB produces uni-modal solutions
  • On AFHQ dog-to-cat translation: GSBM with a latent-space state cost achieves FID 12.39 vs DSBM’s 14.16, with more semantically meaningful interpolations and faster coupling convergence
  • On opinion depolarization (with a mean-field state cost): GSBM achieves closer feasibility to the target and nearly half the objective value of DeepGSB
  • Solving CondSOC adds only 0.5% wallclock time compared to DSBM (Table 5)
  • Path integral resampling improves performance mainly at low noise, at the cost of ~8% additional runtime
  • The noise level σ plays a crucial role: at small σ the optimal control steers through narrow passages, while at large σ the optimal solution steers around obstacles entirely (the “drunken spider” phenomenon)

Important References

  1. Diffusion Schrödinger Bridge Matching (Shi et al., 2023) — DSBM is the special case of GSBM when V = 0; GSBM inherits the alternating Stage 1/Stage 2 structure and forward-backward scheme from DSBM
  2. Flow Straight and Fast (Liu et al., 2023) — Rectified flow is recovered in the V = 0, σ → 0 limit of GSBM (Table 1)
  3. Flow Matching for Generative Modeling (Lipman et al., 2023) — The explicit matching loss (Eq. 5) generalizes conditional flow matching; flow matching is recovered when V = 0 in the deterministic (σ → 0) limit
  4. Deep Generalized Schrödinger Bridge (Liu et al., 2022) — Prior Sinkhorn-inspired method for GSB; GSBM significantly outperforms on feasibility, stability, and scalability
  5. Stochastic Interpolants: A Unifying Framework for Flows and Diffusions (Albergo et al., 2023) — The Gaussian path approximation in GSBM is closely related to the stochastic interpolant framework

Atomic Notes


paper