Connection
In score-based diffusion models (Song et al. 2021), the generative process produces a sequence of distributions p_t that should converge to the data distribution p_0 as t → 0. Current convergence analyses (Lee et al. 2023; Chen et al. 2023; Benton et al. 2024) bound the KL divergence or total variation distance between the generated and target distributions, but do not distinguish between the different ways convergence can fail.
The mass preserving condition hierarchy from Herdegen-Liang-Shelley provides a graduated diagnostic framework for precisely this purpose. In the diffusion model setting:
- tightness failure corresponds to mode dropping: the generated distribution loses mass from some region of the data support, with that mass escaping to regions of low data density. This is detectable by checking whether sup_n |p_n|(K^c) is small for sufficiently large compact K, i.e. whether the generated distributions keep their mass on compact sets.
- local mass loss on compacts corresponds to local mode collapse: within a region where data has support, the positive and negative parts of the score estimation error cause mass to redistribute locally. For the signed measure differences mu_n = p_n - p_target, the condition lim sup_n |mu_n|(K) ≤ |mu|(K) for every compact K monitors whether error mass is accumulating on compacts.
- global mass preservation (lim sup_n ||mu_n|| ≤ ||mu||) corresponds to total variation convergence: the strongest guarantee, equivalent to the generated distribution matching the target in total variation.
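The three levels above can be estimated from samples. A minimal sketch, assuming i.i.d. sample access to each p_n and to the target; `mass_outside_compact` and `tv_on_compact` are hypothetical helper names, and the histogram estimate is a crude stand-in for the restricted total variation |mu_n|(K):

```python
import numpy as np

def mass_outside_compact(samples, radius):
    """Empirical mass outside the cube [-radius, radius]^d.

    If this quantity stays large for every affordable radius,
    tightness is failing: mass is escaping to regions of low
    data density (mode dropping)."""
    return float(np.mean(np.any(np.abs(samples) > radius, axis=1)))

def tv_on_compact(p_samples, q_samples, radius, bins=50):
    """Crude histogram estimate of total variation restricted to
    [-radius, radius] in 1D; tracks local mass redistribution
    (the middle level of the hierarchy)."""
    edges = np.linspace(-radius, radius, bins + 1)
    p_hist, _ = np.histogram(p_samples, bins=edges)
    q_hist, _ = np.histogram(q_samples, bins=edges)
    return 0.5 * np.abs(p_hist / len(p_samples) - q_hist / len(q_samples)).sum()
```

Checking `mass_outside_compact` at a few growing radii separates "mass still escaping" from "mass redistributing inside the support", a distinction a single scalar KL or TV bound cannot make.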
Recent work on heavy-tailed targets (Dao & Lu 2026) shows that standard Gaussian diffusion models exhibit fundamentally different convergence rates depending on tail behaviour — under polynomial tails, the rate depends explicitly on the tail parameter gamma. This is precisely a tightness issue: heavy-tailed targets make it harder to prevent mass from escaping to infinity during the reverse diffusion process.
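A quick numeric illustration of the tightness reading (not the paper's construction; a Student-t with 3 degrees of freedom stands in for a polynomial-tailed target, and its degrees-of-freedom parameter is only loosely analogous to the tail parameter gamma):

```python
import numpy as np

rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
heavy = rng.standard_t(df=3, size=200_000)  # polynomial tail

# Mass escaping a compact set [-R, R]: exponentially small for the
# Gaussian, only polynomially small for the heavy-tailed sample.
for R in (2.0, 5.0, 10.0):
    print(R, np.mean(np.abs(gauss) > R), np.mean(np.abs(heavy) > R))
```

For the Gaussian the escaped mass is already negligible at R = 5, while the t(3) sample still leaks visible mass at R = 10; a fixed compact budget K therefore controls tightness far less effectively under polynomial tails.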
Bridged Concepts
From measure convergence
- tightness: the condition that prevents mass escape to infinity; failure = mode dropping
- mass preserving condition: the graduated hierarchy (no mass at infinity / on compacts / globally) provides three levels of convergence quality
- vague convergence: the weakest convergence — only tests against compactly supported functions, so cannot detect mass escape
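The standard example separating vague from weak convergence is mu_n = delta_n, the unit point mass at n: it pairs to zero against every compactly supported test function, yet each mu_n carries total mass 1. A minimal numeric check (the tent test function is an arbitrary choice):

```python
def tent(x):
    """Continuous test function supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

# The pairing <f, mu_n> for mu_n = delta_n is just tent(n): it vanishes
# for n >= 1, so mu_n → 0 vaguely, even though mu_n(R) = 1 for all n.
pairings = [tent(n) for n in range(4)]
print(pairings)  # → [1.0, 0.0, 0.0, 0.0]: the escaping mass goes undetected
```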
From Diffusion Bridge / generative models
- score matching: the training objective; score estimation error is the driving force behind convergence failure
- VE-SDE and VP-SDE: the forward processes whose discretisation introduces additional convergence errors
- probability flow ODE: the deterministic sampling path; its convergence properties relate to weak convergence of the path measures
Why It Matters
Current convergence theory for diffusion models provides rate bounds (how fast convergence occurs under ideal conditions) but limited diagnostic tools (what goes wrong when convergence fails). The mass preservation framework provides:
- A hierarchy of increasingly strong convergence guarantees, each with a clear operational meaning (mode dropping / local collapse / total variation)
- Computable conditions that can be monitored during training: check whether the generated distribution satisfies tightness on progressively larger compact sets
- A principled way to distinguish between “the model hasn’t converged yet” (vague but not weak convergence — mass still escaping) and “the model has a structural deficiency” (convergence in some weaker mode that will never upgrade to weak convergence)
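The monitoring idea in the second bullet can be sketched as a checkpoint-level alarm; the function name, radii, and threshold below are illustrative, not a prescribed procedure:

```python
import numpy as np

def tightness_alarm(checkpoint_samples, radii=(1.0, 2.0, 4.0, 8.0), tol=0.01):
    """Early warning for mode dropping across training checkpoints.

    checkpoint_samples: list of (N, d) arrays of generated samples,
    one array per checkpoint. Tightness asks that the sup over
    checkpoints of the empirical mass outside [-R, R]^d fall below
    tol for some affordable radius R; if no radius achieves this,
    mass is escaping and weak convergence cannot hold."""
    sup_escape = {
        R: max(float(np.mean(np.any(np.abs(s) > R, axis=1)))
               for s in checkpoint_samples)
        for R in radii
    }
    return any(v < tol for v in sup_escape.values()), sup_escape
```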
Potential Directions
- Formulate the convergence of score-based diffusion samplers in terms of vague-to-weak convergence: characterise when the generated distribution sequence is vaguely convergent and identify the tightness/mass-preservation conditions that ensure weak convergence
- Design training-time diagnostics based on the mass preservation hierarchy: monitor tightness violation (total mass outside compact sets) as an early warning for mode dropping
- Connect the heavy-tailed convergence dichotomy (Dao & Lu 2026) to the failure of tightness: characterise precisely which tail conditions ensure the noised distributions remain tight throughout the diffusion process
Evidence
- Herdegen-Liang-Shelley, Table 1: the precise hierarchy relating vague convergence, tightness, and weak convergence
- Convergence analyses for score-based models use KL and TV bounds (Lee et al. 2023, Benton et al. 2024) but do not separate failure modes
- Dao & Lu 2026: heavy-tailed targets cause fundamentally different convergence rates, corresponding to tightness failure
- tightness: on R, tightness ⇔ no mass at +/- infinity, directly interpretable as “the model keeps mass on the data support”
Suggested Papers
- Convergence of score-based generative modeling for general data distributions — Lee et al. 2023, provides KL-based convergence guarantees that could be reinterpreted through the mass preservation lens
- Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models — Benton et al. 2024, Wasserstein convergence that connects to the Prokhorov metric framework
- Diffusion Models with Heavy-Tailed Targets — Dao & Lu 2026, the heavy-tail convergence dichotomy that corresponds to tightness failure