Connection
In score-based diffusion models (Song et al. 2021), the generative process produces a sequence of distributions p_t that should converge to the data distribution p_0 as t → 0. Current convergence analyses (Lee et al. 2023; Chen et al. 2023; Benton et al. 2024) bound the KL divergence or total variation distance between the generated and target distributions, but do not distinguish between the different ways convergence can fail.
The mass preserving condition hierarchy from Herdegen-Liang-Shelley provides a graduated diagnostic framework for precisely this purpose. In the diffusion model setting:
- tightness failure corresponds to mode dropping: the generated distribution loses mass from some region of the data support, with that mass escaping to regions of low data density. This is detectable by checking whether sup_n |p_n|(K^c) is small for sufficiently large compact K, i.e. whether the generated distributions keep their mass on compact sets.
- local mass loss on compacts corresponds to local mode collapse: within a region where data has support, the positive and negative parts of the score estimation error cause mass to redistribute locally. For the signed measure differences mu_n = p_n - p_target, the condition lim sup_n |mu_n|(K) ≤ |mu|(K) for every compact K monitors whether error mass is accumulating on compacts.
- global mass preservation (lim sup_n ||mu_n|| ≤ ||mu||) corresponds to total variation convergence: the strongest guarantee, equivalent to the generated distribution matching the target in total variation.
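The three levels above can be estimated from samples. A minimal sketch, assuming i.i.d. sample access to each p_n and to the target; `mass_outside_compact` and `tv_on_compact` are hypothetical helper names, and the histogram estimate is a crude stand-in for the restricted total variation |mu_n|(K):

```python
import numpy as np

def mass_outside_compact(samples, radius):
    """Empirical mass outside the cube [-radius, radius]^d.

    If this quantity stays large for every affordable radius,
    tightness is failing: mass is escaping to regions of low
    data density (mode dropping)."""
    return float(np.mean(np.any(np.abs(samples) > radius, axis=1)))

def tv_on_compact(p_samples, q_samples, radius, bins=50):
    """Crude histogram estimate of total variation restricted to
    [-radius, radius] in 1D; tracks local mass redistribution
    (the middle level of the hierarchy)."""
    edges = np.linspace(-radius, radius, bins + 1)
    p_hist, _ = np.histogram(p_samples, bins=edges)
    q_hist, _ = np.histogram(q_samples, bins=edges)
    return 0.5 * np.abs(p_hist / len(p_samples) - q_hist / len(q_samples)).sum()
```

Checking `mass_outside_compact` at a few growing radii separates "mass still escaping" from "mass redistributing inside the support", a distinction a single scalar KL or TV bound cannot make.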
Recent work on heavy-tailed targets (Dao & Lu 2026) shows that standard Gaussian diffusion models exhibit fundamentally different convergence rates depending on tail behaviour — under polynomial tails, the rate depends explicitly on the tail parameter gamma. This is precisely a tightness issue: heavy-tailed targets make it harder to prevent mass from escaping to infinity during the reverse diffusion process.
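A quick numeric illustration of the tightness reading (not the paper's construction; a Student-t with 3 degrees of freedom stands in for a polynomial-tailed target, and its degrees-of-freedom parameter is only loosely analogous to the tail parameter gamma):

```python
import numpy as np

rng = np.random.default_rng(0)
gauss = rng.standard_normal(200_000)
heavy = rng.standard_t(df=3, size=200_000)  # polynomial tail

# Mass escaping a compact set [-R, R]: exponentially small for the
# Gaussian, only polynomially small for the heavy-tailed sample.
for R in (2.0, 5.0, 10.0):
    print(R, np.mean(np.abs(gauss) > R), np.mean(np.abs(heavy) > R))
```

For the Gaussian the escaped mass is already negligible at R = 5, while the t(3) sample still leaks visible mass at R = 10; a fixed compact budget K therefore controls tightness far less effectively under polynomial tails.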
Bridged Concepts
From measure convergence
- tightness: the condition that prevents mass escape to infinity; failure = mode dropping
- mass preserving condition: the graduated hierarchy (no mass at infinity / on compacts / globally) provides three levels of convergence quality
- vague convergence: the weakest convergence — only tests against compactly supported functions, so cannot detect mass escape
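The standard example separating vague from weak convergence is mu_n = delta_n, the unit point mass at n: it pairs to zero against every compactly supported test function, yet each mu_n carries total mass 1. A minimal numeric check (the tent test function is an arbitrary choice):

```python
def tent(x):
    """Continuous test function supported on [-1, 1]."""
    return max(0.0, 1.0 - abs(x))

# The pairing <f, mu_n> for mu_n = delta_n is just tent(n): it vanishes
# for n >= 1, so mu_n → 0 vaguely, even though mu_n(R) = 1 for all n.
pairings = [tent(n) for n in range(4)]
print(pairings)  # → [1.0, 0.0, 0.0, 0.0]: the escaping mass goes undetected
```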
From Diffusion Bridge / generative models
- score matching: the training objective; score estimation error is the driving force behind convergence failure
- VE-SDE and VP-SDE: the forward processes whose discretisation introduces additional convergence errors
- probability flow ODE: the deterministic sampling path; its convergence properties relate to weak convergence of the path measures
Why It Matters
Current convergence theory for diffusion models provides rate bounds (how fast convergence occurs under ideal conditions) but limited diagnostic tools (what goes wrong when convergence fails). The mass preservation framework provides:
- A hierarchy of increasingly strong convergence guarantees, each with a clear operational meaning (mode dropping / local collapse / total variation)
- Computable conditions that can be monitored during training: check whether the generated distribution satisfies tightness on progressively larger compact sets
- A principled way to distinguish between “the model hasn’t converged yet” (vague but not weak convergence — mass still escaping) and “the model has a structural deficiency” (convergence in some weaker mode that will never upgrade to weak convergence)
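The monitoring idea in the second bullet can be sketched as a checkpoint-level alarm; the function name, radii, and threshold below are illustrative, not a prescribed procedure:

```python
import numpy as np

def tightness_alarm(checkpoint_samples, radii=(1.0, 2.0, 4.0, 8.0), tol=0.01):
    """Early warning for mode dropping across training checkpoints.

    checkpoint_samples: list of (N, d) arrays of generated samples,
    one array per checkpoint. Tightness asks that the sup over
    checkpoints of the empirical mass outside [-R, R]^d fall below
    tol for some affordable radius R; if no radius achieves this,
    mass is escaping and weak convergence cannot hold."""
    sup_escape = {
        R: max(float(np.mean(np.any(np.abs(s) > R, axis=1)))
               for s in checkpoint_samples)
        for R in radii
    }
    return any(v < tol for v in sup_escape.values()), sup_escape
```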
Potential Directions
- Formulate the convergence of score-based diffusion samplers in terms of vague-to-weak convergence: characterise when the generated distribution sequence is vaguely convergent and identify the tightness/mass-preservation conditions that ensure weak convergence
- Design training-time diagnostics based on the mass preservation hierarchy: monitor tightness violation (total mass outside compact sets) as an early warning for mode dropping
- Connect the heavy-tailed convergence dichotomy (Dao & Lu 2026) to the failure of tightness: characterise precisely which tail conditions ensure the noised distributions remain tight throughout the diffusion process
Evidence
- Herdegen-Liang-Shelley, Table 1: the precise hierarchy relating vague convergence, tightness, and weak convergence
- Convergence analyses for score-based models use KL and TV bounds (Lee et al. 2023, Benton et al. 2024) but do not separate failure modes
- Dao & Lu 2026: heavy-tailed targets cause fundamentally different convergence rates, corresponding to tightness failure
- tightness: on R, tightness ⇔ no mass at +/- infinity, directly interpretable as “the model keeps mass on the data support”
Suggested Papers
- Convergence of score-based generative modeling for general data distributions — Lee et al. 2023, provides KL-based convergence guarantees that could be reinterpreted through the mass preservation lens
- Wasserstein Convergence Guarantees for a General Class of Score-Based Generative Models — Benton et al. 2024, Wasserstein convergence that connects to the Prokhorov metric framework
- Diffusion Models with Heavy-Tailed Targets — Dao & Lu 2026, the heavy-tail convergence dichotomy that corresponds to tightness failure