Connection

Stanek (2024) introduces a Ky Fan-type metric for almost basic convergence on the space W of right-continuous bounded-variation functions vanishing at -infinity:

d(f, g) = min{eps >= 0 : lambda({x in [-1/eps, 1/eps] : |f(x) - c - g(x)| > eps}) <= eps for some c in R}

The crucial feature is the “up to constants” invariance: the metric measures how close two distribution functions are after allowing an optimal shift by a constant c. In terms of measures, d(F^mu, F^nu) = 0 means that F^mu - F^nu is constant outside a Lebesgue-null set, so mu and nu assign the same mass to almost every interval — they carry the same information “up to a constant offset” in the distribution function.
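A brute-force numerical sketch of the definition may help. The function name, grids, and tolerances below are illustrative choices, not from Stanek: f and g are assumed to be vectorized callables, Lebesgue measure is approximated on a uniform grid, and the shift c is searched over the observed range of f - g.

```python
import numpy as np

def stanek_distance(f, g, eps_grid=None, n_x=4001, n_c=101):
    """Grid-search approximation of the Ky Fan-type metric: the smallest
    eps such that, for some shift c, the set where |f(x) - c - g(x)| > eps
    inside [-1/eps, 1/eps] has Lebesgue measure at most eps."""
    if eps_grid is None:
        eps_grid = np.linspace(1e-3, 2.0, 400)
    for eps in eps_grid:                         # scan eps from small to large
        x = np.linspace(-1.0 / eps, 1.0 / eps, n_x)
        dx = x[1] - x[0]
        diff = f(x) - g(x)
        # candidate shifts: the range of f - g on the current window
        for c in np.linspace(diff.min(), diff.max(), n_c):
            bad = np.abs(diff - c) > eps         # exceptional set
            if bad.sum() * dx <= eps:            # lambda(bad) <= eps: feasible
                return float(eps)
    return float(eps_grid[-1])
```

For two distribution-like functions differing only by a constant, the optimal c absorbs the difference and the returned value sits at the bottom of the eps grid, illustrating the shift invariance.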

In Bayesian machine learning, a fundamental computational challenge is comparing distributions known only up to normalizing constants. The posterior p(theta | data) = p(data | theta) p(theta) / Z is typically available only as an unnormalized density p-tilde(theta) = p(data | theta) p(theta), proportional to p(theta | data), because the evidence Z = integral p(data | theta) p(theta) d theta is intractable. Variational inference, MCMC diagnostics, and model comparison all require comparing such unnormalized densities.
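To make these objects concrete, here is a deliberately tiny sketch: a Bernoulli model with a uniform prior, chosen so that the evidence Z is computable by quadrature and in closed form (via conjugacy) — exactly the step that fails in realistic models. The data and variable names are illustrative.

```python
import numpy as np

# Coin-flip data; uniform prior on theta, so the unnormalized posterior
# p_tilde(theta) = p(data | theta) * p(theta) is just the likelihood.
data = np.array([1, 1, 0, 1, 0, 1, 1, 1])
k, n = int(data.sum()), data.size

def log_p_tilde(theta):
    """Unnormalized log-posterior (the uniform prior contributes 0)."""
    return k * np.log(theta) + (n - k) * np.log(1.0 - theta)

# The evidence Z = integral of p_tilde, here by the trapezoid rule.
# In general this integral is the intractable quantity.
t = np.linspace(1e-6, 1.0 - 1e-6, 100_001)
y = np.exp(log_p_tilde(t))
Z = float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(t)))
# Conjugacy check: Z should equal the Beta function B(k+1, n-k+1).
```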

The almost basic convergence metric compares distribution functions up to additive constants. One caveat is needed before mapping this onto the Bayesian setting: an additive constant in the distribution function shifts the cumulative mass, whereas an unknown normalizing constant rescales the density multiplicatively, so the two invariances are structurally analogous rather than identical. What the metric does guarantee is still strong: if two measures mu and nu satisfy d(F^mu, F^nu) = 0, then mu((a,b]) = nu((a,b]) for almost all intervals (a,b] — the measures agree on almost all intervals, arguably the strongest notion of agreement available for such objects.

Bridged Concepts

From measure convergence

  • almost basic convergence: metrizable (unlike basic, vague, loose, and weak convergence for signed measures), with an explicit Ky Fan-type metric
  • The metric space is separable but NOT complete (Lemma 3.6 in Stanek), and the metric is not induced by any norm (Lemma 3.5)
  • Prokhorov metric: metrizes weak convergence for positive measures; the Ky Fan metric is conceptually analogous but for a weaker convergence notion

From Bayesian ML

  • Unnormalized posteriors: p-tilde(theta) is known only up to the evidence Z; variational inference optimizes the ELBO, a lower bound on log Z
  • MCMC diagnostics: comparing chains requires a distance between distributions that is insensitive to unknown normalizing constants
  • The Ky Fan metric on distribution functions is not standard in the ML literature; KL divergence, Wasserstein distance, and MMD are the common choices, all of which require normalized distributions (or at least tractable density ratios)

Why It Matters

If the almost basic convergence metric could be made practical, it would provide a distribution distance that:

  1. Is defined directly on unnormalized distribution functions without requiring density estimation
  2. Is invariant to an unknown additive constant in the distribution function (the “up to constants” feature), the structural analogue of an unknown normalizing constant
  3. Is a genuine metric (unlike KL divergence, which is asymmetric and violates the triangle inequality)
  4. Generates a separable metrizable topology, unlike many other distribution distances for signed measures

The main obstacles are: (a) the metric space is incomplete, so Cauchy sequences need not converge; (b) the infimum over c in the definition may not be efficiently computable; (c) the metric is defined on distribution functions (1D), so extending to higher dimensions requires choosing coordinate projections or using a Cramer-Wold-type device.

Potential Directions

  1. Investigate efficient computation of the Stanek metric: given empirical distribution functions F_n and G_n, can the optimal constant c* and the metric d(F_n, G_n) be computed in O(n log n) time?
  2. Explore whether the metric can serve as a training objective for variational inference, bypassing the need for the ELBO
  3. Investigate completions of the metric space (W, d) to obtain a Polish space, which would be more suitable for probabilistic applications
  4. Extend to R^d via projections: define d_d(mu, nu) = sup_{v in S^{d-1}} d(F^{mu_v}, F^{nu_v}) where mu_v is the pushforward along direction v
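As a first step toward direction 1, a sup-norm analogue is easy to compute: for empirical distribution functions, the additive shift minimizing sup_t |F(t) - G(t) - c| is the midrange of F - G, which gives an O(n log n) shift-invariant Kolmogorov-Smirnov-style statistic. This is a simpler surrogate for exploration, not the Stanek metric itself, and the function name is illustrative.

```python
import numpy as np

def shift_invariant_ks(x, y):
    """min over c of sup_t |F_x(t) - G_y(t) - c| for empirical CDFs,
    plus the optimal shift c*. Cost is O(n log n) from the sorts."""
    grid = np.sort(np.concatenate([x, y]))               # all jump points
    F = np.searchsorted(np.sort(x), grid, side="right") / x.size
    G = np.searchsorted(np.sort(y), grid, side="right") / y.size
    diff = F - G
    c_star = 0.5 * (diff.max() + diff.min())             # midrange is optimal in sup norm
    return float(np.abs(diff - c_star).max()), float(c_star)
```

The open question is whether the full Stanek metric, with its eps-dependent window and Lebesgue-measure constraint, admits a similarly direct formula for c*.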

Evidence

  • Stanek, Lemma 3.4: explicit metric formula for almost basic convergence
  • Stanek, Lemmas 3.5-3.7: metrizable, separable, but not complete and not normable
  • The “up to constants” feature in the metric definition (the infimum over c in R) is structurally analogous to working with unnormalized log-densities
  • The classical Ky Fan metric rho(X,Y) = inf{eps >= 0 : P(d(X,Y) > eps) <= eps} metrizes convergence in probability; Stanek’s variant adapts this to distribution functions with the additional constant-shift invariance
  • Hofinger 2006: Prokhorov and Ky Fan metrics for uncertainty quantification in inverse problems, establishing precedent for using these metrics in computational settings
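For intuition on the classical version, the empirical Ky Fan distance between paired samples can be computed exactly in O(n log n): the empirical tail function P(|X - Y| > eps) is piecewise constant between the sorted values of |X - Y|, so the smallest feasible eps on each flat piece can be enumerated directly. A sketch (function name illustrative):

```python
import numpy as np

def ky_fan(x, y):
    """Empirical Ky Fan distance inf{eps >= 0 : P(|X - Y| > eps) <= eps}
    for paired samples, from the sorted values of |X - Y|."""
    d = np.sort(np.abs(np.asarray(x) - np.asarray(y)))
    n = d.size
    tail = (n - 1 - np.arange(n)) / n    # empirical P(|X - Y| > d[k])
    # On the flat piece starting at d[k], the smallest feasible eps is
    # max(d[k], tail[k]); eps = 1 is always feasible, so cap there.
    return float(min(1.0, np.maximum(d, tail).min()))
```

Note the distance saturates at 1 (since P <= 1 always), mirroring the bounded range of the metric for convergence in probability.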
