The hockey-stick divergence (E_gamma-divergence) between two multivariate Gaussian distributions with equal covariance has an explicit closed-form expression involving the Gaussian Q-function. This formula is the computational workhorse behind the privacy guarantees of the model clipping algorithm in KOLOSKOVA2025.
Explicit Formula (Lemma 3 / Remark 1, Asoodeh et al.)
For m_1, m_2 in R^d and sigma > 0, let N(m, sigma^2 I_d) denote the multivariate Gaussian with mean m and covariance sigma^2 I_d. For gamma >= 1:
E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = Q(log(gamma)/beta - beta/2) - gamma * Q(log(gamma)/beta + beta/2)
where beta = ||m_1 - m_2|| / sigma and Q(t) = (1/sqrt(2*pi)) * integral from t to infinity of e^{-u^2/2} du is the Gaussian tail probability.
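As a sketch, the formula above can be evaluated with only the standard library, using the identity Q(t) = erfc(t / sqrt(2)) / 2. The function name and signature here are illustrative, not from either paper:

```python
import math

def Q(t):
    # Gaussian tail probability: Q(t) = (1/sqrt(2*pi)) * int_t^inf e^{-u^2/2} du
    return 0.5 * math.erfc(t / math.sqrt(2))

def hockey_stick_gaussian(dist, sigma, gamma):
    # E_gamma(N(m1, sigma^2 I) || N(m2, sigma^2 I)) for equal isotropic
    # covariances, where dist = ||m1 - m2||, sigma > 0, gamma >= 1.
    if dist == 0:
        return 0.0  # identical Gaussians: zero divergence
    beta = dist / sigma
    a = math.log(gamma) / beta
    return Q(a - beta / 2) - gamma * Q(a + beta / 2)
```

Note the divergence depends on the means only through dist = ||m_1 - m_2||, so no d-dimensional vectors are needed.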
This can be written more compactly using the theta_gamma function (Definition 1 in ASOODEH2020):
E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = theta_gamma(||m_2 - m_1|| / sigma)
where theta_gamma : [0, infinity) → [0, 1] is defined by theta_gamma(r) = E_gamma(N(r*u, I_d) || N(0, I_d)) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2) for any unit vector u in R^d. The function theta_gamma is non-decreasing in r, takes value 0 at r = 0, and approaches 1 as r → infinity.
Connection to Lemma A.3 in Koloskova et al.
In KOLOSKOVA2025, Lemma A.3 states (setting gamma = e^epsilon): for mu_1, mu_2 in R^d and sigma > 0,
E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) = Q(epsilon * sigma / ||mu_1 - mu_2|| - ||mu_1 - mu_2|| / (2sigma)) - e^epsilon * Q(epsilon * sigma / ||mu_1 - mu_2|| + ||mu_1 - mu_2|| / (2sigma))
This is exactly the formula above with gamma = e^epsilon and beta = ||mu_1 - mu_2|| / sigma, noting that log(gamma)/beta = epsilon * sigma / ||mu_1 - mu_2|| and beta/2 = ||mu_1 - mu_2|| / (2*sigma).
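A quick numerical sanity check of this equivalence, with arbitrary illustrative values for epsilon, sigma, and the mean distance:

```python
import math

def Q(t):
    # Gaussian tail probability via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

# Illustrative values (not from the paper): eps, sigma, d = ||mu_1 - mu_2||
eps, sigma, d = 0.7, 1.3, 0.9

# Lemma A.3 parameterization (KOLOSKOVA2025)
lhs = (Q(eps * sigma / d - d / (2 * sigma))
       - math.exp(eps) * Q(eps * sigma / d + d / (2 * sigma)))

# theta_gamma parameterization with gamma = e^eps, beta = d / sigma
gamma, beta = math.exp(eps), d / sigma
rhs = Q(math.log(gamma) / beta - beta / 2) - gamma * Q(math.log(gamma) / beta + beta / 2)
```

The two expressions agree up to floating-point rounding, since log(gamma)/beta = eps * sigma / d and beta/2 = d / (2 * sigma).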
Simplified Upper Bound (Lemma A.4, Koloskova et al.)
For practical computation, a simpler upper bound is often used:
E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) <= 1.25 * exp(-sigma^2 * epsilon^2 / (2 * ||mu_1 - mu_2||^2))
This bound is convenient for quick parameter selection, but the exact theta_gamma formula gives tighter guarantees.
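A small sketch comparing the exact expression against the Lemma A.4 bound at a few illustrative parameter settings (function names are mine, not the paper's):

```python
import math

def Q(t):
    return 0.5 * math.erfc(t / math.sqrt(2))

def exact(eps, sigma, d):
    # Exact hockey-stick divergence with gamma = e^eps, beta = d / sigma
    beta = d / sigma
    return Q(eps / beta - beta / 2) - math.exp(eps) * Q(eps / beta + beta / 2)

def bound(eps, sigma, d):
    # Lemma A.4 upper bound
    return 1.25 * math.exp(-sigma**2 * eps**2 / (2 * d**2))

# The exact value sits below the bound at each of these settings
for eps, sigma, d in [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (0.5, 1.0, 2.0)]:
    assert exact(eps, sigma, d) <= bound(eps, sigma, d)
```

At eps = sigma = d = 1, for instance, the exact value is about 0.127 while the bound gives about 0.758, illustrating the gap.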
Role in Model Clipping Analysis
In the model clipping unlearning algorithm, after each noisy SGD step the model is projected onto a ball of radius C_2. The contraction coefficient for this step’s Markov kernel is:
eta_{e^epsilon}(K_t) = theta_{e^epsilon}(diam(D) / sigma_t) = theta_{e^epsilon}(2 * C_2 / sigma_t)
where diam(D) = 2 * C_2 is the diameter of the ball of radius C_2 onto which the iterates are projected.
The key insight is that theta_{e^epsilon}(r) < 1 for any finite r > 0 and epsilon > 0, guaranteeing strict contraction at every iteration. The product of these contraction coefficients across T iterations gives the overall privacy guarantee via the E-gamma divergence contraction theory.
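A minimal sketch of this contraction argument, assuming illustrative values for epsilon, the clipping radius C_2, and a constant per-step noise scale sigma_t = 1:

```python
import math

def theta(gamma, r):
    # theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2)
    Q = lambda t: 0.5 * math.erfc(t / math.sqrt(2))
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

# Hypothetical parameters: eps = 1, clipping radius C2 = 0.5, sigma_t = 1
eps, C2, sigma_t = 1.0, 0.5, 1.0
eta = theta(math.exp(eps), 2 * C2 / sigma_t)  # one step's contraction coefficient
assert 0.0 < eta < 1.0                        # strict contraction per iteration

# Across T iterations the coefficients multiply, decaying geometrically
T = 50
overall = eta ** T
```

Because each eta is strictly below 1, the product eta ** T can be driven below any target divergence level by running enough iterations.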
Properties
- theta_gamma(0) = 0 for all gamma >= 1 (identical Gaussians have zero divergence)
- theta_gamma(r) is strictly increasing in r for gamma > 1
- theta_gamma(r) → 1 as r → infinity (approaching maximal divergence)
- theta_1(r) = 1 - 2*Q(r/2) (reduces to total variation between Gaussians when gamma = 1)
- The formula depends on the means only through their distance ||m_1 - m_2||, reflecting the rotational invariance of the isotropic Gaussian
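The listed properties can be spot-checked numerically; a sketch with arbitrary test values of gamma and r:

```python
import math

def theta(gamma, r):
    # theta_gamma(r) for gamma >= 1, r >= 0
    Q = lambda t: 0.5 * math.erfc(t / math.sqrt(2))
    if r == 0:
        return 0.0
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

Qf = lambda t: 0.5 * math.erfc(t / math.sqrt(2))

assert theta(2.0, 0.0) == 0.0                        # zero at r = 0
vals = [theta(2.0, r) for r in (0.5, 1.0, 2.0, 5.0, 20.0)]
assert all(x < y for x, y in zip(vals, vals[1:]))    # increasing in r for gamma > 1
assert theta(2.0, 40.0) > 0.9999                     # approaches 1 as r grows
r = 1.7
assert abs(theta(1.0, r) - (1 - 2 * Qf(r / 2))) < 1e-12  # gamma = 1: total variation
```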