The hockey-stick divergence (E_gamma-divergence) between two multivariate Gaussian distributions with equal covariance has an explicit closed-form expression involving the Gaussian Q-function. This formula is the computational workhorse behind the privacy guarantees of the model clipping algorithm in KOLOSKOVA2025.

Explicit Formula (Lemma 3 / Remark 1, Asoodeh et al.)

For m_1, m_2 in R^d and sigma > 0, let N(m, sigma^2 I_d) denote the multivariate Gaussian with mean m and covariance sigma^2 I_d. For gamma >= 1:

E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = Q(log(gamma)/beta - beta/2) - gamma * Q(log(gamma)/beta + beta/2)

where beta = ||m_1 - m_2|| / sigma and Q(t) = (1/sqrt(2*pi)) * integral from t to infinity of e^{-u^2/2} du is the Gaussian tail probability.
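The closed form above is straightforward to evaluate numerically. Below is a minimal sketch (not code from either paper; the function names are our own) that computes E_gamma between two equal-covariance isotropic Gaussians using Q(t) = erfc(t / sqrt(2)) / 2:

```python
import math

def Q(t: float) -> float:
    """Gaussian tail probability Q(t) = P(Z > t) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def hockey_stick_gaussian(dist: float, sigma: float, gamma: float) -> float:
    """E_gamma(N(m1, sigma^2 I) || N(m2, sigma^2 I)) with dist = ||m1 - m2||.

    Implements Q(log(gamma)/beta - beta/2) - gamma * Q(log(gamma)/beta + beta/2)
    with beta = dist / sigma.
    """
    if dist == 0.0:
        return 0.0  # identical Gaussians: zero divergence
    beta = dist / sigma
    t = math.log(gamma) / beta
    return Q(t - beta / 2.0) - gamma * Q(t + beta / 2.0)
```

Note that the dimension d never appears: the divergence depends on the means only through ||m_1 - m_2||.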

This can be written more compactly using the theta_gamma function (Definition 1 in ASOODEH2020):

E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = theta_gamma(||m_2 - m_1|| / sigma)

where theta_gamma : [0, infinity) -> [0, 1] is defined by theta_gamma(r) = E_gamma(N(r*u, I_d) || N(0, I_d)) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2) for any unit vector u in R^d. The function theta_gamma is non-decreasing in r, takes value 0 at r = 0, and approaches 1 as r -> infinity.


Connection to Lemma A.3 in Koloskova et al.

In KOLOSKOVA2025, Lemma A.3 states (setting gamma = e^epsilon): for mu_1, mu_2 in R^d and sigma > 0,

E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) = Q(epsilon * sigma / ||mu_1 - mu_2|| - ||mu_1 - mu_2|| / (2sigma)) - e^epsilon * Q(epsilon * sigma / ||mu_1 - mu_2|| + ||mu_1 - mu_2|| / (2sigma))

This is exactly the formula above with gamma = e^epsilon and beta = ||mu_1 - mu_2|| / sigma, noting that log(gamma)/beta = epsilon * sigma / ||mu_1 - mu_2|| and beta/2 = ||mu_1 - mu_2|| / (2*sigma).
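The substitution can be checked numerically. The sketch below (helper names are our own) evaluates both parameterizations and confirms they agree to floating-point precision:

```python
import math

def Q(t: float) -> float:
    """Gaussian tail probability."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def e_gamma_general(dist: float, sigma: float, gamma: float) -> float:
    """General form: Q(log(gamma)/beta - beta/2) - gamma * Q(log(gamma)/beta + beta/2)."""
    beta = dist / sigma
    t = math.log(gamma) / beta
    return Q(t - beta / 2.0) - gamma * Q(t + beta / 2.0)

def e_gamma_lemma_a3(dist: float, sigma: float, eps: float) -> float:
    """Lemma A.3 form, parameterized by epsilon with gamma = e^epsilon."""
    a = eps * sigma / dist            # = log(gamma) / beta
    b = dist / (2.0 * sigma)          # = beta / 2
    return Q(a - b) - math.exp(eps) * Q(a + b)

# The two expressions are the same function of (dist, sigma, eps):
eps, dist, sigma = 1.0, 0.7, 2.0
diff = abs(e_gamma_general(dist, sigma, math.exp(eps)) - e_gamma_lemma_a3(dist, sigma, eps))
```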

Simplified Upper Bound (Lemma A.4, Koloskova et al.)

For practical computation, a simpler upper bound is often used:

E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) <= 1.25 * exp(-sigma^2 * epsilon^2 / (2 * ||mu_1 - mu_2||^2))

This bound is convenient for quick parameter selection, but the exact theta_gamma formula yields tighter guarantees.
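To see how much slack the simple bound leaves, one can compare it against the exact formula at a few parameter settings (a sketch with our own helper names; the 1.25 constant is taken from the Lemma A.4 statement above):

```python
import math

def Q(t: float) -> float:
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def exact(dist: float, sigma: float, eps: float) -> float:
    """Exact E_{e^eps} via the closed-form Q-function expression."""
    beta = dist / sigma
    t = eps / beta
    return Q(t - beta / 2.0) - math.exp(eps) * Q(t + beta / 2.0)

def upper_bound(dist: float, sigma: float, eps: float) -> float:
    """Lemma A.4 style bound: 1.25 * exp(-sigma^2 * eps^2 / (2 * dist^2))."""
    return 1.25 * math.exp(-(sigma ** 2) * (eps ** 2) / (2.0 * dist ** 2))

# e.g. eps = 1, ||mu_1 - mu_2|| = 1, sigma = 1:
# the exact value is noticeably smaller than the bound.
ex, ub = exact(1.0, 1.0, 1.0), upper_bound(1.0, 1.0, 1.0)
```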

Role in Model Clipping Analysis

In the model clipping unlearning algorithm, after each noisy SGD step the model is projected onto a ball of radius C_2. The contraction coefficient for this step’s Markov kernel is:

eta_{e^epsilon}(K_t) = theta_{e^epsilon}(diam(D) / sigma_t) = theta_{e^epsilon}(2 * C_2 / sigma_t)

The key insight is that theta_{e^epsilon}(r) < 1 for any finite r > 0 and epsilon > 0, guaranteeing strict contraction at every iteration. The product of these contraction coefficients across T iterations gives the overall privacy guarantee via the E-gamma divergence contraction theory.
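The iteration-by-iteration accounting above can be sketched as follows. This is an illustration under assumed inputs (clipping radius C_2 and a noise schedule sigma_t are placeholders, not values from the paper); each step contributes a factor theta_{e^eps}(2 * C_2 / sigma_t) < 1, and the product over T steps drives the overall divergence down:

```python
import math

def Q(t: float) -> float:
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def theta(r: float, eps: float) -> float:
    """theta_{e^eps}(r) = Q(eps/r - r/2) - e^eps * Q(eps/r + r/2)."""
    if r == 0.0:
        return 0.0
    t = eps / r
    return Q(t - r / 2.0) - math.exp(eps) * Q(t + r / 2.0)

def overall_contraction(C2: float, sigmas: list[float], eps: float) -> float:
    """Product of per-step contraction coefficients theta_{e^eps}(2*C2/sigma_t)."""
    prod = 1.0
    for sigma_t in sigmas:
        prod *= theta(2.0 * C2 / sigma_t, eps)
    return prod

# Hypothetical schedule: 3 iterations with increasing noise.
factors = [theta(2.0 * 0.5 / s, 1.0) for s in (1.0, 2.0, 4.0)]
total = overall_contraction(0.5, [1.0, 2.0, 4.0], 1.0)
```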

Properties

  • theta_gamma(0) = 0 for all gamma >= 1 (identical Gaussians have zero divergence)
  • theta_gamma(r) is strictly increasing in r for gamma > 1
  • theta_gamma(r) -> 1 as r -> infinity (approaching maximal divergence)
  • theta_1(r) = 1 - 2*Q(r/2) (reduces to total variation between Gaussians when gamma = 1)
  • The formula depends on the means only through their distance ||m_1 - m_2||, reflecting the rotational invariance of the isotropic Gaussian
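The listed properties can be verified numerically from the closed form (a sketch with our own helper names):

```python
import math

def Q(t: float) -> float:
    return 0.5 * math.erfc(t / math.sqrt(2.0))

def theta(r: float, gamma: float) -> float:
    """theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2)."""
    if r == 0.0:
        return 0.0
    t = math.log(gamma) / r
    return Q(t - r / 2.0) - gamma * Q(t + r / 2.0)

# gamma = 1 reduces to total variation between N(r*u, I) and N(0, I):
tv_checks = [abs(theta(r, 1.0) - (1.0 - 2.0 * Q(r / 2.0))) for r in (0.5, 1.0, 3.0)]

# strictly increasing in r for gamma > 1:
vals = [theta(r, 2.0) for r in (0.1, 0.5, 1.0, 2.0, 5.0)]

# approaches 1 for large r:
tail = theta(50.0, 2.0)
```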

concept