The hockey-stick divergence (E_gamma-divergence) between two multivariate Gaussian distributions with equal covariance has an explicit closed-form expression involving the Gaussian Q-function. This formula is the computational workhorse behind the privacy guarantees of the model clipping algorithm in KOLOSKOVA2025.
Explicit Formula (Lemma 3 / Remark 1, Asoodeh et al.)
For m_1, m_2 in R^d and sigma > 0, let N(m, sigma^2 I_d) denote the multivariate Gaussian with mean m and covariance sigma^2 I_d. For gamma >= 1:
E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = Q(log(gamma)/beta - beta/2) - gamma * Q(log(gamma)/beta + beta/2)
where beta = ||m_1 - m_2|| / sigma and Q(t) = (1/sqrt(2*pi)) * integral from t to infinity of e^{-u^2/2} du is the Gaussian tail probability.
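As a sketch, the formula above can be evaluated with only the standard library, using the identity Q(t) = erfc(t / sqrt(2)) / 2. The function name and signature here are illustrative, not from either paper:

```python
import math

def Q(t):
    # Gaussian tail probability: Q(t) = (1/sqrt(2*pi)) * int_t^inf e^{-u^2/2} du
    return 0.5 * math.erfc(t / math.sqrt(2))

def hockey_stick_gaussian(dist, sigma, gamma):
    # E_gamma(N(m1, sigma^2 I) || N(m2, sigma^2 I)) for equal isotropic
    # covariances, where dist = ||m1 - m2||, sigma > 0, gamma >= 1.
    if dist == 0:
        return 0.0  # identical Gaussians: zero divergence
    beta = dist / sigma
    a = math.log(gamma) / beta
    return Q(a - beta / 2) - gamma * Q(a + beta / 2)
```

Note the divergence depends on the means only through dist = ||m_1 - m_2||, so no d-dimensional vectors are needed.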
This can be written more compactly using the theta_gamma function (Definition 1 in ASOODEH2020):
E_gamma(N(m_1, sigma^2 I_d) || N(m_2, sigma^2 I_d)) = theta_gamma(||m_2 - m_1|| / sigma)
where theta_gamma : [0, infinity) → [0, 1] is defined by theta_gamma(r) = E_gamma(N(r*u, I_d) || N(0, I_d)) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2) for any unit vector u in R^d. The function theta_gamma is non-decreasing in r, takes value 0 at r = 0, and approaches 1 as r → infinity.
Connection to Lemma A.3 in Koloskova et al.
In KOLOSKOVA2025, Lemma A.3 states (setting gamma = e^epsilon): for mu_1, mu_2 in R^d and sigma > 0,
E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) = Q(epsilon * sigma / ||mu_1 - mu_2|| - ||mu_1 - mu_2|| / (2sigma)) - e^epsilon * Q(epsilon * sigma / ||mu_1 - mu_2|| + ||mu_1 - mu_2|| / (2sigma))
This is exactly the formula above with gamma = e^epsilon and beta = ||mu_1 - mu_2|| / sigma, noting that log(gamma)/beta = epsilon * sigma / ||mu_1 - mu_2|| and beta/2 = ||mu_1 - mu_2|| / (2*sigma).
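A quick numerical sanity check of this equivalence, with arbitrary illustrative values for epsilon, sigma, and the mean distance:

```python
import math

def Q(t):
    # Gaussian tail probability via the complementary error function
    return 0.5 * math.erfc(t / math.sqrt(2))

# Illustrative values (not from the paper): eps, sigma, d = ||mu_1 - mu_2||
eps, sigma, d = 0.7, 1.3, 0.9

# Lemma A.3 parameterization (KOLOSKOVA2025)
lhs = (Q(eps * sigma / d - d / (2 * sigma))
       - math.exp(eps) * Q(eps * sigma / d + d / (2 * sigma)))

# theta_gamma parameterization with gamma = e^eps, beta = d / sigma
gamma, beta = math.exp(eps), d / sigma
rhs = Q(math.log(gamma) / beta - beta / 2) - gamma * Q(math.log(gamma) / beta + beta / 2)
```

The two expressions agree up to floating-point rounding, since log(gamma)/beta = eps * sigma / d and beta/2 = d / (2 * sigma).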
Simplified Upper Bound (Lemma A.4, Koloskova et al.)
For practical computation, a simpler upper bound is often used:
E_{e^epsilon}(N(mu_1, sigma^2 I) || N(mu_2, sigma^2 I)) <= 1.25 * exp(-sigma^2 * epsilon^2 / (2 * ||mu_1 - mu_2||^2))
This bound is convenient for quick parameter selection, but the exact theta_gamma formula gives tighter guarantees.
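A small sketch comparing the exact expression against the Lemma A.4 bound at a few illustrative parameter settings (function names are mine, not the paper's):

```python
import math

def Q(t):
    return 0.5 * math.erfc(t / math.sqrt(2))

def exact(eps, sigma, d):
    # Exact hockey-stick divergence with gamma = e^eps, beta = d / sigma
    beta = d / sigma
    return Q(eps / beta - beta / 2) - math.exp(eps) * Q(eps / beta + beta / 2)

def bound(eps, sigma, d):
    # Lemma A.4 upper bound
    return 1.25 * math.exp(-sigma**2 * eps**2 / (2 * d**2))

# The exact value sits below the bound at each of these settings
for eps, sigma, d in [(1.0, 1.0, 1.0), (2.0, 2.0, 1.0), (0.5, 1.0, 2.0)]:
    assert exact(eps, sigma, d) <= bound(eps, sigma, d)
```

At eps = sigma = d = 1, for instance, the exact value is about 0.127 while the bound gives about 0.758, illustrating the gap.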
Role in Model Clipping Analysis
In the model clipping unlearning algorithm, after each noisy SGD step the model is projected onto a ball of radius C_2. The contraction coefficient for this step’s Markov kernel is:
eta_{e^epsilon}(K_t) = theta_{e^epsilon}(diam(D) / sigma_t) = theta_{e^epsilon}(2 * C_2 / sigma_t)
where diam(D) = 2 * C_2 is the diameter of the ball of radius C_2 onto which the iterates are projected.
The key insight is that theta_{e^epsilon}(r) < 1 for any finite r > 0 and epsilon > 0, guaranteeing strict contraction at every iteration. The product of these contraction coefficients across T iterations gives the overall privacy guarantee via the E-gamma divergence contraction theory.
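A minimal sketch of this contraction argument, assuming illustrative values for epsilon, the clipping radius C_2, and a constant per-step noise scale sigma_t = 1:

```python
import math

def theta(gamma, r):
    # theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2)
    Q = lambda t: 0.5 * math.erfc(t / math.sqrt(2))
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

# Hypothetical parameters: eps = 1, clipping radius C2 = 0.5, sigma_t = 1
eps, C2, sigma_t = 1.0, 0.5, 1.0
eta = theta(math.exp(eps), 2 * C2 / sigma_t)  # one step's contraction coefficient
assert 0.0 < eta < 1.0                        # strict contraction per iteration

# Across T iterations the coefficients multiply, decaying geometrically
T = 50
overall = eta ** T
```

Because each eta is strictly below 1, the product eta ** T can be driven below any target divergence level by running enough iterations.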
Properties
- theta_gamma(0) = 0 for all gamma >= 1 (identical Gaussians have zero divergence)
- theta_gamma(r) is strictly increasing in r for gamma > 1
- theta_gamma(r) → 1 as r → infinity (approaching maximal divergence)
- theta_1(r) = 1 - 2*Q(r/2) (reduces to total variation between Gaussians when gamma = 1)
- The formula depends on the means only through their distance ||m_1 - m_2||, reflecting the rotational invariance of the isotropic Gaussian
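The listed properties can be spot-checked numerically; a sketch with arbitrary test values of gamma and r:

```python
import math

def theta(gamma, r):
    # theta_gamma(r) for gamma >= 1, r >= 0
    Q = lambda t: 0.5 * math.erfc(t / math.sqrt(2))
    if r == 0:
        return 0.0
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

Qf = lambda t: 0.5 * math.erfc(t / math.sqrt(2))

assert theta(2.0, 0.0) == 0.0                        # zero at r = 0
vals = [theta(2.0, r) for r in (0.5, 1.0, 2.0, 5.0, 20.0)]
assert all(x < y for x, y in zip(vals, vals[1:]))    # increasing in r for gamma > 1
assert theta(2.0, 40.0) > 0.9999                     # approaches 1 as r grows
r = 1.7
assert abs(theta(1.0, r) - (1 - 2 * Qf(r / 2))) < 1e-12  # gamma = 1: total variation
```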