The E_gamma-divergence contraction coefficient quantifies the worst-case factor by which a Markov kernel K shrinks the hockey-stick (E_gamma) divergence between a pair of input distributions. It is the central quantity in the analysis of privacy amplification by iteration for non-convex machine unlearning in KOLOSKOVA2025.
Formal Definition
Given a Markov kernel K : D → P(W) and gamma >= 1, the contraction coefficient under E_gamma-divergence is defined as:
eta_gamma(K) := sup_{mu, nu in P(D), E_gamma(mu||nu) != 0} E_gamma(mu K || nu K) / E_gamma(mu || nu)
where E_gamma is the hockey-stick divergence, i.e. the f-divergence with f(t) = (t - gamma)^+ - (1 - gamma)^+ (normalized so that f(1) = 0). This is the strong data-processing coefficient: it measures the worst-case fraction of the E_gamma-divergence that survives passage through K.
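As a concrete reference point, for finite distributions the divergence induced by this f can be computed directly (a minimal sketch; the function name `hockey_stick` is mine):

```python
import numpy as np

def hockey_stick(mu, nu, gamma):
    """E_gamma(mu || nu) = sum_x (mu(x) - gamma * nu(x))^+  -  (1 - gamma)^+."""
    mu, nu = np.asarray(mu, float), np.asarray(nu, float)
    return np.maximum(mu - gamma * nu, 0.0).sum() - max(1.0 - gamma, 0.0)
```

At gamma = 1 this reduces to the total variation distance; as gamma grows the divergence decreases, since larger slack gamma * nu(x) is forgiven before mass counts toward the sum.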
Dobrushin-type Formula (Theorem 2 in Asoodeh et al.)
The key result of ASOODEH2020 is that this contraction coefficient admits a remarkably simple two-point characterization:
eta_gamma(K) = sup_{x1, x2 in D} E_gamma(K(.|x1) || K(.|x2))
for all gamma >= 1, generalizing Dobrushin's classical two-point formula for total variation (the gamma = 1 case). The reciprocity relation eta_gamma(K) = eta_{1/gamma}(K) extends the result to 0 < gamma < 1.
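For a finite-state kernel represented as a row-stochastic matrix, the two-point characterization makes eta_gamma directly computable: instead of a supremum over all pairs of input distributions, one only scans pairs of rows. A sketch (the names `eta_gamma` and `e_gamma` are mine; K is assumed row-stochastic):

```python
import itertools
import numpy as np

def eta_gamma(K, gamma):
    """Contraction coefficient of a row-stochastic matrix K via the
    two-point formula: max over ordered input pairs (x1, x2) of
    E_gamma(K(.|x1) || K(.|x2))."""
    K = np.asarray(K, float)

    def e_gamma(p, q):
        return np.maximum(p - gamma * q, 0.0).sum() - max(1.0 - gamma, 0.0)

    # Ordered pairs, since E_gamma is asymmetric for gamma != 1.
    return max(e_gamma(K[i], K[j])
               for i, j in itertools.permutations(range(len(K)), 2))
```

For gamma = 1 this recovers the classical Dobrushin coefficient, e.g. 1 - 2p for a binary symmetric channel with flip probability p.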
Contraction for Amplitude-Constrained Gaussian Kernels
For the D-constrained additive Gaussian kernel K(.|x) = N(x, sigma^2 I_d) with bounded domain D subset of R^d, the contraction coefficient is:
eta_gamma(K) = theta_gamma(diam(D) / sigma)
where diam(D) = sup_{x1, x2 in D} ||x2 - x1|| is the diameter of D, and theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2), with Q(a) = (1/sqrt(2*pi)) * integral from a to infinity of e^{-u^2/2} du the standard Gaussian tail function.
The function theta_gamma is non-decreasing in r, so a smaller diam(D)/sigma ratio yields a smaller contraction coefficient (stronger privacy amplification). This is why model clipping (which bounds diam(D)) and a larger noise variance sigma^2 both improve privacy.
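theta_gamma is elementary to evaluate numerically: Q can be written through the complementary error function. A minimal sketch (function names are mine):

```python
from math import erfc, log, sqrt

def Q(a):
    # Standard Gaussian tail: Q(a) = P[N(0,1) > a] = erfc(a / sqrt(2)) / 2.
    return 0.5 * erfc(a / sqrt(2.0))

def theta(gamma, r):
    # E_gamma divergence between N(0, 1) and N(r, 1) for r > 0:
    # theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2).
    a = log(gamma) / r
    return Q(a - r / 2.0) - gamma * Q(a + r / 2.0)
```

At gamma = 1 this reduces to the total variation 1 - 2*Q(r/2), and theta increases toward 1 as r grows, matching the monotonicity noted above.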
Role in Koloskova et al. (2025)
In KOLOSKOVA2025, the model clipping algorithm projects model parameters onto a ball of radius C_2 after each noisy SGD step. The resulting Markov kernel at step t has:
- Domain diameter: diam(D) = 2 * C_2 (the diameter of the projection ball)
- Noise scale (standard deviation): sigma_t
- Contraction coefficient: theta_{e^epsilon}(2 * C_2 / sigma_t)
Theorem 4.2 then applies the product bound from Theorem 1 of Asoodeh et al.:
E_{e^epsilon}(mu_{T+1} || mu'_{T+1}) <= E_{e^epsilon}(mu_1 || mu'_1) * prod_{t=1}^{T} theta_{e^epsilon}(2 * C_2 / sigma_t)
Since theta_{e^epsilon}(2 * C_2 / sigma_t) < 1 for finite C_2 / sigma_t, the E_gamma-divergence contracts exponentially across iterations, yielding the (epsilon, delta)-unlearning guarantee.
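The geometric decay is easy to check numerically. The sketch below (function names and the constant noise schedule are illustrative, not taken from the papers) evaluates the right-hand side of the product bound:

```python
from math import erfc, exp, log, sqrt

def Q(a):
    # Standard Gaussian tail probability.
    return 0.5 * erfc(a / sqrt(2.0))

def theta(gamma, r):
    # Per-step contraction factor theta_gamma(r).
    a = log(gamma) / r
    return Q(a - r / 2.0) - gamma * Q(a + r / 2.0)

def divergence_bound(eps, C2, sigmas, e0=1.0):
    """Upper bound on E_{e^eps}(mu_{T+1} || mu'_{T+1}): initial divergence e0
    multiplied by the per-step factors theta_{e^eps}(2 * C2 / sigma_t)."""
    gamma = exp(eps)
    bound = e0
    for sigma_t in sigmas:
        bound *= theta(gamma, 2.0 * C2 / sigma_t)
    return bound
```

With a constant schedule sigma_t = sigma, each factor is the same number below one, so the bound decays geometrically in the number of iterations T.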