Abstract
We investigate the contraction coefficients derived from strong data processing inequalities for the E_gamma-divergence. By generalizing Dobrushin’s celebrated coefficient from total variation distance to E_gamma-divergence, we derive a closed-form expression for the contraction of E_gamma-divergence. This result has fundamental consequences in two privacy settings. First, it implies that local differential privacy can be equivalently expressed in terms of the contraction of E_gamma-divergence. Second, it leads to a new information-theoretic technique for analyzing privacy guarantees of online algorithms. In this technique, we view such algorithms as a composition of amplitude-constrained Gaussian channels and then relate their contraction coefficients under E_gamma-divergence to the overall differential privacy guarantees.
Summary
This paper establishes the contraction theory for E_gamma-divergence (the hockey-stick divergence), which is central to Koloskova et al.’s (2025) analysis of certified approximate unlearning for non-convex models. The key contribution is a closed-form formula for the contraction coefficient of a Markov kernel K under E_gamma-divergence, directly generalizing Dobrushin’s classical result for total variation (the gamma = 1 case).
The main result (Theorem 2) shows that for any gamma >= 1, the contraction coefficient under E_gamma-divergence equals the supremum of E_gamma between point-mass outputs of the kernel:
eta_gamma(K) = sup_{x1, x2 in D} E_gamma(K(.|x1) || K(.|x2))
This formula reduces the computation of the contraction coefficient to evaluating E_gamma between pairs of output distributions. For the amplitude-constrained additive Gaussian kernel K(.|x) = N(x, sigma^2 I_d) with domain D, the contraction coefficient is given by:
eta_gamma(K) = theta_gamma(diam(D) / sigma)
where theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2), with Q being the Gaussian tail probability function. This is the explicit formula used in Koloskova et al.’s Lemma A.3 for bounding hockey-stick divergence for Gaussians.
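The theta_gamma formula above is straightforward to evaluate numerically. The following is a minimal sketch using only the standard library, with Q implemented via the complementary error function; the parameter values in the example are illustrative, not from the paper:

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def theta_gamma(gamma: float, r: float) -> float:
    """theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2),
    the contraction coefficient of an additive Gaussian kernel whose domain
    has diameter r * sigma (Proposition 1)."""
    if r == 0.0:
        return 0.0  # identical inputs: no divergence to contract
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

# Illustrative: gamma = e^epsilon with epsilon = 1, diameter/sigma ratio r = 2.
eta = theta_gamma(math.e, 2.0)
```

Note that at gamma = 1 the formula reduces to 1 - 2*Q(r/2), the total variation distance between two Gaussians at distance r*sigma, recovering Dobrushin's coefficient as the special case.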
A critical consequence is Theorem 1: for a Markov chain with n Gaussian-noise-plus-projection steps, the E_gamma-divergence between output distributions starting from different inputs satisfies a product bound over the contraction coefficients of each step. This directly enables privacy amplification by iteration for non-convex unlearning, where each noisy SGD step followed by model clipping acts as a bounded-domain Gaussian kernel whose contraction coefficient is strictly less than 1.
Key Contributions
- Closed-form expression for the E-gamma divergence contraction coefficient eta_gamma(K) generalizing Dobrushin’s formula (Theorem 2), with the reciprocity relation eta_gamma(K) = eta_{1/gamma}(K) for gamma < 1.
- Explicit computation of eta_gamma for the amplitude-constrained additive Gaussian kernel via the theta_gamma function (Proposition 1).
- Product bound for the contraction across n composed Markov kernels (Theorem 1), enabling iterative privacy analysis.
- Equivalence between (epsilon, delta)-LDP and contraction under E_{e^epsilon}-divergence (Theorem 3): a mechanism K is (epsilon, delta)-LDP if and only if eta_{e^epsilon}(K) <= delta.
- Improved bounds for output f-divergence via integral representation against E_gamma contraction coefficients (Corollary 1), provably tighter than the standard eta_TV-based bound.
- Application to central differential privacy of iterative online learning algorithms (Theorem 6), providing privacy guarantees for one-pass SGD and online gradient descent.
Relevance to Koloskova et al. (2025)
This paper provides the mathematical foundation for the model clipping analysis in Certified Unlearning for Neural Networks:
- Lemma A.3 (Koloskova) directly uses the Gaussian E_gamma formula from Lemma 3 / Remark 1 here: E_gamma(N(m1, sigma^2 I) || N(m2, sigma^2 I)) = theta_gamma(||m2 - m1|| / sigma).
- Theorem 4.2 (Koloskova) applies Theorem 1 here to bound the hockey-stick divergence after T iterations of model clipping + Gaussian noise, with each step’s contraction coefficient given by theta_{e^epsilon}(2*C_2 / sigma_t).
- The product structure of contraction coefficients across iterations is precisely what enables privacy amplification by iteration in the non-convex setting.
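The iterated bound above can be sketched numerically: each clipped step with clipping radius C_2 and noise scale sigma_t contributes a factor theta_{e^eps}(2*C_2 / sigma_t), and (assuming the Theorem 1 bound factorizes as a plain product of per-step coefficients) the overall coefficient is their product. The schedule below is hypothetical:

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def theta(gamma: float, r: float) -> float:
    """Per-step contraction coefficient theta_gamma(r) (Proposition 1)."""
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

# Hypothetical schedule: T = 4 clipped noisy-SGD steps, clipping radius C2,
# per-step noise scales sigma_t, and gamma = e^eps.
eps, C2 = 1.0, 0.5
sigmas = [1.0, 1.0, 2.0, 2.0]

# Product bound in the spirit of Theorem 1: the E_gamma divergence after the
# composed steps shrinks by the product of the per-step coefficients.
bound = math.prod(theta(math.exp(eps), 2 * C2 / s) for s in sigmas)
```

Since each factor is strictly less than 1, the composed bound decays with the number of iterations, which is the quantitative content of privacy amplification by iteration here.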