Summary

This paper establishes the contraction theory for E_gamma-divergence (the hockey-stick divergence), which is central to Koloskova et al.’s (2025) analysis of certified approximate unlearning for non-convex models. The key contribution is a closed-form formula for the contraction coefficient of a Markov kernel K under E_gamma-divergence, directly generalizing Dobrushin’s classical result for total variation (the gamma = 1 case).

The main result (Theorem 2) shows that for any gamma >= 1, the contraction coefficient under E_gamma-divergence equals the supremum of E_gamma between point-mass outputs of the kernel:

eta_gamma(K) = sup_{x1, x2 in D} E_gamma(K(.|x1) || K(.|x2))

This formula reduces the computation of the contraction coefficient to evaluating E_gamma between pairs of output distributions. For the amplitude-constrained additive Gaussian kernel K(.|x) = N(x, sigma^2 I_d) with domain D, the contraction coefficient is given by:

eta_gamma(K) = theta_gamma(diam(D) / sigma)

where theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2), Q is the Gaussian tail probability function, and diam(D) denotes the diameter of the domain D. This is the explicit formula used in Koloskova et al.'s Lemma A.3 for bounding the hockey-stick divergence between Gaussians.
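As a minimal numerical sketch of the formula above (function names are mine, not the paper's; Q is implemented via the standard-library erfc):

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def theta(r: float, gamma: float) -> float:
    """theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2).

    Requires r > 0 and gamma >= 1. For gamma = 1 this reduces to the total
    variation distance between N(0, 1) and N(r, 1), i.e. 1 - 2 * Q(r / 2),
    recovering Dobrushin's coefficient as the gamma = 1 special case.
    """
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)
```

One can sanity-check that theta(r, 1.0) matches the Gaussian TV distance and that the value shrinks as gamma grows, consistent with E_gamma being decreasing in gamma.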

A critical consequence is Theorem 1: for a Markov chain with n Gaussian-noise-plus-projection steps, the E_gamma-divergence between output distributions starting from different inputs satisfies a product bound over the contraction coefficients of each step. This directly enables privacy amplification by iteration for non-convex unlearning, where each noisy SGD step followed by model clipping acts as a bounded-domain Gaussian kernel whose contraction coefficient is strictly less than 1.

Key Contributions

  • Closed-form expression for the E-gamma divergence contraction coefficient eta_gamma(K) generalizing Dobrushin’s formula (Theorem 2), with the reciprocity relation eta_gamma(K) = eta_{1/gamma}(K) for gamma < 1.
  • Explicit computation of eta_gamma for the amplitude-constrained additive Gaussian kernel via the theta_gamma function (Proposition 1).
  • Product bound for the contraction across n composed Markov kernels (Theorem 1), enabling iterative privacy analysis.
  • Equivalence between (epsilon, delta)-LDP and contraction under E_{e^epsilon}-divergence (Theorem 3): a mechanism K is (epsilon, delta)-LDP if and only if eta_{e^epsilon}(K) <= delta.
  • Improved bounds on output f-divergences via an integral representation over E_gamma contraction coefficients (Corollary 1), provably tighter than the standard eta_TV-based bound.
  • Application to central differential privacy of iterative online learning algorithms (Theorem 6), providing privacy guarantees for one-pass SGD and online gradient descent.
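Combining the Theorem 3 equivalence with the Gaussian formula of Proposition 1 gives a simple noise-calibration recipe: find the smallest sigma for which theta_{e^epsilon}(diam/sigma) <= delta. A hedged sketch (the bisection routine and its names are mine; theta is increasing in r, hence decreasing in sigma):

```python
import math

def Q(x: float) -> float:
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def theta(r: float, gamma: float) -> float:
    """theta_gamma(r) = Q(log(gamma)/r - r/2) - gamma * Q(log(gamma)/r + r/2)."""
    a = math.log(gamma) / r
    return Q(a - r / 2) - gamma * Q(a + r / 2)

def calibrate_sigma(eps: float, delta: float, diam: float,
                    lo: float = 1e-6, hi: float = 1e6,
                    iters: int = 200) -> float:
    """Smallest sigma with theta_{e^eps}(diam / sigma) <= delta, found by
    bisection: theta decreases in sigma, so the feasible set is an interval."""
    gamma = math.exp(eps)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if theta(diam / mid, gamma) <= delta:
            hi = mid  # feasible: try smaller sigma
        else:
            lo = mid  # infeasible: need more noise
    return hi
```

By Theorem 3, the returned sigma makes the bounded-domain Gaussian mechanism (epsilon, delta)-LDP, and any noticeably smaller sigma would violate the delta constraint.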

Relevance to Koloskova et al. (2025)

This paper provides the mathematical foundation for the model clipping analysis in Certified Unlearning for Neural Networks:

  1. Lemma A.3 (Koloskova) directly uses the Gaussian E_gamma formula from Lemma 3 / Remark 1 here: E_gamma(N(m1, sigma^2 I) || N(m2, sigma^2 I)) = theta_gamma(||m2 - m1|| / sigma).

  2. Theorem 4.2 (Koloskova) applies Theorem 1 here to bound the hockey-stick divergence after T iterations of model clipping + Gaussian noise, with each step’s contraction coefficient given by theta_{e^epsilon}(2*C_2 / sigma_t).

  3. The product structure of contraction coefficients across iterations is precisely what enables privacy amplification by iteration in the non-convex setting.
