The contraction coefficient of a Markov kernel K is a quantity that measures how much the kernel contracts a statistical divergence between input distributions. In the context of differential privacy and certified approximate unlearning, it is typically defined with respect to the hockey-stick divergence following Balle et al. (2019) in Privacy Amplification by Mixing and Diffusion Mechanisms. For a Markov operator K : X → P(Y) and epsilon >= 0, the contraction coefficient is:
alpha_epsilon(K) := sup_{x1, x2} E_epsilon(K(x1) || K(x2))
where E_epsilon denotes the hockey-stick divergence (also called the epsilon-divergence). When alpha_epsilon(K) = gamma, K is said to be (gamma, epsilon)-Dobrushin: gamma is the worst-case hockey-stick divergence between any two output distributions of K. When alpha_epsilon(K) < 1, applying K to any pair of distributions mu, nu contracts the hockey-stick divergence: E_epsilon(mu K || nu K) <= alpha_epsilon(K) * E_epsilon(mu || nu). This is the data processing inequality for the hockey-stick divergence with a contraction factor.
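To make this concrete, here is a minimal numerical sketch on discrete distributions, using a toy two-input kernel (the numbers are illustrative, not from any of the cited papers): it computes the hockey-stick divergence, the contraction coefficient alpha_epsilon(K), and checks the data processing inequality.

```python
import numpy as np

def hockey_stick(p, q, eps):
    """E_eps(p || q) = sum_y [p(y) - e^eps * q(y)]_+ for discrete distributions."""
    return float(np.maximum(p - np.exp(eps) * q, 0.0).sum())

# Toy Markov kernel on 2 inputs and 3 outputs: row i is the distribution K(x_i).
K = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2]])

eps = 0.1
# Contraction coefficient: sup over input pairs of the output divergence.
alpha = max(hockey_stick(K[i], K[j], eps)
            for i in range(2) for j in range(2))

# Data processing inequality check for two input distributions mu, nu.
mu = np.array([0.9, 0.1])
nu = np.array([0.2, 0.8])
lhs = hockey_stick(mu @ K, nu @ K, eps)          # divergence after the kernel
rhs = alpha * hockey_stick(mu, nu, eps)          # contracted divergence before
assert lhs <= rhs + 1e-12
```

Since alpha < 1 here, each application of K shrinks the divergence bound by at least that factor.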
For the classical Dobrushin coefficient (epsilon = 0), the quantity reduces to alpha_0(K) = sup_{x1,x2} TV(K(x1), K(x2)), the supremum of total variation distances between outputs of K. In the Markov chain literature, the strong Markov contraction lemma of Cohen et al. (1993) and Del Moral et al. (2003) establishes that for any Csiszár f-divergence D, the Dobrushin coefficient satisfies D(mu K || nu K) <= gamma * D(mu || nu) where gamma = alpha_0(K). Balle et al. extend this to the hockey-stick divergence family parameterized by epsilon, obtaining finer-grained control.
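The strong Markov contraction lemma can be checked numerically as well. The sketch below (same style of toy kernel as above; illustrative numbers) computes alpha_0(K) as the maximal total variation distance between rows and verifies the gamma-contraction both for TV and for one Csiszár f-divergence, KL:

```python
import numpy as np

def tv(p, q):
    """Total variation distance between discrete distributions."""
    return float(0.5 * np.abs(p - q).sum())

def kl(p, q):
    """KL divergence, a Csiszar f-divergence, for strictly positive p, q."""
    return float((p * np.log(p / q)).sum())

# Toy kernel: row i is K(x_i); illustrative numbers, not from the papers.
K = np.array([[0.7, 0.2, 0.1],
              [0.4, 0.4, 0.2]])

# Classical Dobrushin coefficient alpha_0(K): sup of TV between output rows.
gamma = max(tv(K[i], K[j]) for i in range(2) for j in range(2))

mu = np.array([0.9, 0.1])
nu = np.array([0.2, 0.8])

# Strong Markov contraction: TV and any f-divergence contract by gamma.
assert tv(mu @ K, nu @ K) <= gamma * tv(mu, nu) + 1e-12
assert kl(mu @ K, nu @ K) <= gamma * kl(mu, nu) + 1e-12
```

For a two-input kernel the TV contraction is in fact tight: TV(mu K, nu K) equals gamma * TV(mu, nu) exactly.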
The contraction coefficient is the key quantity linking amplification by mixing to the analysis of iterative unlearning algorithms. In Certified Unlearning for Neural Networks (Koloskova et al., 2025), the model clipping for unlearning algorithm defines a Markov kernel K at each iteration (the combination of a gradient step, model clipping to radius C_2, and Gaussian noise addition), and the contraction coefficient alpha_epsilon(K) determines how quickly the hockey-stick divergence between the unlearned and retrained model distributions decays across iterations.
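One iteration of such a kernel can be sketched as follows. This is only an illustrative rendering of the three components named above (gradient step, model clipping to radius C_2, Gaussian noise); the function name, signature, and parameters are hypothetical, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def unlearning_step(x, grad_fn, eta, C2, sigma):
    """One application of the Markov kernel K described above:
    gradient step, then model clipping to radius C2, then Gaussian noise.
    Sketch only; names and signature are illustrative."""
    y = x - eta * grad_fn(x)          # gradient step
    norm = np.linalg.norm(y)
    if norm > C2:                     # project onto the ball of radius C2
        y = y * (C2 / norm)
    return y + sigma * rng.standard_normal(y.shape)   # Gaussian noise
```

Because the clipping step confines every iterate to a ball of radius C2, any two trajectories stay within distance 2 * C2 of each other before the noise is added, which is what makes the contraction coefficient of the Gaussian step nontrivial.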
Key Details
- Formal definition: alpha_epsilon(K) = sup_{x1, x2 in X} D_{e^epsilon}(K(x1) || K(x2)), where D_{e^epsilon}(mu || nu) = integral [p_mu - e^epsilon * p_nu]_+ d lambda is the hockey-stick divergence.
- Data processing inequality: E_epsilon(mu K || nu K) <= alpha_epsilon(K) * E_epsilon(mu || nu). This is the multiplicative contraction property used in iterative unlearning proofs.
- For Gaussian kernels: If K(x) = N(x, sigma^2 I_d), the divergence E_epsilon(N(x1, sigma^2 I) || N(x2, sigma^2 I)) depends only on ||x1 - x2|| and is increasing in it, so the contraction coefficient is governed by the maximal separation of inputs; over a bounded input domain (e.g., after clipping) it can be bounded using Gaussian tail inequalities.
- Hierarchy with mixing conditions: alpha_0(K) <= gamma iff K is gamma-Dobrushin; alpha_epsilon(K) <= gamma iff K is (gamma, epsilon)-Dobrushin. The family of coefficients for epsilon > 0 provides finer, epsilon-indexed control than the single total variation (epsilon = 0) coefficient.
- Iterated contraction: For T iterations with the same kernel K, E_epsilon(mu K^T || nu K^T) <= alpha_epsilon(K)^T * E_epsilon(mu || nu), yielding exponential convergence when alpha_epsilon(K) < 1.
- Connection to Koloskova’s Lemma A.2: The data processing inequality E_epsilon(mu K || nu K) <= sup_{x1,x2} E_epsilon(K(x1) || K(x2)) * E_epsilon(mu || nu) is precisely the mechanism of Lemma A.2 in Koloskova et al. (2025), applied to the Markov kernel contraction of the model-clipping update.
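For the Gaussian case, the hockey-stick divergence between two Gaussians with equal covariance has a closed form in terms of the standard normal CDF (this is the formula behind Balle and Wang's analytic Gaussian mechanism, 2018). The sketch below uses it to bound the contraction coefficient under the assumption that clipping to radius C2 limits input separation to 2 * C2; the parameter values are illustrative, not from the paper.

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def hs_gaussian(delta, sigma, eps):
    """E_eps(N(x1, sigma^2 I) || N(x2, sigma^2 I)) with delta = ||x1 - x2||,
    via the closed form from the analytic Gaussian mechanism (Balle & Wang 2018)."""
    mu = delta / sigma
    return Phi(mu / 2 - eps / mu) - math.exp(eps) * Phi(-mu / 2 - eps / mu)

# Assumed bound: clipped iterates satisfy ||x1 - x2|| <= 2 * C2, and the
# divergence is increasing in delta, so the sup is attained at the diameter.
C2, sigma, eps = 1.0, 2.0, 0.5       # illustrative values, not from the paper
alpha = hs_gaussian(2 * C2, sigma, eps)

# Iterated contraction: the divergence bound after T steps decays like alpha^T.
T = 10
decay = alpha ** T
```

As a sanity check, at epsilon = 0 the formula reduces to the total variation distance Phi(delta / (2 * sigma)) - Phi(-delta / (2 * sigma)), matching the classical Dobrushin coefficient.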