The shifted Renyi divergence is a hybrid distance measure between probability distributions introduced by Feldman et al. (2018) in Privacy Amplification by Iteration. It interpolates between the infinity-Wasserstein distance W_infinity (a metric notion of distance) and the standard Renyi divergence D_alpha (an information-theoretic divergence). This interpolation is the central technical innovation enabling the proof of privacy amplification by iteration for contractive noisy iterations.
For distributions mu and nu on a Banach space (Z, || . ||), and parameters z >= 0 and alpha >= 1, the z-shifted Renyi divergence of order alpha is defined as:
D_alpha^{(z)}(mu || nu) := inf_{mu': W_infinity(mu, mu') <= z} D_alpha(mu' || nu)
That is, it is the infimum of D_alpha(mu' || nu) over all distributions mu' within infinity-Wasserstein distance z of mu. At z = 0, it reduces to the standard Renyi divergence D_alpha(mu || nu). As z increases, the shifted divergence can only decrease (monotonicity: for 0 <= z <= z', D_alpha^{(z')} <= D_alpha^{(z)}). The "shifting" property states that for any deterministic shift x, D_alpha^{(||x||)}(mu * x || nu) <= D_alpha(mu || nu), where mu * x denotes the distribution of U + x for U ~ mu.
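As a concrete illustration (not taken from the source), consider 1-d Gaussians with equal variance, where D_alpha(N(m1, sigma^2) || N(m2, sigma^2)) = alpha * (m1 - m2)^2 / (2 * sigma^2) in closed form. Choosing mu' to be mu with its mean moved by at most z toward nu is one admissible coupling, so it yields an upper bound on the shifted divergence that exhibits the monotonicity above. The function names below are illustrative:

```python
def renyi_gauss(alpha, m1, m2, sigma):
    # Closed-form Renyi divergence between equal-variance 1-d Gaussians:
    # D_alpha(N(m1, s^2) || N(m2, s^2)) = alpha * (m1 - m2)^2 / (2 s^2).
    return alpha * (m1 - m2) ** 2 / (2 * sigma ** 2)

def shifted_renyi_gauss_ub(alpha, m1, m2, sigma, z):
    # Upper bound on D_alpha^{(z)}: take mu' = mu with its mean moved by
    # at most z toward nu (shifting the mean by delta has W_infinity cost
    # |delta|, so this mu' is feasible in the infimum).
    gap = max(abs(m1 - m2) - z, 0.0)
    return alpha * gap ** 2 / (2 * sigma ** 2)

# z = 0 recovers the unshifted divergence; a larger shift budget z can
# only decrease the bound, matching the monotonicity property.
vals = [shifted_renyi_gauss_ub(2.0, 3.0, 0.0, 1.0, z) for z in (0, 1, 2, 3, 4)]
print(vals)  # [9.0, 4.0, 1.0, 0.0, 0.0]
```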
The shifted Renyi divergence plays a critical role in the proofs of Koloskova et al.'s (2025) Certified Unlearning for Neural Networks, particularly in the analysis of gradient clipping for unlearning. The key mechanism is that the shift parameter z tracks the accumulated "metric distance" between processes that has not yet been converted into information-theoretic divergence by noise addition.
Key Details
- Definition (Feldman et al., Definition 8): D_alpha^{(z)}(mu || nu) = inf_{mu': W_infinity(mu, mu') <= z} D_alpha(mu' || nu).
- Noise magnitude function: For a noise distribution zeta on a Banach space, R_alpha(zeta, a) = sup_{||x|| <= a} D_alpha(zeta * x || zeta). For Gaussian noise N(0, sigma^2 I_d) on R^d, R_alpha(N(0, sigma^2 I_d), a) = alpha * a^2 / (2 * sigma^2).
- Shift-reduction lemma (Lemma 20): D_alpha^{(z)}(mu * zeta || nu * zeta) <= D_alpha^{(z+a)}(mu || nu) + R_alpha(zeta, a) for any a >= 0. Adding noise converts metric distance (shift) into information-theoretic divergence.
- Contraction lemma (Lemma 21): For contractive maps psi, psi' with sup_x ||psi(x) - psi'(x)|| <= s, D_alpha^{(z+s)}(psi(X) || psi'(X')) <= D_alpha^{(z)}(X || X'). Contractive maps cannot increase the shifted divergence.
- These two lemmas combine inductively to prove the main amplification theorem (Theorem 22) for Contractive Noisy Iterations.
- Balle et al. (2019) in Privacy Amplification by Mixing and Diffusion Mechanisms provide a measure-theoretic generalization via explicit couplings (Theorem 2), replacing the W_infinity-based definition with transport operators.
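The Gaussian noise magnitude formula can be sanity-checked numerically in one dimension: D_alpha(N(a, sigma^2) || N(0, sigma^2)) should equal alpha * a^2 / (2 * sigma^2). The sketch below (illustrative, with a hypothetical function name) evaluates the defining integral (1/(alpha-1)) * log integral p^alpha q^{1-alpha} dx by a Riemann sum:

```python
import math

def renyi_shifted_gaussian(alpha, a, sigma, grid=200001, width=12.0):
    # Riemann-sum evaluation of D_alpha(N(a, sigma^2) || N(0, sigma^2)) in 1-d,
    # using D_alpha(P || Q) = (1/(alpha-1)) * log integral p^alpha q^(1-alpha) dx.
    lo, hi = -width * sigma, width * sigma + a
    h = (hi - lo) / (grid - 1)
    norm = sigma * math.sqrt(2 * math.pi)
    total = 0.0
    for i in range(grid):
        x = lo + i * h
        q = math.exp(-x * x / (2 * sigma ** 2)) / norm          # N(0, sigma^2)
        p = math.exp(-(x - a) ** 2 / (2 * sigma ** 2)) / norm   # N(a, sigma^2)
        total += (p ** alpha) * (q ** (1 - alpha)) * h
    return math.log(total) / (alpha - 1)

# The closed form predicts alpha * a^2 / (2 sigma^2) = 2 * 1 / 2 = 1.0:
print(renyi_shifted_gaussian(2.0, 1.0, 1.0))  # approximately 1.0
```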
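The inductive combination of the two lemmas behind Theorem 22 can be sketched as pure bookkeeping: suppose two contractive noisy iterations over T steps differ only at step t0, by at most s in norm, and the resulting shift budget is released evenly over the remaining steps via the shift-reduction lemma. The function name and the even release schedule below are illustrative choices, not the paper's notation:

```python
def amplification_bound(alpha, sigma, T, t0, s):
    # Track the (shift, divergence) pair across T steps of two contractive
    # noisy iterations that differ only at step t0 (hypothetical setup).
    remaining = T - t0
    a = s / remaining  # release the shift budget evenly over remaining steps
    shift, divergence = 0.0, 0.0
    for t in range(T):
        if t == t0:
            # Contraction lemma: the differing update grows the shift by s;
            # at every other step the contractive maps agree and cost nothing.
            shift += s
        if t >= t0:
            # Shift-reduction lemma: each Gaussian noise addition converts
            # a units of shift into alpha * a^2 / (2 sigma^2) of divergence.
            shift -= a
            divergence += alpha * a ** 2 / (2 * sigma ** 2)
    assert abs(shift) < 1e-9  # all metric distance converted by the end
    return divergence  # = alpha * s^2 / (2 sigma^2 (T - t0)) for this schedule
```

The final bound shrinks as 1/(T - t0): the more noisy steps follow the differing update, the stronger the amplification.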