The shifted Renyi divergence is a hybrid distance measure between probability distributions introduced by Feldman et al. (2018) in Privacy Amplification by Iteration. It interpolates between the infinity-Wasserstein distance W_infinity (a metric notion of distance) and the standard Renyi divergence D_alpha (an information-theoretic divergence). This interpolation is the central technical innovation enabling the proof of privacy amplification by iteration for contractive noisy iterations.

For distributions mu and nu on a Banach space (Z, || . ||), and parameters z >= 0 and alpha >= 1, the z-shifted Renyi divergence of order alpha is defined as:

D_alpha^{(z)}(mu || nu) := inf_{mu': W_infinity(mu, mu') <= z} D_alpha(mu' || nu)

That is, it is the smallest Renyi divergence of order alpha, D_alpha(mu' || nu), over all distributions mu' within infinity-Wasserstein distance z of mu. At z = 0, it reduces to the standard Renyi divergence D_alpha(mu || nu). As z increases, the shifted divergence can only decrease, since the infimum runs over a larger set (monotonicity: for 0 <= z <= z', D_alpha^{(z')}(mu || nu) <= D_alpha^{(z)}(mu || nu)). The "shifting" property states that for any deterministic shift x, D_alpha^{(||x||)}(mu * x || nu) <= D_alpha(mu || nu), where mu * x denotes the distribution of U + x for U ~ mu.
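For equal-variance Gaussians on the real line, the Renyi divergence has the closed form D_alpha(N(m1, sigma^2) || N(m2, sigma^2)) = alpha * (m1 - m2)^2 / (2 * sigma^2), which makes these properties easy to check numerically. The sketch below uses illustrative function names not from the paper; it only upper-bounds the shifted divergence, by restricting the infimum to mean-shift candidates mu' = N(m1 - t, sigma^2) with |t| <= z:

```python
def renyi_gauss(alpha, m1, m2, sigma):
    # Closed-form Renyi divergence between N(m1, sigma^2) and N(m2, sigma^2).
    return alpha * (m1 - m2) ** 2 / (2 * sigma ** 2)

def shifted_renyi_gauss_ub(alpha, m1, m2, sigma, z):
    # Upper bound on D_alpha^{(z)}(N(m1, sigma^2) || N(m2, sigma^2)):
    # restrict the infimum to mean shifts mu' = N(m1 - t, sigma^2) with
    # |t| <= z (the W_infinity distance between two Gaussians that differ
    # only in mean equals the mean gap), then optimize over t.
    gap = max(abs(m1 - m2) - z, 0.0)
    return alpha * gap ** 2 / (2 * sigma ** 2)

alpha, sigma = 2.0, 1.0

# Monotonicity: the bound is non-increasing as the shift budget z grows.
bounds = [shifted_renyi_gauss_ub(alpha, 3.0, 0.0, sigma, z) for z in range(5)]
assert all(b1 >= b2 for b1, b2 in zip(bounds, bounds[1:]))

# Shifting property: translating mu = N(1, 1) by x = 2 and allowing shift
# budget |x| does not exceed the unshifted divergence D_alpha(mu || nu).
assert shifted_renyi_gauss_ub(alpha, 1.0 + 2.0, 0.0, sigma, 2.0) \
    <= renyi_gauss(alpha, 1.0, 0.0, sigma)
```

The shifting-property check is exactly the witness argument: undoing the translation recovers mu itself, which lies within W_infinity distance ||x|| of mu * x, so the infimum is at most D_alpha(mu || nu).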

The shifted Renyi divergence plays a critical role in the proofs of Koloskova et al.'s (2025) Certified Unlearning for Neural Networks, particularly in the analysis of gradient clipping for unlearning. The key mechanism is that the shift parameter z tracks the accumulated "metric distance" between processes that has not yet been converted into information-theoretic divergence by noise addition.

Key Details

  • Definition (Feldman et al., Definition 8): D_alpha^{(z)}(mu || nu) = inf_{mu': W_infinity(mu, mu') <= z} D_alpha(mu' || nu).
  • Noise magnitude function: For a noise distribution zeta on a Banach space, R_alpha(zeta, a) = sup_{||x|| <= a} D_alpha(zeta * x || zeta). For Gaussian noise N(0, sigma^2 I_d) on R^d, R_alpha(N(0, sigma^2 I_d), a) = alpha * a^2 / (2 * sigma^2).
  • Shift-reduction lemma (Lemma 20): D_alpha^{(z)}(mu * zeta || nu * zeta) <= D_alpha^{(z+a)}(mu || nu) + R_alpha(zeta, a) for any a >= 0. Adding noise converts metric distance (shift) into information-theoretic divergence.
  • Contraction lemma (Lemma 21): For contractive maps psi, psi' with sup_x ||psi(x) - psi'(x)|| <= s, D_alpha^{(z+s)}(psi(X) || psi'(X')) <= D_alpha^{(z)}(X || X'). Contractive maps cannot increase the shifted divergence.
  • These two lemmas combine inductively to prove the main amplification theorem (Theorem 22) for Contractive Noisy Iterations.
  • Balle et al. (2019) in Privacy Amplification by Mixing and Diffusion Mechanisms provide a measure-theoretic generalization via explicit couplings (Theorem 2), replacing the W_infinity-based definition with transport operators.
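To see how the contraction and shift-reduction lemmas combine, the following bookkeeping sketch (assuming Gaussian noise N(0, sigma^2 I) and a hypothetical function name) tracks the shift z and the accumulated divergence across T iterations of two contractive noisy iterations whose update maps differ only at step t0, by at most s in norm. Spreading the shift reduction evenly over the remaining steps recovers the alpha * s^2 / (2 * sigma^2 * (T - t0 + 1)) amplification bound of Theorem 22:

```python
def cni_divergence_bound(alpha, sigma, s, t0, T):
    # Inductive bookkeeping behind privacy amplification by iteration.
    # Each iteration first applies the contraction lemma (any per-step
    # drift between the two update maps is added to the shift z), then the
    # shift-reduction lemma with a chosen a_t (z drops by a_t while the
    # divergence grows by the Gaussian term alpha * a_t^2 / (2 * sigma^2)).
    z, d = 0.0, 0.0
    a = s / (T - t0 + 1)       # spread the shift evenly over remaining steps
    for t in range(1, T + 1):
        drift = s if t == t0 else 0.0
        z += drift             # contraction lemma: shift absorbs the drift
        reduce = min(a, z) if t >= t0 else 0.0
        z -= reduce            # shift-reduction lemma: shift -> divergence
        d += alpha * reduce ** 2 / (2 * sigma ** 2)
    assert abs(z) < 1e-9       # shift fully converted by the final step
    return d                   # bound on D_alpha(X_T || X_T')
```

The later the differing step t0, the fewer noise additions remain to absorb the shift, so the bound degrades as t0 approaches T; this is the amplification-by-iteration phenomenon.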

concept