Noisy SGD privacy analysis refers to the study of differential privacy guarantees for stochastic gradient descent algorithms that inject calibrated noise (typically Gaussian) into the gradient updates. The foundational analysis by Feldman et al. (2018), "Privacy Amplification by Iteration", shows that for convex objectives, the privacy of noisy SGD can be analyzed through the lens of privacy amplification by iteration, yielding substantially tighter guarantees than the standard composition theorem approach.

The Projected Noisy SGD (PNSGD) algorithm operates on a dataset S = {x_1, …, x_n} as follows: starting from w_0 in a convex set K, for each data point x_{t+1}, compute v_{t+1} = w_t - eta * grad_w f(w_t, x_{t+1}) + Z where Z ~ N(0, sigma^2 I_d), then project w_{t+1} = Pi_K(v_{t+1}). When the loss is convex and beta-smooth, the gradient step psi(w) = w - eta * grad f(w) is contractive for eta <= 2/beta, and projection onto a convex set is also contractive. This makes the entire process a Contractive Noisy Iteration (CNI), enabling the application of Theorem 22 from Feldman et al.
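The update rule above can be sketched in a few lines. This is a minimal illustration, not the paper's reference implementation: the gradient oracle `grad(w, x)` is user-supplied, and K is taken concretely as an L2 ball of radius R (both choices are assumptions for the example).

```python
import numpy as np

def project_l2_ball(v, radius):
    """Euclidean projection onto K = {w : ||w||_2 <= radius} (contractive)."""
    norm = np.linalg.norm(v)
    return v if norm <= radius else v * (radius / norm)

def pnsgd(data, grad, dim, eta, sigma, radius, seed=0):
    """One pass over the data: one noisy, projected gradient step per point."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)                          # w_0 in K
    for x in data:
        z = rng.normal(0.0, sigma, size=dim)   # Z ~ N(0, sigma^2 I_d)
        v = w - eta * grad(w, x) + z           # noisy gradient step
        w = project_l2_ball(v, radius)         # w_{t+1} = Pi_K(v_{t+1})
    return w
```

For example, with the quadratic loss f(w, x) = 0.5 * ||w - x||^2 (beta = 1), the oracle is `grad = lambda w, x: w - x` and any eta <= 2 keeps the step contractive.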

The resulting per-person privacy guarantee states that PNSGD satisfies (alpha, alpha * epsilon / (n+1-t))-RDP for the t'th data point, where epsilon = 2L^2/sigma^2. This means data points processed early enjoy privacy proportional to 1/n (the full amplification benefit), while the last data point receives only the single-step guarantee. Variants include Stop-PNSGD (random stopping time, achieving (alpha, 4 * alpha * L^2 * ln(n) / (n * sigma^2))-RDP) and Skip-PNSGD (random skip point, giving uniform (epsilon, delta)-DP with public data benefits).
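The early-vs-late asymmetry is easy to make concrete; the helper below just evaluates the per-index formula alpha * epsilon / (n+1-t) with epsilon = 2L^2/sigma^2 (parameter values in the comments are illustrative):

```python
def per_index_rdp(alpha, L, sigma, n, t):
    """RDP parameter alpha * eps / (n + 1 - t) for the t-th input,
    with eps = 2 * L^2 / sigma^2."""
    eps = 2 * L**2 / sigma**2
    return alpha * eps / (n + 1 - t)

# The first point enjoys ~1/n amplification; the last gets the single-step bound:
# per_index_rdp(alpha=2, L=1, sigma=1, n=1000, t=1)    -> 0.004
# per_index_rdp(alpha=2, L=1, sigma=1, n=1000, t=1000) -> 4.0
```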

Balle et al. (2019) extend this analysis to strongly convex losses, showing that strict contractions yield exponential amplification: epsilon_i = O((1 - 2 * eta * beta * rho / (beta + rho))^{(n-i)/2}), an exponential improvement over Feldman et al.'s O(1/(n-i)) rate. Koloskova et al. (2025) further extend the analysis to non-convex settings via gradient clipping and model clipping, enabling certified approximate unlearning for deep neural networks.
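The gap between the two rates can be seen by comparing the decay profiles for the i-th point (constants dropped; the values of eta, beta, and rho below are arbitrary illustrations, not from either paper):

```python
def convex_rate(n, i):
    """Feldman et al.: per-index epsilon decays like 1 / (n - i)."""
    return 1.0 / (n - i)

def strongly_convex_rate(n, i, eta, beta, rho):
    """Balle et al.: decays like c^{(n-i)/2}, c = 1 - 2*eta*beta*rho/(beta+rho)."""
    c = 1 - 2 * eta * beta * rho / (beta + rho)
    return c ** ((n - i) / 2)

# With eta = 0.1, beta = rho = 1: c = 0.9, so early points see a
# geometrically vanishing rate rather than a polynomial 1/(n - i) one.
```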

Key Details

  • Composition baseline: Standard composition gives (alpha, n * alpha * epsilon)-RDP for n steps, each with (alpha, alpha * epsilon)-RDP. Amplification by iteration reduces this to O(alpha * epsilon * ln(n) / n) for Stop-PNSGD.
  • Per-index RDP (Theorem 23): For L-Lipschitz, beta-smooth convex losses with eta <= 2/beta, sigma > 0, alpha > 1: PNSGD satisfies (alpha, alpha * epsilon / (n+1-t))-RDP for its t'th input, with epsilon = 2L^2/sigma^2.
  • No secret ordering required: Unlike privacy amplification by sampling, the order in which data points are processed need not be random or secret. This is particularly relevant for distributed SGD settings.
  • Utility guarantees: Stop-PNSGD achieves population loss at most F* + O(RL/sqrt(n) * sqrt(1 + d*ln(1.25/delta)/epsilon^2)) for (epsilon, delta)-DP, i.e. excess loss matching amplification-by-sampling bounds.
  • Connection to unlearning: In Certified Unlearning for Neural Networks, the noisy SGD analysis is applied to fine-tuning on the retain set, with the “neighboring dataset” being the one that includes vs. excludes the forget data.
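The composition-vs-amplification gap in the first bullet can be checked numerically; this sketch just evaluates the two stated bounds (parameter values are arbitrary):

```python
import math

def composition_rdp(alpha, L, sigma, n):
    """Standard composition: n steps, each (alpha, alpha*eps)-RDP, eps = 2L^2/sigma^2."""
    return n * alpha * 2 * L**2 / sigma**2

def stop_pnsgd_rdp(alpha, L, sigma, n):
    """Stop-PNSGD bound: 4 * alpha * L^2 * ln(n) / (n * sigma^2)."""
    return 4 * alpha * L**2 * math.log(n) / (n * sigma**2)

# The composition baseline grows linearly in n, while the amplified
# bound shrinks as ln(n)/n, so the gap widens with dataset size.
```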

concept