Summary

This paper provides the first certified unlearning guarantees for projected noisy stochastic gradient descent (PNSGD) in the mini-batch setting. While the companion paper (Langevin Unlearning) analyzes full-batch PNGD through Langevin dynamics, this work addresses the more practical mini-batch case through a novel W∞ distance tracking framework.

The key innovation is proving that PNSGD with cyclic mini-batch sampling induces a unique stationary distribution ν_{D|B} for each dataset-batch pair, and carefully bounding the W∞ distance between adjacent stationary distributions. This yields tighter unlearning guarantees than prior full-batch methods while requiring significantly fewer gradient computations than baselines (roughly 2% in the mini-batch setting and 10% in the full-batch setting).

Key Contributions

  • First approximate unlearning guarantee for mini-batch noisy SGD under convexity
  • Novel W∞ distance tracking between adjacent PNSGD learning processes
  • Proof that cyclic mini-batch PNSGD converges to a unique stationary distribution
  • Support for sequential unlearning with linear (not exponential) privacy degradation
  • Improved bound via randomized mini-batch sequences using joint convexity of KL divergence

Methodology

The method uses projected noisy stochastic gradient descent (PNSGD) with cyclic mini-batch sampling. The learning process converges to the stationary distribution ν_{D|B}, and unlearning fine-tunes from the current parameters on the updated dataset D’. The Rényi difference d_α is bounded in three steps: (1) the W∞ distance between adjacent stationary distributions (Lemma 3.3), (2) metric-aware privacy amplification by iteration (Lemma 2.6), and (3) conversion from the W∞ bound to a Rényi divergence bound.
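The PNSGD iteration described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `grad_f` is an assumed per-batch gradient oracle, and the projection is taken onto the Euclidean ball of radius R (the paper's convex constraint set).

```python
import numpy as np

def pnsgd_epoch(w, data, grad_f, eta, sigma, R, b, rng):
    """One epoch of projected noisy SGD with cyclic mini-batch sampling.

    Hypothetical sketch: grad_f(w, batch) returns the mini-batch gradient;
    eta is the step size, sigma the Gaussian noise scale, R the radius of
    the projection ball, b the mini-batch size.
    """
    n = len(data)
    for start in range(0, n, b):  # cyclic: batches visited in fixed order
        batch = data[start:start + b]
        noise = sigma * rng.standard_normal(w.shape)
        w = w - eta * grad_f(w, batch) + noise  # noisy gradient step
        norm = np.linalg.norm(w)
        if norm > R:                            # project onto the R-ball
            w = w * (R / norm)
    return w
```

Unlearning then simply continues this iteration from the current parameters on the updated dataset D’, for K further epochs.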

Key Findings

  • Z_B = min(2ηM/(b(1-c^{n/b})), 2R) provides much tighter W∞ bounds than naive 2R
  • Privacy loss ε = O(Z_B² · c^{2Kn/b}) decays exponentially with K unlearning epochs under strong convexity
  • Smaller mini-batch b leads to better privacy decay rate but potentially worse Z_B and utility
  • PNSGD requires only 2% (mini-batch) to 10% (full-batch) of gradient computations vs. baselines
  • Sequential unlearning degrades linearly via W∞ triangle inequality, not exponentially
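The first two findings can be made concrete with a small numeric sketch. This is an illustrative reading of the stated bounds, with constant factors omitted: M is the gradient sensitivity, c ∈ (0,1) the per-step contraction factor under strong convexity, and R the projection radius.

```python
def w_inf_bound(eta, M, b, n, c, R):
    """Z_B = min(2*eta*M / (b * (1 - c**(n/b))), 2*R), the W-infinity bound
    between adjacent stationary distributions (constants as in the summary)."""
    return min(2 * eta * M / (b * (1 - c ** (n / b))), 2 * R)

def privacy_decay(Z_B, c, K, n, b):
    """Order of the privacy loss after K unlearning epochs:
    eps = O(Z_B**2 * c**(2*K*n/b)); constant factors dropped."""
    return Z_B ** 2 * c ** (2 * K * n / b)
```

Note the trade-off from the third finding: a smaller b makes the exponent 2Kn/b larger (faster decay per epoch) but can inflate Z_B through the 1/b-dependent first term.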

Important References

  1. Langevin Unlearning — Chien et al. (2024), companion paper using Langevin dynamics analysis
  2. Descent-to-Delete — Neel et al. (2021), full-batch gradient-based unlearning baseline (D2D)
  3. Certified Data Removal from Machine Learning Models — Guo et al. (2020), Newton-step certified removal

Atomic Notes


paper