Abstract
The “right to be forgotten,” mandated by user data-privacy laws, is becoming increasingly important. Machine unlearning aims to efficiently remove the effect of certain data points on the trained model parameters, so that the result is approximately the same as if the model were retrained from scratch. We propose to leverage projected noisy stochastic gradient descent for unlearning and establish its first approximate unlearning guarantee under a convexity assumption. Our approach exhibits several benefits, including provable complexity savings compared to retraining, and support for sequential and batch unlearning. Both benefits stem from our new results on tracking the infinite Wasserstein distance between adjacent (un)learning processes.
Summary
This paper provides the first certified unlearning guarantees for projected noisy stochastic gradient descent (PNSGD) in the mini-batch setting. While the companion paper (Langevin Unlearning) analyzes full-batch PNGD through Langevin dynamics, this work addresses the more practical mini-batch case through a novel W∞ distance tracking framework.
The key innovation is proving that PNSGD with cyclic mini-batch sampling induces a unique stationary distribution ν_{D|B} for each dataset-batch pair, and carefully bounding the W∞ distance between adjacent stationary distributions. This yields tighter unlearning guarantees than prior full-batch methods while using significantly fewer gradient computations than baselines (roughly 2% in the mini-batch setting and 10% in the full-batch setting).
Key Contributions
- First approximate unlearning guarantee for mini-batch noisy SGD under convexity
- Novel W∞ distance tracking between adjacent PNSGD learning processes
- Proof that cyclic mini-batch PNSGD converges to a unique stationary distribution
- Support for sequential unlearning with linear (not exponential) privacy degradation
- Improved bound via randomized mini-batch sequences using joint convexity of KL divergence
Methodology
Uses projected noisy stochastic gradient descent (PNSGD) with cyclic mini-batch sampling. The learning process converges to the stationary distribution ν_{D|B}; unlearning fine-tunes from the current parameters on the updated dataset D'. The Rényi difference d_α is bounded via: (1) the W∞ distance between adjacent stationary distributions (Lemma 3.3); (2) metric-aware privacy amplification by iteration (Lemma 2.6); (3) a conversion from W∞ to Rényi divergence bounds.
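The PNSGD iteration itself is simple: a gradient step on the current cyclic mini-batch, Gaussian noise injection, then projection back onto the convex constraint set (a Euclidean ball here). A minimal sketch, assuming a ball constraint of radius R and a user-supplied gradient function; the names `project_ball` and `pnsgd` are illustrative, not from the paper:

```python
import numpy as np

def project_ball(w, R):
    """Project w onto the Euclidean ball of radius R (the convex constraint set)."""
    norm = np.linalg.norm(w)
    return w if norm <= R else w * (R / norm)

def pnsgd(grad_fn, X, y, w0, eta, sigma, R, b, epochs, rng):
    """Projected noisy SGD with cyclic (fixed-order) mini-batches of size b.

    grad_fn(w, Xb, yb) should return the average gradient on the batch.
    Unlearning runs the same loop, starting from the current parameters
    and using the updated dataset D'.
    """
    w = w0.copy()
    n = len(X)
    for _ in range(epochs):
        for start in range(0, n, b):  # deterministic cyclic batch order
            Xb, yb = X[start:start + b], y[start:start + b]
            noise = sigma * rng.standard_normal(w.shape)
            w = project_ball(w - eta * grad_fn(w, Xb, yb) + noise, R)
    return w
```

For example, with a least-squares gradient `lambda w, Xb, yb: Xb.T @ (Xb @ w - yb) / len(Xb)`, the iterates always remain inside the radius-R ball, which is what makes the naive W∞ bound of 2R available as a fallback.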
Key Findings
- Z_B = min(2ηM/(b(1-c^{n/b})), 2R) provides much tighter W∞ bounds than naive 2R
- Privacy loss ε = O(Z_B² · c^{2Kn/b}) decays exponentially with K unlearning epochs under strong convexity
- Smaller mini-batch b leads to better privacy decay rate but potentially worse Z_B and utility
- PNSGD requires only 2% (mini-batch) to 10% (full-batch) of gradient computations vs. baselines
- Sequential unlearning degrades linearly via W∞ triangle inequality, not exponentially
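The first two findings can be illustrated numerically. A small sketch of the bound Z_B = min(2ηM/(b(1−c^{n/b})), 2R) and the ε scaling Z_B²·c^{2Kn/b}, with illustrative constants (η, M, n, c, R, K are placeholders, not values from the paper):

```python
def w_inf_bound(eta, M, b, c, n, R):
    """Z_B = min(2*eta*M / (b * (1 - c**(n/b))), 2*R): W-infinity gap between
    adjacent stationary distributions (gradient sensitivity M, per-step
    contraction factor c < 1, dataset size n, batch size b, radius R)."""
    return min(2 * eta * M / (b * (1 - c ** (n / b))), 2 * R)

# Illustrative constants: eta=0.01, M=1, n=1000, c=0.99, R=1, K=5 epochs.
eta, M, n, c, R, K = 0.01, 1.0, 1000, 0.99, 1.0, 5
for b in (10, 100, 1000):
    Z = w_inf_bound(eta, M, b, c, n, R)
    # Privacy loss after K unlearning epochs scales as Z^2 * c^(2*K*n/b):
    # smaller b gives a larger exponent, hence faster decay, but a larger Z.
    eps_scale = Z ** 2 * c ** (2 * K * n / b)
    print(f"b={b:4d}  Z_B={Z:.5f}  eps_scale={eps_scale:.3e}")
```

Running this shows the trade-off from the findings above: shrinking b speeds up the exponential decay in K (more noisy steps per epoch) while loosening Z_B, and Z_B stays far below the naive bound 2R = 2.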
Important References
- Langevin Unlearning — Chien et al. (2024), companion paper using Langevin dynamics analysis
- Descent-to-Delete — Neel et al. (2021), full-batch gradient-based unlearning baseline (D2D)
- Certified Data Removal from Machine Learning Models — Guo et al. (2020), Newton-step certified removal
Atomic Notes
- projected noisy stochastic gradient descent
- Rényi unlearning
- W-infinity distance tracking
- contractive noisy iteration