Projected Noisy Stochastic Gradient Descent (PNSGD) is a learning and unlearning mechanism proposed in CHIEN2024 for certified machine unlearning. The core idea is to use the same noisy SGD procedure for both training and unlearning: the unlearning process is simply a continuation of PNSGD on the updated dataset, starting from the previously trained model parameters.
The PNSGD update rule at each iteration is:
x_t^{j+1} = Pi_{C_R}( x_t^j - eta * g(x_t^j, B^j) + sqrt(2 * eta * sigma^2) * W_t^j )
where Pi_{C_R} is projection onto the ball of radius R, eta is the step size, g is the mini-batch gradient over mini-batch B^j, sigma^2 controls the noise variance, and W_t^j ~ N(0, I_d). The cyclic mini-batch strategy partitions [n] into n/b mini-batches of size b, processing them in a fixed cyclic order within each epoch.
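One epoch of this update can be sketched as follows. This is a minimal illustration of the update rule above, not the paper's reference implementation; the function and argument names (pnsgd_epoch, grad_fn, project_ball) are my own.

```python
import numpy as np

def project_ball(x, R):
    """Project x onto the Euclidean ball of radius R (Pi_{C_R})."""
    norm = np.linalg.norm(x)
    return x if norm <= R else x * (R / norm)

def pnsgd_epoch(x, data, grad_fn, eta, sigma, R, batch_size, rng):
    """One epoch of Projected Noisy SGD with cyclic mini-batches.

    The dataset is partitioned into contiguous mini-batches of size
    batch_size and processed in a fixed cyclic order; grad_fn(x, batch)
    returns the mini-batch gradient g(x, B^j).
    """
    n = len(data)
    for j in range(0, n, batch_size):  # fixed cyclic order within the epoch
        batch = data[j:j + batch_size]
        noise = rng.standard_normal(x.shape)          # W_t^j ~ N(0, I_d)
        x = project_ball(
            x - eta * grad_fn(x, batch) + np.sqrt(2 * eta * sigma**2) * noise,
            R,
        )
    return x
```

Running more epochs simply repeats this loop; unlearning reuses the same loop on the updated dataset, initialized at the trained parameters.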
The unlearning guarantee arises from three properties of PNSGD:
- Unique stationary distribution: for any fixed mini-batch sequence B, the Markov chain induced by PNSGD admits a unique invariant probability measure nu_{D|B} (Theorem 3.1), so sufficiently long training converges to a well-defined distribution independent of the initialization.
- Small W-infinity distance between adjacent processes: the W-infinity distance between the stationary distributions on adjacent datasets (differing in a single data point) is bounded by Z_B = O(eta * M / b), much tighter than the naive bound of 2R implied by the projection radius.
- Privacy amplification by iteration: the W-infinity bound is converted into a Renyi divergence bound via the contractive noisy iteration framework (Lemma 2.6), yielding the final (alpha, epsilon)-RU guarantee.
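The conversion in the last step can be illustrated with a crude one-shot bound (a sketch under simplifying assumptions, not the paper's tighter recursive analysis): run the two unlearning chains with shared noise for T steps, each step contracting distances by a factor c, so the residual shift is at most c^T * Z_B; absorbing that shift into the final Gaussian injection N(0, 2 * eta * sigma^2 * I_d) and applying the Gaussian Renyi divergence formula gives

    D_alpha(mu_T || nu_T) <= alpha * c^{2T} * Z_B^2 / (4 * eta * sigma^2)

which decays geometrically in the number of unlearning iterations T (the projection step only helps, by data processing).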
The computational benefit over retraining from scratch comes from the fact that the unlearning process starts from a distribution already close to the target (at distance Z_B << 2R), so far fewer epochs are needed to converge. In the reported experiments, PNSGD unlearning requires only about 2% (mini-batch) and 10% (full-batch) of the gradient computations of state-of-the-art baselines.
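The epoch savings can be sketched with a back-of-the-envelope calculation. All numbers below are made up for illustration and are not the paper's; the helper simply models the geometric decay of the W-infinity gap under a c-contractive process.

```python
import math

def iters_to_reach(gap0, target, c):
    """Smallest T with c**T * gap0 <= target, for contraction factor c < 1.

    Illustrative helper: the number of iterations scales with
    log(gap0 / target), so a smaller starting gap means fewer iterations.
    """
    return math.ceil(math.log(target / gap0) / math.log(c))

# Unlearning starts at gap Z_B; retraining effectively starts at gap 2R.
unlearn = iters_to_reach(gap0=0.01, target=1e-4, c=0.99)  # Z_B = 0.01
retrain = iters_to_reach(gap0=10.0, target=1e-4, c=0.99)  # 2R  = 10
```

With these (hypothetical) numbers, starting from Z_B instead of 2R cuts the iteration count by the ratio log(target / Z_B) / log(target / 2R), i.e. by more than half.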
A key limitation is the requirement of (strong) convexity, which ensures a per-step contraction rate c = 1 - eta*m < 1 (with m the strong-convexity parameter). Extending the analysis to nonconvex settings remains open, though ZHANGB2024 addresses this gap via different techniques.