Projected noisy gradient descent (PNGD) and its stochastic variant PNSGD are optimization algorithms that add Gaussian noise to gradient updates and project onto a bounded convex set. These algorithms are central to both differential privacy and certified machine unlearning.
The update rule for PNSGD is: x_t^{j+1} = Π_{C_R}(x_t^j − η g(x_t^j, B^j) + √(2ησ²) W_t^j), where g is the mini-batch gradient, B^j is the j-th mini-batch, η is the step size, σ is the noise standard deviation, W_t^j ~ N(0, I_d) is standard Gaussian noise, and Π_{C_R} is Euclidean projection onto the ball of radius R.
Algorithm
- Initialize x_0 in C_R = {x ∈ ℝ^d : ||x|| ≤ R}
- For each epoch t = 0, …, T-1:
  - For each mini-batch j = 0, …, n/b-1:
    - Compute the mini-batch gradient g(x_t^j, B^j)
    - Take a noisy gradient step: z = x_t^j − η g(x_t^j, B^j) + √(2ησ²) W_t^j, with W_t^j ~ N(0, I_d)
    - Project back onto the ball: x_t^{j+1} = Π_{C_R}(z)
- Output x_T
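The loop above can be sketched in NumPy. This is a minimal illustration, not a production differential-privacy implementation; the function and parameter names (`pnsgd`, `grad_fn`, `project_ball`) are my own, and `grad_fn(x, batch)` is assumed to return the mini-batch gradient.

```python
import numpy as np

def project_ball(x, R):
    """Euclidean projection onto the ball {x : ||x|| <= R}."""
    norm = np.linalg.norm(x)
    return x if norm <= R else (R / norm) * x

def pnsgd(grad_fn, data, x0, eta, sigma, R, epochs, batch_size, rng):
    """Projected noisy SGD sketch: noisy gradient step, then projection."""
    x = project_ball(np.asarray(x0, dtype=float), R)
    n = len(data)
    for _ in range(epochs):
        for j in range(0, n, batch_size):
            batch = data[j:j + batch_size]
            g = grad_fn(x, batch)
            noise = rng.standard_normal(x.shape)
            # x_t^{j+1} = Pi_{C_R}(x_t^j - eta * g + sqrt(2 * eta * sigma^2) * W)
            x = project_ball(x - eta * g + np.sqrt(2 * eta * sigma**2) * noise, R)
    return x
```

For example, with per-sample loss (1/2)||x − d_i||² the mini-batch gradient is the mean of x − d_i over the batch, and with small σ the iterates settle near the data mean (clipped to the ball).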
Key Properties
- Converges to a unique stationary distribution ν_{D|B} for each dataset-batch sequence pair
- The gradient-descent map x ↦ x − η∇f(x) is (1−ηm)-contractive when f is m-strongly convex and L-smooth and η ≤ 1/L
- Projection does not increase Rényi divergence (data processing inequality)
- Noise σ controls privacy-utility trade-off: larger σ → stronger privacy, worse utility
- Mini-batch size b controls privacy-complexity trade-off: smaller b → better privacy decay, more iterations per epoch