Gradient clipping for unlearning is a method for achieving certified approximate unlearning in neural networks, introduced by Koloskova et al. (2025) in Certified Unlearning for Neural Networks. The method runs noisy stochastic gradient descent on the retain data (the data that is not to be forgotten): at each iteration, the stochastic gradient is clipped to a bounded norm and Gaussian noise is added to the update. Each step therefore acts as a bounded-sensitivity stochastic post-processing operation, enabling provable (epsilon, delta)-unlearning guarantees via privacy amplification by iteration.

The approach resembles DP-SGD (Abadi et al., 2016) but differs in a critical way: gradient updates are computed exclusively on the retain set, excluding all “private” (forget) data points. The initial model is clipped to radius C_0, and at each iteration, gradients are clipped to radius C_1 before applying updates with Gaussian noise. An optional L2 regularization term (parameter lambda) can be incorporated, which enables exponential decay of the dependence on the initial clipping radius C_0 over iterations.
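The clipping used for both the initial model (radius C_0) and the per-step gradients (radius C_1) is projection onto an L2 ball. A minimal numpy sketch (the function name `clip` follows the article's notation; it is not from the paper's reference code):

```python
import numpy as np

def clip(v, radius):
    """Project v onto the L2 ball of the given radius (no-op if already inside)."""
    norm = np.linalg.norm(v)
    if norm > radius:
        return v * (radius / norm)
    return v

# Example: a vector of norm 5 is rescaled onto the unit ball.
clip(np.array([3.0, 4.0]), 1.0)  # -> array([0.6, 0.8])
```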

Unlike prior certified unlearning methods for non-convex settings (e.g., Chourasia & Shah 2023, Chien et al. 2024, Mu & Klabjan 2024), gradient clipping imposes no assumptions on loss function smoothness or convexity. This makes it the first certified unlearning method broadly applicable to deep neural networks without restricting the class of loss functions.

Algorithm

  1. Initialize: x_0 = clip(x_hat, C_0), where x_hat is the original trained model.
  2. For t = 0, …, T-1:
    • Compute stochastic gradient g_t on a batch from the retain set D\D_f.
    • Clip gradient: clip(g_t, C_1).
    • Update: x_{t+1} = x_t - gamma * (clip(g_t, C_1) + lambda * x_t) + noise, where noise ~ N(0, sigma^2 * I_d).
  3. Optionally continue fine-tuning without noise to recover accuracy.
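The loop above can be sketched in a few lines of numpy. This is an illustrative implementation under assumed interfaces (`grad_fn`, `retain_batches`, and all argument names are hypothetical, not from the paper's code); it omits the optional noiseless fine-tuning of step 3:

```python
import numpy as np

def clip(v, radius):
    """Project v onto the L2 ball of the given radius."""
    norm = np.linalg.norm(v)
    return v * (radius / norm) if norm > radius else v

def noisy_clipped_sgd(x_hat, grad_fn, retain_batches, C0, C1, gamma, lam, sigma, rng):
    """Unlearning pass: clipped init, then noisy clipped (regularized) SGD on retain data.

    grad_fn(x, batch) returns a stochastic gradient on a retain batch;
    sigma is the per-step Gaussian noise standard deviation.
    """
    x = clip(x_hat, C0)                        # step 1: clip the trained model to radius C0
    for batch in retain_batches:               # step 2: T noisy updates on D \ D_f
        g = clip(grad_fn(x, batch), C1)        # clip the stochastic gradient to radius C1
        noise = rng.normal(0.0, sigma, size=x.shape)
        x = x - gamma * (g + lam * x) + noise  # update with optional L2 term lambda * x
    return x
```

With `lam = 0` and `sigma = 0` this reduces to plain clipped SGD on the retain set; the noise scale `sigma` is what carries the unlearning guarantee.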

Key Properties

  • Without regularization (lambda = 0): noise variance sigma^2 = 9 * log(1/delta) * (C_0 + C_1 * gamma * T)^2 / (epsilon^2 * T). Setting T = C_0 / (gamma * C_1) minimizes this to sigma^2 = 36 * gamma * C_1 * C_0 * log(1/delta) / epsilon^2.
  • With regularization (lambda > 0): sigma^2 = 72 * gamma * lambda * log(1/delta) * (C_0 * (1 - gamma * lambda)^T + C_1 / lambda)^2 / epsilon^2. Setting T = log(lambda * C_0 / C_1) / (gamma * lambda) yields sigma^2 = O(gamma * C_1^2 * log(1/delta) / (lambda * epsilon^2)).
  • Regularization enables exponential decay of the initial clipping effect, requiring only logarithmic iterations.
  • Reduces per-iteration noise by a factor of C_0 / (gamma * C_1) compared to output perturbation for unlearning.
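The unregularized noise calibration above can be checked numerically. A small sketch (function names are illustrative) evaluating the variance formula, the optimal iteration count, and the resulting closed form:

```python
import numpy as np

def sigma2_no_reg(C0, C1, gamma, T, eps, delta):
    # sigma^2 = 9 log(1/delta) * (C_0 + C_1 * gamma * T)^2 / (epsilon^2 * T)
    return 9.0 * np.log(1.0 / delta) * (C0 + C1 * gamma * T) ** 2 / (eps ** 2 * T)

def optimal_T_no_reg(C0, C1, gamma):
    # T = C_0 / (gamma * C_1) minimizes sigma2_no_reg over T
    return C0 / (gamma * C1)

def sigma2_at_optimum(C0, C1, gamma, eps, delta):
    # closed form at the optimal T: 36 * gamma * C_1 * C_0 * log(1/delta) / epsilon^2
    return 36.0 * gamma * C1 * C0 * np.log(1.0 / delta) / eps ** 2
```

At T = C_0/(gamma C_1) the two inner terms are equal (C_0 = C_1 * gamma * T), so the numerator becomes (2 C_0)^2 and the closed form follows by substitution.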
