Model clipping for unlearning is an alternative method for achieving certified approximate unlearning in neural networks, introduced alongside gradient clipping for unlearning by Koloskova et al. (2025) in Certified Unlearning for Neural Networks. Instead of clipping individual gradients, model clipping projects the entire model update onto a ball of fixed radius C_2 after each SGD step, then adds Gaussian noise. This ensures bounded sensitivity of each iteration as a whole, enabling formal (epsilon, delta)-unlearning guarantees.
The method interprets the clipping-and-noise step as a stochastic post-processing transformation applied to the model from the previous iteration. Since the gradient is computed solely on the retain data and the clipping operator bounds the output of each update, the argument for privacy amplification follows from the contraction properties of the Markov kernel defined by projection plus Gaussian noise (Asoodeh et al., 2020). The initial model also receives Gaussian noise with variance sigma_0^2 to provide (epsilon_0, delta_0)-unlearning at step 0, which is then amplified over subsequent iterations.
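The projection-plus-noise Markov kernel that the contraction argument is built on can be sketched directly. The helper names and the example values below are illustrative, not from the paper:

```python
import numpy as np

def clip_to_ball(x, radius):
    """Project x onto the L2 ball of the given radius centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def projection_noise_kernel(x, radius, sigma, rng):
    """One application of the kernel: project onto the ball, then add Gaussian noise."""
    return clip_to_ball(x, radius) + rng.normal(0.0, sigma, size=x.shape)

rng = np.random.default_rng(0)
x = np.array([3.0, 4.0])      # norm 5, outside the unit ball
y = clip_to_ball(x, 1.0)      # projected point has norm exactly 1
```

Because the projection maps every input into a ball of radius C_2, any two trajectories end up at most 2*C_2 apart before the noise is added, which is what bounds the sensitivity of each iteration.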
Compared to gradient clipping, model clipping has a different trade-off profile: it needs only O(log(1/delta)) iterations to reach the target unlearning guarantee, but may inject more noise per iteration when the learning rate is small. If the per-iteration clipping radius C_2 is small relative to the initial clipping radius C_0, the savings are substantial: the per-iteration noise scales with C_2^2 rather than C_0^2, reducing the noise by a factor of C_0^2/C_2^2 compared to output perturbation.
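The size of this saving is easy to illustrate numerically; the radii below are assumed example values, not figures from the paper:

```python
# Hypothetical radii: C0 bounds the norm of the original trained model,
# C2 is the per-iteration clipping radius (both values are illustrative).
C0 = 10.0
C2 = 0.5

# Factor by which the per-iteration noise variance shrinks relative to
# output perturbation, which must calibrate its noise to C0.
reduction = C0**2 / C2**2
print(reduction)  # 400.0
```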
Algorithm
- Initialize: x_0 = x_hat + xi_0, where x_hat is the original trained model and xi_0 ~ N(0, sigma_0^2 * I_d).
- For t = 0, …, T-1:
  - Compute stochastic gradient g_t on a batch from the retain set.
  - Update and clip: x_{t+1} = clip(x_t - gamma*(g_t + lambda*x_t), C_2) + xi_{t+1}, where xi_{t+1} ~ N(0, sigma^2 * I_d).
- Optionally continue fine-tuning without noise to recover accuracy.
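The loop above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation; the function name, the representation of retain-set gradients as a precomputed list, and all parameter values are assumptions:

```python
import numpy as np

def clip_to_ball(x, radius):
    """Project x onto the L2 ball of the given radius centered at the origin."""
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def model_clipping_unlearn(x_hat, retain_grads, gamma, lam, C2, sigma, sigma0, rng):
    """Noisy clipped-SGD sketch: each stochastic gradient in retain_grads is
    assumed to be computed on a batch of retain data only."""
    d = x_hat.shape[0]
    x = x_hat + rng.normal(0.0, sigma0, size=d)       # noise the initial model
    for g in retain_grads:
        step = x - gamma * (g + lam * x)              # SGD step with weight decay lam
        x = clip_to_ball(step, C2) + rng.normal(0.0, sigma, size=d)
    return x

# Toy usage with synthetic gradients in d = 4 dimensions.
rng = np.random.default_rng(42)
x_hat = rng.normal(size=4)
grads = [rng.normal(size=4) for _ in range(12)]
x_T = model_clipping_unlearn(x_hat, grads, gamma=0.1, lam=0.01,
                             C2=0.5, sigma=0.05, sigma0=0.1, rng=rng)
```

A subsequent noise-free fine-tuning phase on the retain set would then be ordinary SGD starting from `x_T`; post-processing does not weaken the (epsilon, delta) guarantee already established.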
Key Properties
- Achieves (epsilon, delta)-unlearning in T = ln(1.25/delta) iterations with minimum noise sigma^2 = 8C_2^2ln(1.25) / epsilon^2.
- Per-iteration noise is independent of the initial model norm C_0, depending only on the clipping radius C_2.
- Noise reduction factor over output perturbation: C_0^2 / C_2^2, which can be substantial when C_2 << C_0.
- Requires fewer iterations than gradient clipping (logarithmic in 1/delta only), but may need larger per-iteration noise when the learning rate is small.
- Proof relies on contraction coefficients of the projection-plus-noise Markov kernel (Asoodeh et al., 2020).
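Plugging illustrative targets into the expressions from the first key property gives concrete numbers; epsilon, delta, and C_2 below are assumed example values:

```python
import math

# Illustrative unlearning targets and clipping radius (assumed values).
epsilon, delta = 1.0, 1e-5
C2 = 0.5

# Iteration count and per-iteration noise variance from the stated bounds.
T = math.ceil(math.log(1.25 / delta))              # ceil(ln(1.25/1e-5)) = 12
sigma2 = 8 * C2**2 * math.log(1.25) / epsilon**2   # about 0.446
```

Note that T grows only logarithmically in 1/delta and is independent of epsilon, while the noise variance depends on C_2 and epsilon but not on the initial model norm C_0.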