Output perturbation is a baseline method for achieving certified approximate unlearning that applies the standard Gaussian mechanism directly to a trained model's parameters. The procedure involves two steps: (1) project (clip) the trained parameters onto an L2 ball of radius C_0, which bounds the sensitivity of the release, and (2) add Gaussian noise calibrated to achieve (epsilon, delta)-unlearning in a single step. This is the simplest way to obtain a formal unlearning certificate and serves as the baseline against which more sophisticated iterative methods are compared.

The required noise variance is sigma^2 = 8*ln(1.25/delta)*C_0^2 / epsilon^2, which follows directly from the standard Gaussian mechanism of differential privacy (Dwork & Roth, 2014): clipping to radius C_0 bounds the L2 distance between any two clipped parameter vectors by 2*C_0, and plugging sensitivity 2*C_0 into the mechanism's sigma^2 = 2*ln(1.25/delta)*Delta^2 / epsilon^2 yields the factor of 8. While this provides a valid (epsilon, delta)-unlearning guarantee, the required noise magnitude is often very large in practice because it depends quadratically on the clipping radius C_0, which must be large enough to preserve the trained model's information. This large noise injection can severely degrade model utility.
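The two-step procedure can be sketched as follows. This is a minimal illustration, not code from the paper; the function name and signature are my own, and the parameters are treated as a single flat NumPy vector:

```python
import numpy as np

def output_perturbation(params, C0, epsilon, delta, rng=None):
    """Certified-unlearning baseline (illustrative sketch).

    Step 1: project params onto the L2 ball of radius C0.
    Step 2: add Gaussian noise with sigma^2 = 8*ln(1.25/delta)*C0^2/eps^2,
            i.e. the Gaussian mechanism with sensitivity 2*C0.
    """
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(params)
    clipped = params if norm <= C0 else params * (C0 / norm)
    sigma = np.sqrt(8.0 * np.log(1.25 / delta)) * C0 / epsilon
    return clipped + rng.normal(0.0, sigma, size=params.shape)
```

Note that the noise scale is independent of the data and the training procedure; it depends only on C_0, epsilon, and delta, which is what makes the guarantee a single-step certificate.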

Output perturbation can be followed by additional fine-tuning on the retain data (without noise or clipping) to partially recover model accuracy. However, as shown in Certified Unlearning for Neural Networks, iterative methods such as gradient clipping for unlearning and model clipping for unlearning substantially reduce the required per-iteration noise by spreading it across multiple steps via privacy amplification by iteration, leading to better accuracy-privacy trade-offs in practice.

Key Details

  • Formally: x_0 = clip(x_hat, C_0) + xi_0, where x_hat is the trained parameter vector and xi_0 ~ N(0, 8C_0^2ln(1.25/delta)/epsilon^2 * I_d).
  • Provides (epsilon, delta)-unlearning in a single step with no iterative refinement.
  • The required noise scales quadratically with the clipping radius C_0, making it impractical when models have large parameter norms.
  • Gradient clipping reduces the per-iteration noise by a factor of C_0/(C_1*gamma) compared to output perturbation; model clipping reduces it by a factor of C_0^2/C_2^2.
  • Despite its simplicity, output perturbation still outperforms retraining from scratch in some compute-limited settings.

method