Loss perturbation is a technique for achieving certified data removal guarantees by adding a random linear term b^T w to the training loss, where b is sampled from a suitable distribution (Laplace or Gaussian). This noise masks the information leaked by the gradient residual of the Newton update removal mechanism, ensuring that the density of the unlearned model parameters is bounded relative to the retrained model.
The perturbed loss is: L_b(w; D) = Σ ℓ(w^T x_i, y_i) + (λn/2)||w||₂² + b^T w. The key property is that the random vector b shifts the location of the minimizer, so the parameter density of the unlearned model and that of a model retrained from scratch on the reduced dataset have a ratio bounded by e^ε (up to the additive δ term for the Gaussian mechanism).
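As a concrete illustration, the perturbed objective can be minimized with ordinary gradient descent. This is a minimal sketch, not the paper's implementation: ℓ is taken to be the logistic loss (any suitable convex loss works), and the function names are illustrative.

```python
import numpy as np

def perturbed_loss_grad(w, X, y, lam, b):
    """Gradient of L_b(w) = sum_i log(1 + exp(-y_i w^T x_i))
    + (lam*n/2)*||w||^2 + b^T w  (logistic loss chosen as an example ell)."""
    n = X.shape[0]
    margins = y * (X @ w)
    # d/dw of log(1 + exp(-m_i)) is -y_i * x_i / (1 + exp(m_i))
    coefs = -y / (1.0 + np.exp(margins))
    return X.T @ coefs + lam * n * w + b

def train_perturbed(X, y, lam, b, lr=0.1, steps=2000):
    """Minimize the perturbed loss by plain gradient descent (sketch)."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= (lr / X.shape[0]) * perturbed_loss_grad(w, X, y, lam, b)
    return w
```

Because the objective is strongly convex (the ridge term), each draw of b yields a unique minimizer, and different draws of b land at different minimizers; this randomization is exactly what masks the gradient residual.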
Key Details
- Gaussian mechanism: b ~ N(0, σ²)^d with σ = cε′/ε (each coordinate i.i.d.), achieves (ε, δ)-certified removal with δ = 1.5·exp(-c²/2)
- Laplace mechanism: p(b) ∝ exp(-(ε/ε′)||b||₂), achieves ε-certified removal (no δ term)
- Trade-off: Larger σ (more noise) → supports more removals but degrades accuracy
- Removal budget: removals are supported as long as the accumulated gradient residual norm β stays below the budget ε′ = σε/c; β is tracked and updated across successive removals
- Relation to output perturbation: Chaudhuri et al. (2011) introduced loss (objective) perturbation for differentially private ERM, showing it achieves ε-differential privacy comparable to output perturbation, typically with better utility
- The noise statistically masks the gradient residual: an adversary's ability to distinguish the unlearned model from one retrained from scratch is bounded by the (ε, δ) guarantee; this is a bounded likelihood-ratio guarantee, not literal information-theoretic impossibility
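The two noise distributions above can be sampled directly. A minimal sketch, with illustrative function names: the Gaussian mechanism draws i.i.d. coordinates, while the spherically symmetric density p(b) ∝ exp(-(ε/ε′)||b||₂) can be sampled as a uniform direction times a Gamma(d, ε′/ε)-distributed radius, since the radial density is then proportional to r^(d-1)·exp(-(ε/ε′)r).

```python
import numpy as np

def sample_gaussian_b(d, eps, eps_prime, c, rng):
    """Gaussian mechanism: b_j ~ N(0, sigma^2) i.i.d. with sigma = c*eps'/eps.
    Returns (b, delta) where delta = 1.5 * exp(-c^2 / 2)."""
    sigma = c * eps_prime / eps
    return rng.normal(scale=sigma, size=d), 1.5 * np.exp(-c**2 / 2)

def sample_laplace_b(d, eps, eps_prime, rng):
    """Mechanism with density p(b) proportional to exp(-(eps/eps')*||b||_2):
    uniform direction on the sphere, Gamma(d, eps'/eps) radius."""
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=d, scale=eps_prime / eps)
    return direction * radius
```

The returned b is added to the loss as b^T w at training time; larger c (Gaussian) shrinks δ at the cost of larger σ and hence lower accuracy, mirroring the trade-off in the bullet list above.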