Abstract
Good data stewardship requires removal of data at the request of the data’s owner. This raises the question of whether and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
Summary
This paper establishes the theoretical framework for certified data removal from machine-learning models, giving formal definitions of ε-certified removal and (ε,δ)-certified removal inspired by differential privacy. The central mechanism is the Newton update: given trained parameters w* = argmin L(w; D), removing sample (x_n, y_n) yields w⁻ = w* + H_{w*}⁻¹ Δ, where Δ = λw* + ∇ℓ(w*ᵀx_n, y_n) is the gradient of the removed point’s loss plus its per-sample share of the regularizer, and H_{w*} is the Hessian of L(·; D′) at w*, with D′ the dataset after removal.
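The update is a genuine Newton step toward the minimizer on the remaining data D′. With the regularizer (λn/2)||w||₂² counted per sample, a short derivation (our notation, following the definitions above):

```latex
% Since w^* minimizes L(\,\cdot\,; D), we have \nabla L(w^*; D) = 0, hence
\nabla L(w^*; D') = \nabla L(w^*; D) - \big(\lambda w^* + \nabla \ell(w^{*\top} x_n,\, y_n)\big) = -\Delta,
% so the removal update is exactly one Newton step on L(\,\cdot\,; D'):
w^- = w^* + H_{w^*}^{-1}\Delta = w^* - H_{w^*}^{-1}\,\nabla L(w^*; D').
```

For a quadratic loss this step is exact; for other strongly convex losses it leaves a second-order gradient residual.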
The key insight is that the Newton step alone does not guarantee certified removal, because the gradient residual ||∇L(w⁻; D′)||₂ leaks information about the removed point. To mask this residual, the authors add a random perturbation bᵀw to the training loss (loss perturbation), where b is drawn coordinate-wise from a zero-mean Gaussian with standard deviation σ. The noise masks the residual, so the density ratio between the distribution of the removed-and-perturbed model and that of a model retrained from scratch on D′ is bounded by e^ε.
Key Contributions
- Formal definition of ε-certified removal and (ε,δ)-certified removal inspired by DP
- Newton update removal mechanism w⁻ = w* + H⁻¹Δ for strongly convex L₂-regularized losses
- Gradient residual bound: ||∇L(w⁻; D’)||₂ ≤ 4γC²/(λ²(n-1)) (Theorem 1)
- Loss perturbation technique for masking residual information
- Data-dependent bounds (Corollary 1) that are orders of magnitude tighter than worst-case
- Extension to deep networks via public/private feature extractors
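The central definition, paraphrased in the paper’s differential-privacy-style notation (symbols approximate): a removal mechanism M applied to the output of a learning algorithm A is ε-certified if, for all measurable sets T, datasets D, and points x ∈ D,

```latex
e^{-\varepsilon} \;\le\; \frac{P\big(M(A(D),\, D,\, x) \in T\big)}{P\big(A(D \setminus x) \in T\big)} \;\le\; e^{\varepsilon},
```

i.e. the removed model is statistically indistinguishable (up to a factor e^{±ε}) from one retrained without x; (ε,δ)-certified removal relaxes both inequalities with an additive slack δ.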
Methodology
Two-phase approach: (1) Training: minimize the perturbed objective L_b(w; D) = Σ ℓ(wᵀx_i, y_i) + (λn/2)||w||₂² + bᵀw, where b is drawn coordinate-wise from a zero-mean Gaussian with standard deviation σ. (2) Removal: apply the Newton update w⁻ = w* + H_{w*}⁻¹Δ, accumulate the gradient-residual norm β over successive removals, and retrain from scratch once β exceeds the budget σε/c. The Newton update requires forming the Hessian (O(d²n), computable offline) and inverting it (O(d³), at removal time).
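A minimal sketch of the two phases for L₂-regularized logistic regression (helper names are our own; plain gradient descent stands in for an exact solver, and we compute the gradient residual directly rather than via the paper’s data-dependent bound):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perturbed(X, y, lam, sigma, rng, steps=2000, lr=0.5):
    """Phase 1 (sketch): gradient descent on the perturbed objective
    L_b(w; D) = sum_i l(w^T x_i, y_i) + (lam*n/2)*||w||^2 + b^T w,
    with b drawn coordinate-wise from N(0, sigma^2)."""
    n, d = X.shape
    b = rng.normal(scale=sigma, size=d)
    w = np.zeros(d)
    for _ in range(steps):
        g = ((sigmoid(y * (X @ w)) - 1) * y) @ X + lam * n * w + b
        w -= (lr / n) * g
    return w, b

def remove_one(w, b, X, y, idx, lam):
    """Phase 2 (sketch): Newton update w- = w + H^{-1} Delta on the
    remaining data; returns the new weights and the gradient-residual
    norm beta to charge against the removal budget sigma*eps/c."""
    n, d = X.shape
    keep = np.arange(n) != idx
    Xk, yk = X[keep], y[keep]
    x_r, y_r = X[idx], y[idx]
    # Delta = lam * w + grad l(w^T x_r, y_r) for the logistic loss
    delta = lam * w + (sigmoid(y_r * (x_r @ w)) - 1) * y_r * x_r
    # Hessian of the objective on D' at w (logistic curvature + ridge term)
    s = sigmoid(yk * (Xk @ w))
    H = (Xk * (s * (1 - s))[:, None]).T @ Xk + lam * (n - 1) * np.eye(d)
    w_minus = w + np.linalg.solve(H, delta)
    # Gradient residual on D' at w-, including the perturbation term b
    resid = ((sigmoid(yk * (Xk @ w_minus)) - 1) * yk) @ Xk \
            + lam * (n - 1) * w_minus + b
    return w_minus, float(np.linalg.norm(resid))
```

In a deployment loop, the returned β values would be summed across successive removals and compared against σε/c; crossing the budget triggers a full retrain with a fresh draw of b.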
Key Findings
- For least-squares regression, the Newton update is exact (zero gradient residual)
- For logistic regression, gradient residual is non-zero but bounded by O(1/λ²n)
- Data-dependent bounds are orders of magnitude tighter than worst-case bounds
- Removal time is 250x+ faster than retraining on LSUN (0.48s vs 124s)
- Combining with DP feature extractors enables removal from non-linear models
- High-influence outliers are harder to remove (higher ||H⁻¹Δ||₂)
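The least-squares exactness in the first finding is easy to verify: for a quadratic loss the Hessian is constant, so one Newton step lands exactly on the minimizer of the remaining-data objective. A toy sketch (our own setup, using the (λn/2)||w||₂² regularization convention from the Methodology section):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def ridge_fit(X, y, lam):
    # argmin_w sum_i (w^T x_i - y_i)^2 / 2 + (lam * n / 2) * ||w||^2
    n_, d_ = X.shape
    return np.linalg.solve(X.T @ X + lam * n_ * np.eye(d_), X.T @ y)

w_star = ridge_fit(X, y, lam)

# Newton-update removal of the last sample:
x_r, y_r = X[-1], y[-1]
delta = lam * w_star + (w_star @ x_r - y_r) * x_r   # Delta = lam*w* + grad l
H = X[:-1].T @ X[:-1] + lam * (n - 1) * np.eye(d)   # Hessian of L(.; D')
w_minus = w_star + np.linalg.solve(H, delta)

w_retrain = ridge_fit(X[:-1], y[:-1], lam)          # retrain from scratch on D'
```

Here w_minus matches w_retrain to numerical precision, so the gradient residual is zero and no masking noise is needed in principle for least squares.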
Important References
- Towards Making Systems Forget with Machine Unlearning — Cao & Yang (2015), introduced machine unlearning
- Deep Learning with Differential Privacy — Abadi et al. (2016), DP-SGD training
- Understanding Black-box Predictions via Influence Functions — Koh & Liang (2017), influence function framework
Atomic Notes
- Newton update removal mechanism
- statistical indistinguishability
- gradient residual
- loss perturbation