Abstract
Good data stewardship requires removal of data at the request of the data’s owner. This raises the question of whether and how a trained machine-learning model, which implicitly stores information about its training data, should be affected by such a removal request. Is it possible to “remove” data from a machine-learning model? We study this problem by defining certified removal: a very strong theoretical guarantee that a model from which data is removed cannot be distinguished from a model that never observed the data to begin with. We develop a certified-removal mechanism for linear classifiers and empirically study learning settings in which this mechanism is practical.
Summary
This paper establishes the theoretical framework for certified data removal from machine-learning models, giving formal definitions of ε-certified removal and (ε,δ)-certified removal inspired by differential privacy. The central mechanism is the Newton update: given trained parameters w* = argmin L(w; D), removing sample (x_n, y_n) yields w⁻ = w* + H_{w*}⁻¹ Δ, where Δ = λw* + ∇ℓ(w*ᵀx_n, y_n) is the gradient of the removed point’s loss plus its per-sample share of the regularizer, and H_{w*} is the Hessian of L(·; D′) at w*, with D′ the dataset after removal.
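The update is a genuine Newton step toward the minimizer on the remaining data D′. With the regularizer (λn/2)||w||₂² counted per sample, a short derivation (our notation, following the definitions above):

```latex
% Since w^* minimizes L(\,\cdot\,; D), we have \nabla L(w^*; D) = 0, hence
\nabla L(w^*; D') = \nabla L(w^*; D) - \big(\lambda w^* + \nabla \ell(w^{*\top} x_n,\, y_n)\big) = -\Delta,
% so the removal update is exactly one Newton step on L(\,\cdot\,; D'):
w^- = w^* + H_{w^*}^{-1}\Delta = w^* - H_{w^*}^{-1}\,\nabla L(w^*; D').
```

For a quadratic loss this step is exact; for other strongly convex losses it leaves a second-order gradient residual.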
The key insight is that the Newton step alone does not guarantee certified removal, because the gradient residual ||∇L(w⁻; D′)||₂ leaks information about the removed point. To mask this residual, the authors add a random perturbation bᵀw to the training loss (loss perturbation), where b is drawn coordinate-wise from a zero-mean Gaussian with standard deviation σ. The noise masks the residual, so the density ratio between the distribution of the removed-and-perturbed model and that of a model retrained from scratch on D′ is bounded by e^ε.
Key Contributions
- Formal definition of ε-certified removal and (ε,δ)-certified removal inspired by DP
- Newton update removal mechanism w⁻ = w* + H⁻¹Δ for strongly convex L₂-regularized losses
- Gradient residual bound: ||∇L(w⁻; D’)||₂ ≤ 4γC²/(λ²(n-1)) (Theorem 1)
- Loss perturbation technique for masking residual information
- Data-dependent bounds (Corollary 1) that are orders of magnitude tighter than worst-case
- Extension to deep networks via public/private feature extractors
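The central definition, paraphrased in the paper’s differential-privacy-style notation (symbols approximate): a removal mechanism M applied to the output of a learning algorithm A is ε-certified if, for all measurable sets T, datasets D, and points x ∈ D,

```latex
e^{-\varepsilon} \;\le\; \frac{P\big(M(A(D),\, D,\, x) \in T\big)}{P\big(A(D \setminus x) \in T\big)} \;\le\; e^{\varepsilon},
```

i.e. the removed model is statistically indistinguishable (up to a factor e^{±ε}) from one retrained without x; (ε,δ)-certified removal relaxes both inequalities with an additive slack δ.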
Methodology
Two-phase approach: (1) Training: minimize the perturbed objective L_b(w; D) = Σ ℓ(wᵀx_i, y_i) + (λn/2)||w||₂² + bᵀw, where b is drawn coordinate-wise from a zero-mean Gaussian with standard deviation σ. (2) Removal: apply the Newton update w⁻ = w* + H_{w*}⁻¹Δ, accumulate the gradient-residual norm β over successive removals, and retrain from scratch once β exceeds the budget σε/c. The Newton update requires forming the Hessian (O(d²n), computable offline) and inverting it (O(d³), at removal time).
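A minimal sketch of the two phases for L₂-regularized logistic regression (helper names are our own; plain gradient descent stands in for an exact solver, and we compute the gradient residual directly rather than via the paper’s data-dependent bound):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_perturbed(X, y, lam, sigma, rng, steps=2000, lr=0.5):
    """Phase 1 (sketch): gradient descent on the perturbed objective
    L_b(w; D) = sum_i l(w^T x_i, y_i) + (lam*n/2)*||w||^2 + b^T w,
    with b drawn coordinate-wise from N(0, sigma^2)."""
    n, d = X.shape
    b = rng.normal(scale=sigma, size=d)
    w = np.zeros(d)
    for _ in range(steps):
        g = ((sigmoid(y * (X @ w)) - 1) * y) @ X + lam * n * w + b
        w -= (lr / n) * g
    return w, b

def remove_one(w, b, X, y, idx, lam):
    """Phase 2 (sketch): Newton update w- = w + H^{-1} Delta on the
    remaining data; returns the new weights and the gradient-residual
    norm beta to charge against the removal budget sigma*eps/c."""
    n, d = X.shape
    keep = np.arange(n) != idx
    Xk, yk = X[keep], y[keep]
    x_r, y_r = X[idx], y[idx]
    # Delta = lam * w + grad l(w^T x_r, y_r) for the logistic loss
    delta = lam * w + (sigmoid(y_r * (x_r @ w)) - 1) * y_r * x_r
    # Hessian of the objective on D' at w (logistic curvature + ridge term)
    s = sigmoid(yk * (Xk @ w))
    H = (Xk * (s * (1 - s))[:, None]).T @ Xk + lam * (n - 1) * np.eye(d)
    w_minus = w + np.linalg.solve(H, delta)
    # Gradient residual on D' at w-, including the perturbation term b
    resid = ((sigmoid(yk * (Xk @ w_minus)) - 1) * yk) @ Xk \
            + lam * (n - 1) * w_minus + b
    return w_minus, float(np.linalg.norm(resid))
```

In a deployment loop, the returned β values would be summed across successive removals and compared against σε/c; crossing the budget triggers a full retrain with a fresh draw of b.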
Key Findings
- For least-squares regression, the Newton update is exact (zero gradient residual)
- For logistic regression, gradient residual is non-zero but bounded by O(1/λ²n)
- Data-dependent bounds are orders of magnitude tighter than worst-case bounds
- Removal time is 250x+ faster than retraining on LSUN (0.48s vs 124s)
- Combining with DP feature extractors enables removal from non-linear models
- High-influence outliers are harder to remove (higher ||H⁻¹Δ||₂)
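The least-squares exactness in the first finding is easy to verify: for a quadratic loss the Hessian is constant, so one Newton step lands exactly on the minimizer of the remaining-data objective. A toy sketch (our own setup, using the (λn/2)||w||₂² regularization convention from the Methodology section):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 40, 5, 0.1
X = rng.normal(size=(n, d))
y = rng.normal(size=n)

def ridge_fit(X, y, lam):
    # argmin_w sum_i (w^T x_i - y_i)^2 / 2 + (lam * n / 2) * ||w||^2
    n_, d_ = X.shape
    return np.linalg.solve(X.T @ X + lam * n_ * np.eye(d_), X.T @ y)

w_star = ridge_fit(X, y, lam)

# Newton-update removal of the last sample:
x_r, y_r = X[-1], y[-1]
delta = lam * w_star + (w_star @ x_r - y_r) * x_r   # Delta = lam*w* + grad l
H = X[:-1].T @ X[:-1] + lam * (n - 1) * np.eye(d)   # Hessian of L(.; D')
w_minus = w_star + np.linalg.solve(H, delta)

w_retrain = ridge_fit(X[:-1], y[:-1], lam)          # retrain from scratch on D'
```

Here w_minus matches w_retrain to numerical precision, so the gradient residual is zero and no masking noise is needed in principle for least squares.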
Important References
- Towards Making Systems Forget with Machine Unlearning — Cao & Yang (2015), introduced machine unlearning
- Deep Learning with Differential Privacy — Abadi et al. (2016), DP-SGD training
- Understanding Black-box Predictions via Influence Functions — Koh & Liang (2017), influence function framework
Atomic Notes
- Newton update removal mechanism
- statistical indistinguishability
- gradient residual
- loss perturbation