Summary

This paper establishes a theoretical framework for certified data removal from machine learning models, giving formal definitions of ε-certified removal and (ε, δ)-certified removal inspired by differential privacy. The central mechanism is the Newton update: given trained parameters w* = argmin_w L(w; D), removing sample (x_n, y_n) yields w⁻ = w* + H_{w*}⁻¹ Δ, where Δ = λw* + ∇ℓ(w*^T x_n, y_n) is the gradient of the removed point's contribution to the loss and H_{w*} is the Hessian of L(·; D′) at w*, with D′ = D \ {(x_n, y_n)}.

The key insight is that this Newton step alone does not guarantee certified removal: the gradient residual ||∇L(w⁻; D′)||₂ leaks information about the removed point. To mask the residual, the authors add a random linear perturbation b^T w to the training loss (loss perturbation), where b has i.i.d. N(0, σ²) coordinates. With σ calibrated to the worst-case residual norm, the density ratio between the perturbed removed model and a model retrained from scratch on D′ is bounded by e^ε.
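A minimal sketch of the removal step for an L2-regularized logistic regression (`newton_removal` and its helper are illustrative names, not the paper's code; the loss and regularization conventions follow the formulas above):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def newton_removal(w_star, X, y, idx, lam):
    """One Newton-update removal step for L2-regularized logistic regression.

    w_minus = w* + H^{-1} Delta, with
      Delta = lam * w* + grad l(w*^T x_n, y_n)   (removed point's contribution)
      H     = Hessian of L(.; D') at w*, where D' = D without sample idx.
    """
    x_n, y_n = X[idx], y[idx]
    # l(z, y) = log(1 + exp(-y z))  =>  d l / dw = -y * sigmoid(-y z) * x
    grad_n = -y_n * sigmoid(-y_n * (w_star @ x_n)) * x_n
    delta = lam * w_star + grad_n

    keep = np.arange(len(y)) != idx
    Xr = X[keep]
    p = sigmoid(Xr @ w_star)            # per-sample curvature is p * (1 - p)
    H = (Xr * (p * (1.0 - p))[:, None]).T @ Xr \
        + lam * (len(y) - 1) * np.eye(X.shape[1])
    return w_star + np.linalg.solve(H, delta)
```

Because the logistic loss is not quadratic, this step leaves a small gradient residual on the remaining data; it is exactly this residual that the loss perturbation must mask.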

Key Contributions

  • Formal definitions of ε-certified and (ε,δ)-certified removal inspired by DP
  • Newton update removal mechanism w⁻ = w* + H⁻¹Δ for strongly convex L₂-regularized losses
  • Gradient residual bound: ||∇L(w⁻; D’)||₂ ≤ 4γC²/(λ²(n-1)) (Theorem 1)
  • Loss perturbation technique for masking residual information
  • Data-dependent bounds (Corollary 1) that are orders of magnitude tighter than worst-case
  • Extension to deep networks via public/private feature extractors

Methodology

Two-phase approach: (1) Training: minimize L_b(w; D) = Σ ℓ(w^T x_i, y_i) + (λn/2)||w||₂² + b^T w, where b has i.i.d. N(0, σ²) coordinates, with σ calibrated to the worst-case gradient-residual norm and the privacy parameters (ε, δ). (2) Removal: apply the Newton update w⁻ = w* + H_{w*}⁻¹ Δ, track the accumulated gradient-residual norm β, and retrain from scratch once β exceeds the noise-determined budget. The Newton update removal mechanism requires computing and inverting the Hessian (O(d²n) to form it offline, O(d³) to invert online).
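The two phases can be sketched for the least-squares instance, where the perturbed optimum has a closed form (function names are hypothetical, and the budget is taken as a given constant rather than derived from (ε, δ)):

```python
import numpy as np

def train_perturbed_ridge(X, y, lam, sigma, rng):
    """Phase 1 (least squares): minimize
    0.5 * ||X w - y||^2 + (lam * n / 2) * ||w||^2 + b^T w,
    with loss-perturbation vector b ~ N(0, sigma^2 I_d)."""
    n, d = X.shape
    b = rng.normal(0.0, sigma, size=d)
    w = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y - b)
    return w, b

def removal_loop(w, b, X, y, lam, remove_ids, budget):
    """Phase 2: apply the Newton update per removal request, accumulate the
    gradient-residual norm beta, and signal a full retrain once the budget
    is exhausted."""
    n, d = X.shape
    alive = np.ones(n, dtype=bool)
    beta = 0.0
    for i in remove_ids:
        alive[i] = False
        Xr, yr = X[alive], y[alive]
        delta = lam * w + X[i] * (X[i] @ w - y[i])    # removed term's gradient
        H = Xr.T @ Xr + lam * len(yr) * np.eye(d)     # Hessian on remaining data
        w = w + np.linalg.solve(H, delta)
        # gradient residual of the perturbed loss on the remaining data
        g = Xr.T @ (Xr @ w - yr) + lam * len(yr) * w + b
        beta += np.linalg.norm(g)
        if beta > budget:
            return w, False   # budget exhausted: retrain from scratch instead
    return w, True
```

For least squares the Newton step is exact, so each residual is numerically zero and the budget is never consumed; for a non-quadratic loss such as logistic regression the same loop accumulates a non-zero β per removal.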

Key Findings

  • For least-squares regression, the Newton update is exact (zero gradient residual)
  • For logistic regression, the gradient residual is non-zero but bounded by O(1/(λ²n))
  • Data-dependent bounds are orders of magnitude tighter than worst-case bounds
  • Removal time is 250x+ faster than retraining on LSUN (0.48s vs 124s)
  • Combining with DP feature extractors enables removal from non-linear models
  • High-influence outliers are harder to remove (higher ||H⁻¹Δ||₂)
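The least-squares exactness and the outlier effect above can be checked on synthetic data (an assumed setup, not the paper's experiments; index 0 is deliberately planted as a high-leverage, high-residual outlier):

```python
import numpy as np

# Synthetic illustration: for ridge regression the Newton removal step is
# exact, and a planted high-influence outlier produces a larger parameter
# change ||H^{-1} Delta|| than a typical point.
rng = np.random.default_rng(1)
n, d, lam = 100, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
X[0] *= 10.0      # blow up the leverage of sample 0 ...
y[0] += 50.0      # ... and give it a large residual

w_star = np.linalg.solve(X.T @ X + lam * n * np.eye(d), X.T @ y)

def removal_step(i):
    """Newton step H^{-1} Delta for removing sample i (ridge loss)."""
    keep = np.arange(n) != i
    Xr = X[keep]
    delta = lam * w_star + X[i] * (X[i] @ w_star - y[i])
    H = Xr.T @ Xr + lam * (n - 1) * np.eye(d)
    return np.linalg.solve(H, delta)

outlier_norm = np.linalg.norm(removal_step(0))   # parameter change for outlier
typical_norm = np.linalg.norm(removal_step(1))   # ... and for a typical point
```

The planted outlier moves the parameters far more than a typical point, matching the finding that high-influence points are harder to remove and consume more of the certification budget.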

Important References

  1. Towards Making Systems Forget with Machine Unlearning — Cao & Yang (2015), introduced machine unlearning
  2. Deep Learning with Differential Privacy — Abadi et al. (2016), DP-SGD training
  3. Understanding Black-box Predictions via Influence Functions — Koh & Liang (2017), influence function framework

Atomic Notes


paper