The Newton update removal mechanism is the foundational algorithm for certified data removal from parametric models. Given a model w* trained on dataset D, it approximates the retrained model w_{D’} (trained on D’ = D \ {(x_n, y_n)}) using a single Newton step: w⁻ = w* + H_{w*}⁻¹ Δ, where H_{w*} = ∇²L(w*; D’) is the Hessian of the loss evaluated on the remaining data, and Δ = λw* + ∇ℓ(w*^T x_n, y_n) is the gradient of the removed point’s contribution to the loss.
This update is equivalent to applying the influence function with ε = -1/n, giving a direct connection between data deletion and the influence-function literature. The mechanism is exact for least-squares regression (where the Hessian is constant in w) and approximate for logistic regression and other nonlinear losses.
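The exactness claim for least squares can be checked directly: for a quadratic loss, one Newton step from w* lands exactly on the minimizer over D’. A minimal numpy sketch, assuming the per-example regularization convention above (dimensions, seed, and λ are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 50, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

# Objective: sum_i [ 0.5*(w^T x_i - y_i)^2 + (lam/2)*||w||^2 ]
def fit(Xs, ys):
    k = len(ys)
    return np.linalg.solve(Xs.T @ Xs + lam * k * np.eye(d), Xs.T @ ys)

w_star = fit(X, y)                    # model trained on D
w_retrained = fit(X[:-1], y[:-1])     # model retrained from scratch on D'

# Newton removal step: Delta = lam*w* + gradient of the removed point's loss;
# H is the Hessian of the loss on the remaining data D'.
x_n, y_n = X[-1], y[-1]
Delta = lam * w_star + (w_star @ x_n - y_n) * x_n
H = X[:-1].T @ X[:-1] + lam * (n - 1) * np.eye(d)
w_minus = w_star + np.linalg.solve(H, Delta)

print(np.allclose(w_minus, w_retrained))  # → True: exact for least squares
```

Since ∇L(w*; D) = 0 at the optimum, the gradient on D’ is exactly −Δ, and a full Newton step on a quadratic recovers the retrained solution to machine precision.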
Algorithm
- Precomputation (offline, O(d²n)): Form the full-data Hessian H = Σᵢ ∇²ℓ(w*^T x_i, y_i) + λnI
- On removal request (online, O(d³)):
  - Compute Δ = λw* + ∇ℓ(w*^T x_n, y_n)
  - Downdate the Hessian to the remaining data: H_{w*} = H − ∇²ℓ(w*^T x_n, y_n) − λI
  - Compute w⁻ = w* + H_{w*}⁻¹Δ
  - Check the gradient residual ||∇L(w⁻; D’)||₂
  - Certification: If the residual is ≤ σε/c, the removal is certified and w⁻ is released; otherwise retrain on D’
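The online steps above can be sketched for ℓ2-regularized logistic regression with labels in {−1, +1}. The gradient-descent fit and the values of σ, ε, c below are illustrative placeholders, not settings from the source:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(1)
n, d, lam = 200, 4, 0.05
X = rng.normal(size=(n, d))
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))

# Crude gradient-descent fit of w* on D; per-example loss is
# log(1 + exp(-y_i w^T x_i)) + (lam/2)*||w||^2.
w = np.zeros(d)
for _ in range(2000):
    margins = y * (X @ w)
    grad = -(X * (y * sigmoid(-margins))[:, None]).sum(0) + lam * n * w
    w -= 0.1 * grad / n
w_star = w

# Precomputation (offline): full-data Hessian.
s = sigmoid(X @ w_star)
H_full = (X * (s * (1 - s))[:, None]).T @ X + lam * n * np.eye(d)

# On removal request (online): gradient of the removed point, Hessian downdate,
# Newton step.
x_r, y_r = X[-1], y[-1]
grad_r = -y_r * sigmoid(-y_r * (w_star @ x_r)) * x_r
Delta = lam * w_star + grad_r
s_r = sigmoid(w_star @ x_r)
H = H_full - (s_r * (1 - s_r)) * np.outer(x_r, x_r) - lam * np.eye(d)
w_minus = w_star + np.linalg.solve(H, Delta)

# Certification check: gradient residual of the loss on the remaining data D'.
m = y[:-1] * (X[:-1] @ w_minus)
resid = -(X[:-1] * (y[:-1] * sigmoid(-m))[:, None]).sum(0) + lam * (n - 1) * w_minus
resid_norm = np.linalg.norm(resid)

sigma, eps, c = 1.0, 1.0, 1.0          # hypothetical certification parameters
certified = resid_norm <= sigma * eps / c
print(resid_norm, certified)
```

Because the logistic loss is not quadratic, the residual is nonzero; the certification step decides whether it is small enough to release w⁻ or whether a full retrain is required.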
Key Properties
- Gradient residual: ||∇L(w⁻; D’)||₂ ≤ γ(n-1)||H⁻¹Δ||₂² (worst-case)
- Computational cost: O(d³) per removal (dominated by Hessian solve), O(d²n) precomputation
- Batch removal: w^{(-m)} = w* + [H_{w*}^{(m)}]⁻¹ Δ^{(m)}, residual scales as O(m²)
- Limitation: Requires a convex loss for the theoretical guarantees; for deep networks the update is only a heuristic approximation
- Data-dependent bounds: Can be orders of magnitude tighter than worst-case via Corollary 1
- Used as a building block by Fair Machine Unlearning, Fast Model Debias with Machine Unlearning, and Certified Machine Unlearning via Noisy Stochastic Gradient Descent
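The batch variant sums the removed points’ gradient contributions into Δ^{(m)} and solves against the Hessian of the remaining n − m points. A least-squares sketch (sizes illustrative), where the single-removal exactness carries over to batches:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, m, lam = 60, 4, 5, 0.1
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d)

def fit(Xs, ys):
    return np.linalg.solve(Xs.T @ Xs + lam * len(ys) * np.eye(d), Xs.T @ ys)

w_star = fit(X, y)
keep = slice(0, n - m)                 # remove the last m points as one batch
X_r, y_r = X[n - m:], y[n - m:]

# Delta^{(m)}: each removed point contributes lam*w* plus its loss gradient.
Delta = m * lam * w_star + (X_r @ w_star - y_r) @ X_r
# Hessian over the remaining n - m points only.
H = X[keep].T @ X[keep] + lam * (n - m) * np.eye(d)
w_batch = w_star + np.linalg.solve(H, Delta)

print(np.allclose(w_batch, fit(X[keep], y[keep])))  # → True
```

For nonlinear losses the same batch update applies, but the residual bound grows as O(m²), so large batches are more likely to trigger a retrain.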