Newton-step unlearning is the core algorithm proposed in Sekhari et al. (2021) for achieving certified approximate unlearning with high deletion capacity under convex loss functions. The method uses a second-order (Newton-step) approximation to efficiently update a trained model after data deletion requests, without requiring access to the full training dataset.

The algorithm operates in two phases. During training, the learning algorithm A_sc minimizes the empirical loss and additionally computes and stores the Hessian of the empirical loss at the learned model, forming the auxiliary statistics T(S) = {nabla^2 F_hat(w_hat)}. This requires O(d^2) storage, independent of the dataset size n. During unlearning, given delete requests U, the algorithm:

  1. Estimates the Hessian on the remaining data: H_hat = 1/(n-m) * (n * nabla^2 F_hat(w_hat) - sum_{z in U} nabla^2 f(w_hat, z))
  2. Computes a Newton-step update: w_bar = w_hat + 1/(n-m) * H_hat^{-1} * sum_{z in U} nabla f(w_hat, z)
  3. Adds calibrated Gaussian noise: w_tilde = w_bar + nu, where nu ~ N(0, sigma^2 * I_d) with sigma proportional to m^2/n^2
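The Hessian estimate in step 1 uses only the stored statistic and the deleted points, never the retained data. A minimal numerical check of that downdate identity, using a squared loss whose per-example Hessian x x^T does not depend on w (the data, sizes, and variable names here are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 5, 10
X = rng.normal(size=(n, d))

# Per-example Hessian of f(w, z) = 0.5*(x @ w - y)**2 is x x^T, independent
# of w, which makes the identity easy to verify numerically.
H_full = X.T @ X / n                  # stored statistic: nabla^2 F_hat(w_hat)
U = np.arange(m)                      # indices of the deleted examples
H_deleted = X[U].T @ X[U]             # sum of per-example Hessians over U

# Downdate: Hessian on the remaining n - m points, from stored statistics only
H_hat = (n * H_full - H_deleted) / (n - m)

H_direct = X[m:].T @ X[m:] / (n - m)  # recomputed directly from remaining data
print(np.allclose(H_hat, H_direct))   # True
```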

The key insight is that this Newton step approximates the empirical minimizer on S\U to within O(m^2/n^2) error, quadratically smaller than the O(m/n) perturbation scale that differential privacy would require. The reduced noise requirement translates directly into the quadratic improvement in deletion capacity over DP-based approaches.
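The quadratic accuracy is easiest to see on a loss that is exactly quadratic in w: there the Hessian-Lipschitz constant M is zero, so the Newton step recovers the retrained minimizer up to floating-point error. A sketch on l2-regularized least squares, with illustrative data and dimensions (not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, m, lam = 500, 8, 5, 0.5
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
I = np.eye(d)

def ridge_min(Xs, ys):
    # Minimizer of (1/k) sum_i [0.5*(x_i @ w - y_i)^2 + (lam/2)*||w||^2]
    k = len(ys)
    return np.linalg.solve(Xs.T @ Xs / k + lam * I, Xs.T @ ys / k)

w_hat = ridge_min(X, y)               # trained on all n points
w_exact = ridge_min(X[m:], y[m:])     # fully retrained on S \ U

# Step 1: Hessian downdate from the stored statistic (Equation 7 style)
H_full = X.T @ X / n + lam * I
H_deleted = X[:m].T @ X[:m] + m * lam * I
H_hat = (n * H_full - H_deleted) / (n - m)

# Step 2: Newton-step update from per-example gradients at w_hat
grads = X[:m] * (X[:m] @ w_hat - y[:m])[:, None] + lam * w_hat
w_bar = w_hat + np.linalg.solve(H_hat, grads.sum(axis=0)) / (n - m)

print(np.linalg.norm(w_hat - w_exact))  # naive gap, O(m/n) scale
print(np.linalg.norm(w_bar - w_exact))  # Newton gap: ~machine precision here
```

For quadratic losses the second norm is numerically zero; for general strongly convex losses it shrinks at the O(m^2/n^2) rate instead.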

Algorithm

Input: Delete requests U = {z_j}_{j=1}^m, model w_hat from A_sc, statistics T(S) = {nabla^2 F_hat(w_hat)}, loss function f

  1. Set gamma = 2Mm^2L^2 / (lambda^3 n^2), sigma = (gamma/epsilon) * sqrt(2 ln(1.25/delta))
  2. Compute estimated Hessian H_hat on S\U (Equation 7)
  3. Compute Newton update w_bar (Equation 8)
  4. Sample nu from N(0, sigma^2 * I_d)
  5. Return w_tilde = w_bar + nu
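The five steps above can be collected into one function. This is a hedged sketch, assuming NumPy, per-example gradients and Hessians of the deleted points precomputed at w_hat, and the constant gamma = 2Mm^2L^2/(lambda^3 n^2); all names are illustrative:

```python
import numpy as np

def newton_unlearn(w_hat, H_full, grads_U, hessians_U, n, *,
                   eps, delta, lam, lip, hess_lip, rng):
    """One (eps, delta)-certified unlearning update, following steps 1-5.

    w_hat: trained model from A_sc; H_full: stored nabla^2 F_hat(w_hat),
    i.e. the statistics T(S); grads_U / hessians_U: per-example gradients and
    Hessians of the m deleted points, evaluated at w_hat; lam, lip, hess_lip
    are the lambda (strong convexity), L (Lipschitz), M (Hessian-Lipschitz)
    constants from the assumptions.
    """
    m, d = len(grads_U), len(w_hat)
    # Step 1: calibrate the noise scale
    gamma = 2 * hess_lip * m**2 * lip**2 / (lam**3 * n**2)
    sigma = (gamma / eps) * np.sqrt(2 * np.log(1.25 / delta))
    # Step 2: estimated Hessian on S \ U (Equation 7)
    H_hat = (n * H_full - sum(hessians_U)) / (n - m)
    # Step 3: Newton-step update (Equation 8)
    w_bar = w_hat + np.linalg.solve(H_hat, sum(grads_U)) / (n - m)
    # Steps 4-5: sample nu ~ N(0, sigma^2 I_d) and return the perturbed model
    return w_bar + rng.normal(scale=sigma, size=d)
```

Note that neither the training set nor the retained points appear as arguments; everything the update needs comes from the stored statistics and the delete requests themselves.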

Key Properties

  • Running time: O(d^omega) where omega in [2, 2.38] is the matrix multiplication exponent; independent of dataset size n
  • Storage: O(d^2) for the Hessian, independent of n
  • Deletion capacity: Omega(n * sqrt(epsilon) / (d * log(1/delta))^{1/4}) for strongly convex losses, i.e., a lower bound on the number of deletions supported without losing accuracy
  • Assumptions: Requires the loss function to be lambda-strongly convex, L-Lipschitz, and M-Hessian-Lipschitz
  • For convex (not strongly convex) losses, a regularization term (lambda/2)||w||^2 is added to reduce to the strongly convex case
  • For structured problems (e.g., linear SVMs with diagonal Hessians), the running time reduces to O(d)
  • Related to the Newton update removal mechanism of Guo et al. (2020), but does not require the training data for unlearning and provides population risk rather than empirical risk guarantees
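For the structured case mentioned above, a diagonal Hessian turns the linear solve into an elementwise division, dropping the cost from O(d^omega) to O(d). A hypothetical sketch (noise step omitted for brevity; the h arrays store Hessian diagonals, and all names are illustrative):

```python
import numpy as np

def newton_unlearn_diag(w_hat, h_full_diag, grads_U, hess_diags_U, n):
    """O(d) Newton-step unlearning when all Hessians are diagonal.

    h_full_diag: diagonal of the stored nabla^2 F_hat(w_hat);
    hess_diags_U: diagonals of the deleted points' Hessians at w_hat.
    Noise addition is omitted here for brevity.
    """
    m = len(grads_U)
    # Diagonal downdate: elementwise, O(d) per deleted point
    h_hat = (n * h_full_diag - sum(hess_diags_U)) / (n - m)
    # "Solve" against a diagonal matrix is just elementwise division
    return w_hat + (sum(grads_U) / h_hat) / (n - m)
```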
