Newton-step unlearning is the core algorithm proposed in Sekhari et al. (2021) for achieving certified approximate unlearning with high deletion capacity under convex loss functions. The method uses a second-order (Newton-step) approximation to efficiently update a trained model after data deletion requests, without requiring access to the full training dataset.
The algorithm operates in two phases. During training, the learning algorithm A_sc minimizes the empirical loss and additionally computes and stores the Hessian of the empirical loss at the learned model, forming the auxiliary statistics T(S) = {nabla^2 F_hat(w_hat)}. This requires O(d^2) storage, independent of the dataset size n. During unlearning, given delete requests U, the algorithm:
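The training phase can be sketched for a concrete strongly convex loss. The snippet below is a minimal illustration, not the paper's implementation: the per-sample ridge-regression loss, the function name `train_with_statistics`, and the arguments `X`, `y`, `lam` are all illustrative choices.

```python
import numpy as np

def train_with_statistics(X, y, lam):
    """Training-phase sketch: fit w_hat and store the Hessian T(S).

    Illustrative per-sample loss (lam-strongly convex):
        f(w, (x, y)) = 0.5*(x @ w - y)**2 + 0.5*lam*||w||**2
    """
    n, d = X.shape
    # Hessian of F_hat(w) = (1/n) * sum_i f(w, z_i); for this quadratic
    # loss it is constant in w.
    hessian = X.T @ X / n + lam * np.eye(d)
    # Empirical minimizer: solve nabla F_hat(w_hat) = 0.
    w_hat = np.linalg.solve(hessian, X.T @ y / n)
    # T(S) = {hessian}: O(d^2) storage, independent of n.
    return w_hat, hessian
```

Note that only the d x d Hessian is retained as auxiliary statistics; the training set itself is not needed at unlearning time.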
- Estimates the Hessian on the remaining data: H_hat = 1/(n-m) * (n * nabla^2 F_hat(w_hat) - sum_{z in U} nabla^2 f(w_hat, z))
- Computes a Newton-step update: w_bar = w_hat + 1/(n-m) * H_hat^{-1} * sum_{z in U} nabla f(w_hat, z)
- Adds calibrated Gaussian noise: w_tilde = w_bar + nu, where nu ~ N(0, sigma^2 * I_d) with sigma proportional to m^2/n^2
The key insight is that this Newton step approximates the empirical minimizer on S\U with precision O(m^2/n^2), which is quadratically smaller than the O(m/n) perturbation scale in differential privacy. This reduced noise requirement directly translates to the quadratic improvement in deletion capacity over DP-based approaches.
Algorithm
Input: Delete requests U = {z_j}_{j=1}^m, model w_hat from A_sc, statistics T(S) = {nabla^2 F_hat(w_hat)}, loss function f
- Set gamma = 2Mm^2L^2 / (lambda^3 n^2), sigma = (gamma/epsilon) * sqrt(2 ln(1.25/delta))
- Compute estimated Hessian H_hat on S\U (Equation 7)
- Compute Newton update w_bar (Equation 8)
- Sample nu from N(0, sigma^2 * I_d)
- Return w_tilde = w_bar + nu
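The steps above can be sketched as follows, assuming the caller supplies the summed gradient and summed Hessian of the deleted points evaluated at w_hat (the function name and argument layout are illustrative, not from the paper):

```python
import numpy as np

def newton_step_unlearn(w_hat, hessian, grad_sum_U, hess_sum_U,
                        n, m, lam, L, M, eps, delta, rng=None):
    """Unlearning-phase sketch.

    grad_sum_U = sum over z in U of nabla f(w_hat, z)
    hess_sum_U = sum over z in U of nabla^2 f(w_hat, z)
    lam, L, M  = strong convexity, Lipschitz, Hessian-Lipschitz constants
    """
    rng = np.random.default_rng() if rng is None else rng
    d = w_hat.shape[0]
    # Estimated Hessian on the remaining data S \ U (Equation 7).
    H_hat = (n * hessian - hess_sum_U) / (n - m)
    # Newton-step update (Equation 8).
    w_bar = w_hat + np.linalg.solve(H_hat, grad_sum_U) / (n - m)
    # Calibrated Gaussian noise for (eps, delta)-certified unlearning.
    gamma = 2 * M * m**2 * L**2 / (lam**3 * n**2)
    sigma = (gamma / eps) * np.sqrt(2 * np.log(1.25 / delta))
    return w_bar + sigma * rng.standard_normal(d)
```

For a quadratic loss the per-sample Hessian is constant in w, so M = 0, sigma = 0, and the Newton step recovers the exact minimizer on S\U; the noise only becomes necessary once the loss has genuine curvature variation.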
Key Properties
- Running time: O(d^omega) where omega in [2, 2.38] is the matrix multiplication exponent; independent of dataset size n
- Storage: O(d^2) for the Hessian, independent of n
- Deletion capacity: O(n * sqrt(epsilon) / (d * log(1/delta))^{1/4}) for strongly convex losses
- Assumptions: Requires the loss function to be lambda-strongly convex, L-Lipschitz, and M-Hessian-Lipschitz
- For convex (not strongly convex) losses, a regularization term (lambda/2)||w||^2 is added to reduce to the strongly convex case
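The regularization reduction can be sketched as a simple wrapper; the function names `f`, `grad_f`, `hess_f`, and `regularize` are illustrative, not from the paper:

```python
import numpy as np

def regularize(f, grad_f, hess_f, lam):
    """Wrap a merely convex per-sample loss into a lam-strongly-convex
    one by adding the regularizer (lam/2)*||w||^2."""
    def f_reg(w, z):
        return f(w, z) + 0.5 * lam * np.dot(w, w)
    def grad_reg(w, z):
        return grad_f(w, z) + lam * w
    def hess_reg(w, z):
        # The regularizer adds lam * I to every per-sample Hessian.
        return hess_f(w, z) + lam * np.eye(w.shape[0])
    return f_reg, grad_reg, hess_reg
```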
- For structured problems (e.g., linear SVMs with diagonal Hessians), the running time reduces to O(d)
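A sketch of the diagonal-Hessian speedup, under the assumption that every per-sample Hessian is diagonal (the function name and arguments are illustrative; noise addition is omitted since it is unchanged from the general case):

```python
import numpy as np

def newton_step_unlearn_diag(w_hat, h_diag, grad_sum_U, hdiag_sum_U, n, m):
    """O(d) variant of the Newton-step update for diagonal Hessians:
    store only the d-vector of diagonal entries and replace the
    O(d^omega) linear solve with elementwise division."""
    # Estimated Hessian diagonal on S \ U -- O(d) work and storage.
    H_hat_diag = (n * h_diag - hdiag_sum_U) / (n - m)
    # Elementwise "solve" replaces np.linalg.solve -- also O(d).
    return w_hat + (grad_sum_U / H_hat_diag) / (n - m)
```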
- Related to the Newton update removal mechanism of Guo et al. (2020), but does not require the training data for unlearning and provides population risk rather than empirical risk guarantees