Summary

This paper bridges the gap between certified unlearning — which provides formal (epsilon, delta) indistinguishability guarantees between the unlearned model and a fully retrained model — and deep neural networks whose loss landscapes are inherently nonconvex. Prior certified unlearning methods relied on strong convexity assumptions that do not hold for DNNs.

The authors decompose certified unlearning into two steps: (1) estimating the retrained model via a modified Newton update, and (2) adding calibrated Gaussian noise proportional to the approximation error bound to achieve the certification budget. Two key techniques make this practical for nonconvex objectives: local convex approximation (adding an l2 regularization term to make the Hessian positive definite locally) and a weight norm constraint during training (projected gradient descent with bound C on parameter norms), which together yield a tractable approximation error bound independent of the unlearned set size.
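The weight norm constraint amounts to projected gradient descent onto an l2 ball of radius C. A minimal NumPy sketch (function names are illustrative, not from the paper):

```python
import numpy as np

def project_to_ball(w, C):
    """Project parameters onto the l2 ball of radius C."""
    norm = np.linalg.norm(w)
    return w if norm <= C else w * (C / norm)

def constrained_gd_step(w, grad, lr, C):
    """One projected gradient descent step with weight norm bound C."""
    return project_to_ball(w - lr * grad, C)
```

Bounding ||w|| by C is what keeps the Hessian- and gradient-related quantities in the approximation error bound finite without any convexity assumption.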

For computational efficiency, the authors adopt the LiSSA (Linear time Stochastic Second-order Algorithm) method to approximate the inverse Hessian, reducing complexity from O(np^2 + p^3) to O(sp^2 + n_u p) where s is the number of recursion samples. They further extend the certification analysis to nonconvergence training (where w* is not the exact minimum) and sequential unlearning (multiple successive unlearning requests), proving that the same approximation error bounds hold in both settings. Experiments on MNIST, CIFAR-10, and SVHN with MLP, AllCNN, and ResNet18 architectures show that their method achieves over 10x speedup compared to retraining while maintaining competitive F1-scores and membership inference attack resistance.
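The LiSSA idea can be sketched with the standard Neumann-series recursion, using only Hessian-vector products (here `hvp_fn` stands in for the paper's stochastic mini-batch HVPs; `scale` and `damping` are the usual hyperparameters that make the recursion a contraction):

```python
import numpy as np

def lissa_inverse_hvp(hvp_fn, v, num_iters=300, scale=10.0, damping=0.01):
    """Approximate (H + damping*I)^{-1} v via the LiSSA recursion
        u_{j+1} = v + (I - A) u_j,   A = (H + damping*I) / scale,
    which converges to A^{-1} v when the scaled Hessian has
    spectral norm below 1; dividing by `scale` at the end recovers
    (H + damping*I)^{-1} v."""
    u = v.copy()
    for _ in range(num_iters):
        u = v + u - (hvp_fn(u) + damping * u) / scale
    return u / scale
```

Each iteration needs one Hessian-vector product (O(p) with backprop-through-gradients for a mini-batch), which is where the O(sp^2 + n_u p) overall cost comes from; the full Hessian is never materialized.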

Key Contributions

  • Extension of certified unlearning from convex to nonconvex objectives via local convex approximation and weight norm constraints, without requiring convexity of the loss function
  • Efficient inverse Hessian approximation using LiSSA that achieves 470x speedup over exact Hessian computation while preserving certification guarantees
  • Theoretical analysis of certified unlearning under nonconvergence training and sequential unlearning settings with maintained approximation error bounds
  • Comprehensive experimental validation across three datasets and three DNN architectures demonstrating practical effectiveness

Methodology

The core framework follows a two-step process based on Theorem 2.3 (connecting certified unlearning to differential privacy):

  1. Approximation step: estimate the retrained model via a Newton-like update: w_tilde = w* + (n_u / (n - n_u)) * H_tilde_inv * grad_L(w*, D_u), where H_tilde_inv is the LiSSA-approximated inverse of the locally convexified Hessian (lambda * I added so all eigenvalues are positive)
  2. Noise addition step: Add Gaussian noise Y ~ N(0, sigma^2 I) calibrated to the approximation error bound Delta to achieve (epsilon, delta)-certified unlearning
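The two steps compose as follows. This is a minimal sketch, assuming an `inv_hvp_fn` (e.g. a LiSSA approximation) is supplied; the variable names mirror the update above but are otherwise illustrative:

```python
import numpy as np

def certified_unlearn(w_star, grad_Du, inv_hvp_fn, n, n_u, sigma, rng=None):
    """Two-step certified unlearning sketch:
       1) Newton-like update toward the retrained model,
       2) Gaussian noise calibrated to the approximation error bound."""
    rng = np.random.default_rng() if rng is None else rng
    # Step 1: w_tilde = w* + n_u/(n - n_u) * H_tilde_inv * grad_L(w*, D_u)
    w_tilde = w_star + (n_u / (n - n_u)) * inv_hvp_fn(grad_Du)
    # Step 2: add isotropic Gaussian noise Y ~ N(0, sigma^2 I)
    return w_tilde + rng.normal(0.0, sigma, size=w_tilde.shape)
```

Only the gradient over the unlearned set D_u and inverse-HVPs are needed, so the cost is independent of retraining from scratch on the retained data.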

Key technical components:

  • Local convex approximation: add lambda * I to the Hessian (equivalently, an l2 regularization term) so it is positive definite in a neighborhood of w*
  • Weight norm constraint: train with projected gradient descent keeping ||w|| <= C, which makes the approximation error bound tractable and independent of the unlearned set size
  • LiSSA inverse-Hessian approximation: replaces the O(np^2 + p^3) exact computation with an O(sp^2 + n_u p) stochastic recursion over Hessian-vector products

Key Findings

  • Certified unlearning achieves the closest F1-scores to the retrain-from-scratch baseline on the unlearned set across most experimental configurations
  • The method provides 10x+ speedup over full retraining on DNNs
  • Local convex approximation substantially tightens the approximation error bound and slightly reduces the empirical approximation error
  • The weight norm constraint yields much lower approximation error compared to unconstrained training
  • Under sequential unlearning, the gradient norm and approximation error bound decrease at each step as parameters approach a local optimum
  • A tradeoff exists between certification budget tightness and model utility: tighter budgets require more noise, degrading performance
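The budget/utility tradeoff is visible in the standard Gaussian-mechanism calibration, where the noise scale grows as epsilon shrinks. The constant below is the textbook one (Dwork and Roth); the paper's exact calibration of sigma to its bound Delta may differ:

```python
import math

def gaussian_sigma(Delta, epsilon, delta):
    """Standard Gaussian-mechanism noise scale for an l2 sensitivity
    bound Delta; tighter budgets (smaller epsilon, delta) need larger sigma."""
    return Delta * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon
```

Halving epsilon doubles sigma, which is exactly the utility degradation the findings describe.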

Important References

Atomic Notes


paper