Abstract
We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset S drawn i.i.d. from an unknown distribution, and outputs a model w that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint z in S can request to be unlearned, thus prompting the learner to modify its output model while still ensuring the same accuracy guarantees. We initiate a rigorous study of generalization in machine unlearning, where the goal is to perform well on previously unseen datapoints. Our focus is on both computational and storage complexity. For the setting of convex losses, we provide an unlearning algorithm that can unlearn up to O(n/d^{1/4}) samples, where d is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of O(n/d^{1/2}) samples. This demonstrates a novel separation between differential privacy and machine unlearning.
Summary
This paper initiates the theoretical study of machine unlearning from a population risk (generalization) perspective, as opposed to the empirical risk minimization focus of prior work. The authors formalize certified approximate unlearning via a definition of (epsilon, delta)-unlearning inspired by differential privacy, where an observer cannot distinguish between a model that has had data deleted via the unlearning algorithm and a model retrained from scratch on the remaining data. They introduce the concept of deletion capacity to measure the maximum number of samples that can be unlearned while maintaining test loss guarantees.
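Up to notational details (paraphrased here from memory, so symbols and quantifiers should be checked against the paper's Definition 2), the indistinguishability requirement can be written as:

```latex
% (epsilon, delta)-unlearning, paraphrased. A = learning algorithm,
% \bar{A} = unlearning algorithm, T(S) = stored data statistics,
% U \subseteq S the set of delete requests with |U| \le m.
% For every dataset S, every such U, and every measurable set O of outputs:
\Pr\big[\bar{A}\big(U,\, A(S),\, T(S)\big) \in O\big]
  \;\le\; e^{\varepsilon}\,
  \Pr\big[\bar{A}\big(\emptyset,\, A(S \setminus U),\, T(S \setminus U)\big) \in O\big]
  \;+\; \delta,
% together with the symmetric inequality with the two probabilities exchanged.
```

In words: the model produced by unlearning U from a model trained on S must be statistically close to the model produced by "unlearning nothing" from a model retrained on S \ U.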
A key contribution is establishing a strict separation between differential privacy (DP) and machine unlearning. While any algorithm that is DP with respect to datasets differing in up to m points automatically provides (epsilon, delta)-unlearning for m samples, the authors show that such DP-based approaches can only guarantee deletion of O(n/sqrt(d)) samples. In contrast, their unlearning algorithms, which exploit knowledge of the specific samples being deleted, achieve deletion capacity of O(n/d^{1/4}), a quadratic improvement in the dimension dependence. The improvement arises because their Newton-step-based unlearning algorithm approximates the empirical minimizer on the remaining data to accuracy O(m^2/n^2), so only O(m^2/n^2)-scale noise is needed for the unlearning guarantee, compared to the O(m/n)-scale noise needed by DP.
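A back-of-envelope version of this comparison (constants, smoothness parameters, and the exact dependence on epsilon are suppressed; this is a sketch of the scaling argument, not the paper's formal proof):

```latex
% DP-based unlearning: group privacy over edit distance m inflates the noise
% scale by a factor of m, so the excess population risk behaves like
%   sqrt(d) * m / (eps * n).
% Requiring this to stay below the constant 0.01 yields
m \;\lesssim\; \frac{\varepsilon\, n}{\sqrt{d}} .
% Newton-step unlearning: the update approximates the minimizer on S \ U to
% accuracy O(m^2/n^2), so noise of scale m^2/n^2 suffices and the excess
% risk behaves like sqrt(d) * m^2 / (eps * n^2). The same 0.01 budget gives
m^2 \;\lesssim\; \frac{\varepsilon\, n^2}{\sqrt{d}}
\quad\Longrightarrow\quad
m \;\lesssim\; \frac{n}{d^{1/4}} ,
% i.e. a quadratic improvement in the dimension dependence.
```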
The algorithms store compact data statistics (the Hessian of the empirical loss at the learned model) of size O(d^2), independent of the dataset size n, making them storage-efficient. The paper provides results for both strongly convex and convex loss functions, with the convex case handled by adding regularization and reducing to the strongly convex setting.
Key Contributions
- Formalization of (epsilon, delta)-unlearning (Definition 2): a distributional indistinguishability guarantee inspired by DP but strictly weaker, enabling more efficient algorithms
- Introduction of deletion capacity: the maximum number of samples that can be deleted while keeping the excess population risk below a small constant (0.01 in the paper's definition)
- Strict separation from DP: unlearning algorithms that are aware of which samples to delete achieve O(n/d^{1/4}) deletion capacity vs. O(n/d^{1/2}) for DP-based approaches — a multiplicative Theta(d^{1/4}) gap
- Upper bound on deletion capacity (Theorem 1): even with access to all remaining data, deletion capacity is at most cn for some constant c < 1 for strongly convex losses, demonstrating inherent limits
- Storage-efficient algorithms: the unlearning algorithm only needs O(d^2) auxiliary statistics, independent of dataset size
- Newton-step-based unlearning algorithm with running time O(d^omega) independent of n
Methodology
The learning algorithm A_sc minimizes the empirical loss and stores the Hessian of the empirical loss evaluated at the learned model as auxiliary statistics T(S). The unlearning algorithm (Algorithm 1) receives the delete requests U, the learned model, and T(S), then:
- Recovers the Hessian of the empirical loss on the remaining dataset S\U at the learned model, by subtracting the deleted points' individual Hessians from T(S)
- Computes a Newton-step update to approximate the empirical minimizer on S\U
- Adds Gaussian noise calibrated to achieve the (epsilon, delta)-unlearning guarantee
For convex (not strongly convex) losses, regularization is added to reduce to the strongly convex case.
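The three steps of Algorithm 1 can be sketched as follows in NumPy. This is a minimal illustration under simplifying assumptions: the function names are invented here, the noise scale sigma is taken as an input rather than derived from the paper's sensitivity calibration, and the stored statistic is represented as the sum of per-sample Hessians at the learned model.

```python
import numpy as np

def unlearn_newton(w, hessian_sum, deleted_points, grad_fn, hess_fn, n, sigma, rng=None):
    """Sketch of a Newton-step unlearning update.

    w              : learned model, shape (d,), assumed to minimize the loss on S
    hessian_sum    : sum over S of per-sample Hessians of the loss at w, shape (d, d)
    deleted_points : list of samples U to be removed
    grad_fn(w, z)  : per-sample gradient of the loss at (w, z)
    hess_fn(w, z)  : per-sample Hessian of the loss at (w, z)
    n              : original dataset size |S|
    sigma          : Gaussian noise scale, assumed calibrated offline for
                     the (epsilon, delta)-unlearning guarantee
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(deleted_points)
    d = w.shape[0]

    # Step 1: Hessian of the empirical loss on S \ U at w, recovered from the
    # stored statistic by subtracting the deleted points' Hessians.
    H_rest = hessian_sum - sum(hess_fn(w, z) for z in deleted_points)
    H_hat = H_rest / (n - m)

    # Gradient of the empirical loss on S \ U at w. Since w minimizes the loss
    # on S, the gradients over S sum to ~0, so the residual gradient is just
    # minus the deleted points' gradients, rescaled.
    g = -sum(grad_fn(w, z) for z in deleted_points) / (n - m)

    # Step 2: one Newton step toward the empirical minimizer on S \ U.
    w_bar = w - np.linalg.solve(H_hat, g)

    # Step 3: Gaussian perturbation for (epsilon, delta)-indistinguishability.
    return w_bar + sigma * rng.standard_normal(d)
```

For the quadratic loss f(w, z) = 0.5 * ||w - z||^2, the empirical minimizer is the sample mean, the Newton step is exact, and with sigma = 0 the update returns the mean of the remaining points, which matches the intuition that the update lands on (a noisy version of) the retrained-from-scratch model.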
Key Findings
- Population risk and empirical risk minimization are qualitatively different objectives for unlearning: an algorithm that exactly minimizes empirical risk after deletion can still have poor test loss
- The deletion capacity is inherently limited even with full access to remaining data (Theorem 1)
- The sqrt(d) dimension dependence in the deletion capacity of DP-based approaches is necessary: no unlearning method that relies solely on a DP guarantee can remove it
- The unlearning time is independent of n, scaling only as O(d^omega) where omega is the matrix multiplication exponent
Important References
- Calibrating Noise to Sensitivity in Private Data Analysis — foundational differential privacy work whose framework inspires the (epsilon, delta)-unlearning definition
- The Algorithmic Foundations of Differential Privacy — establishes DP theory that provides the baseline deletion capacity bound
- Membership Inference Attacks Against Machine Learning Models — motivates the need for unlearning by demonstrating that training data can be extracted from models