Abstract
We study the problem of unlearning datapoints from a learnt model. The learner first receives a dataset S drawn i.i.d. from an unknown distribution, and outputs a model w that performs well on unseen samples from the same distribution. However, at some point in the future, any training datapoint z in S can request to be unlearned, thus prompting the learner to modify its output model while still ensuring the same accuracy guarantees. We initiate a rigorous study of generalization in machine unlearning, where the goal is to perform well on previously unseen datapoints. Our focus is on both computational and storage complexity. For the setting of convex losses, we provide an unlearning algorithm that can unlearn up to O(n/d^{1/4}) samples, where d is the problem dimension. In comparison, in general, differentially private learning (which implies unlearning) only guarantees deletion of O(n/d^{1/2}) samples. This demonstrates a novel separation between differential privacy and machine unlearning.
Summary
This paper initiates the theoretical study of machine unlearning from a population risk (generalization) perspective, as opposed to the empirical risk minimization focus of prior work. The authors formalize certified approximate unlearning via a definition of (epsilon, delta)-unlearning inspired by differential privacy, where an observer cannot distinguish between a model that has had data deleted via the unlearning algorithm and a model retrained from scratch on the remaining data. They introduce the concept of deletion capacity to measure the maximum number of samples that can be unlearned while maintaining test loss guarantees.
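Up to notational details (paraphrased here from memory, so symbols and quantifiers should be checked against the paper's Definition 2), the indistinguishability requirement can be written as:

```latex
% (epsilon, delta)-unlearning, paraphrased. A = learning algorithm,
% \bar{A} = unlearning algorithm, T(S) = stored data statistics,
% U \subseteq S the set of delete requests with |U| \le m.
% For every dataset S, every such U, and every measurable set O of outputs:
\Pr\big[\bar{A}\big(U,\, A(S),\, T(S)\big) \in O\big]
  \;\le\; e^{\varepsilon}\,
  \Pr\big[\bar{A}\big(\emptyset,\, A(S \setminus U),\, T(S \setminus U)\big) \in O\big]
  \;+\; \delta,
% together with the symmetric inequality with the two probabilities exchanged.
```

In words: the model produced by unlearning U from a model trained on S must be statistically close to the model produced by "unlearning nothing" from a model retrained on S \ U.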
A key contribution is establishing a strict separation between differential privacy (DP) and machine unlearning. While any algorithm that is DP with respect to datasets differing in up to m points automatically provides (epsilon, delta)-unlearning for m samples, the authors show that such DP-based approaches can only guarantee deletion of O(n/sqrt(d)) samples. In contrast, their unlearning algorithms, which exploit knowledge of the specific samples being deleted, achieve deletion capacity of O(n/d^{1/4}), a quadratic improvement in the dimension dependence. The improvement arises because their Newton-step-based unlearning algorithm approximates the empirical minimizer on the remaining data to accuracy O(m^2/n^2), so only O(m^2/n^2)-scale noise is needed for the unlearning guarantee, compared to the O(m/n)-scale noise needed by DP.
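A back-of-envelope version of this comparison (constants, smoothness parameters, and the exact dependence on epsilon are suppressed; this is a sketch of the scaling argument, not the paper's formal proof):

```latex
% DP-based unlearning: group privacy over edit distance m inflates the noise
% scale by a factor of m, so the excess population risk behaves like
%   sqrt(d) * m / (eps * n).
% Requiring this to stay below the constant 0.01 yields
m \;\lesssim\; \frac{\varepsilon\, n}{\sqrt{d}} .
% Newton-step unlearning: the update approximates the minimizer on S \ U to
% accuracy O(m^2/n^2), so noise of scale m^2/n^2 suffices and the excess
% risk behaves like sqrt(d) * m^2 / (eps * n^2). The same 0.01 budget gives
m^2 \;\lesssim\; \frac{\varepsilon\, n^2}{\sqrt{d}}
\quad\Longrightarrow\quad
m \;\lesssim\; \frac{n}{d^{1/4}} ,
% i.e. a quadratic improvement in the dimension dependence.
```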
The algorithms store compact data statistics (the Hessian of the empirical loss at the learned model) of size O(d^2), independent of the dataset size n, making them storage-efficient. The paper provides results for both strongly convex and convex loss functions, with the convex case handled by adding regularization and reducing to the strongly convex setting.
Key Contributions
- Formalization of (epsilon, delta)-unlearning (Definition 2): a distributional indistinguishability guarantee inspired by DP but strictly weaker, enabling more efficient algorithms
- Introduction of deletion capacity: the maximum number of samples that can be deleted while keeping the excess population risk below a small constant (0.01 in the paper's definition)
- Strict separation from DP: unlearning algorithms that are aware of which samples to delete achieve O(n/d^{1/4}) deletion capacity vs. O(n/d^{1/2}) for DP-based approaches — a multiplicative Theta(d^{1/4}) gap
- Upper bound on deletion capacity (Theorem 1): even with access to all remaining data, deletion capacity is at most cn for some constant c < 1 for strongly convex losses, demonstrating inherent limits
- Storage-efficient algorithms: the unlearning algorithm only needs O(d^2) auxiliary statistics, independent of dataset size
- Newton-step-based unlearning algorithm with running time O(d^omega) independent of n
Methodology
The learning algorithm A_sc minimizes the empirical loss and stores the Hessian of the empirical loss evaluated at the learned model as auxiliary statistics T(S). The unlearning algorithm (Algorithm 1) receives the delete requests U, the learned model, and T(S), then:
- Recovers the Hessian of the empirical loss on the remaining dataset S\U at the learned model, by subtracting the deleted points' individual Hessians from T(S)
- Computes a Newton-step update to approximate the empirical minimizer on S\U
- Adds Gaussian noise calibrated to achieve the (epsilon, delta)-unlearning guarantee
For convex (not strongly convex) losses, regularization is added to reduce to the strongly convex case.
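The three steps of Algorithm 1 can be sketched as follows in NumPy. This is a minimal illustration under simplifying assumptions: the function names are invented here, the noise scale sigma is taken as an input rather than derived from the paper's sensitivity calibration, and the stored statistic is represented as the sum of per-sample Hessians at the learned model.

```python
import numpy as np

def unlearn_newton(w, hessian_sum, deleted_points, grad_fn, hess_fn, n, sigma, rng=None):
    """Sketch of a Newton-step unlearning update.

    w              : learned model, shape (d,), assumed to minimize the loss on S
    hessian_sum    : sum over S of per-sample Hessians of the loss at w, shape (d, d)
    deleted_points : list of samples U to be removed
    grad_fn(w, z)  : per-sample gradient of the loss at (w, z)
    hess_fn(w, z)  : per-sample Hessian of the loss at (w, z)
    n              : original dataset size |S|
    sigma          : Gaussian noise scale, assumed calibrated offline for
                     the (epsilon, delta)-unlearning guarantee
    """
    rng = np.random.default_rng() if rng is None else rng
    m = len(deleted_points)
    d = w.shape[0]

    # Step 1: Hessian of the empirical loss on S \ U at w, recovered from the
    # stored statistic by subtracting the deleted points' Hessians.
    H_rest = hessian_sum - sum(hess_fn(w, z) for z in deleted_points)
    H_hat = H_rest / (n - m)

    # Gradient of the empirical loss on S \ U at w. Since w minimizes the loss
    # on S, the gradients over S sum to ~0, so the residual gradient is just
    # minus the deleted points' gradients, rescaled.
    g = -sum(grad_fn(w, z) for z in deleted_points) / (n - m)

    # Step 2: one Newton step toward the empirical minimizer on S \ U.
    w_bar = w - np.linalg.solve(H_hat, g)

    # Step 3: Gaussian perturbation for (epsilon, delta)-indistinguishability.
    return w_bar + sigma * rng.standard_normal(d)
```

For the quadratic loss f(w, z) = 0.5 * ||w - z||^2, the empirical minimizer is the sample mean, the Newton step is exact, and with sigma = 0 the update returns the mean of the remaining points, which matches the intuition that the update lands on (a noisy version of) the retrained-from-scratch model.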
Key Findings
- Population risk and empirical risk minimization are qualitatively different objectives for unlearning: an algorithm that exactly minimizes empirical risk after deletion can still have poor test loss
- The deletion capacity is inherently limited even with full access to remaining data (Theorem 1)
- The sqrt(d) dimension dependence in the deletion capacity of DP-based approaches is necessary: no unlearning method that relies solely on a DP guarantee can remove it
- The unlearning time is independent of n, scaling only as O(d^omega) where omega is the matrix multiplication exponent
Important References
- Calibrating Noise to Sensitivity in Private Data Analysis — foundational differential privacy work whose framework inspires the (epsilon, delta)-unlearning definition
- The Algorithmic Foundations of Differential Privacy — establishes DP theory that provides the baseline deletion capacity bound
- Membership Inference Attacks Against Machine Learning Models — motivates the need for unlearning by demonstrating that training data can be extracted from models