Abstract
Today, computer systems hold large amounts of personal data. While this abundance of data enables breakthroughs in artificial intelligence, and especially machine learning, it can also threaten user privacy and weaken the bonds of trust between humans and AI. Recent regulations now require that, on request, private information about a user be removed from both computer systems and machine learning models; this legislation is colloquially known as “the right to be forgotten”. While removing data from back-end databases should be straightforward, it is not sufficient in the AI context, as machine learning models often ‘remember’ the old data. Contemporary adversarial attacks on trained models have shown that an adversary can learn whether an instance or an attribute belonged to the training data. This phenomenon calls for a new paradigm, namely machine unlearning, to make machine learning models forget particular data. Recent work on machine unlearning has not fully solved the problem, owing to the lack of common frameworks and resources. This paper therefore presents a comprehensive examination of machine unlearning’s concepts, scenarios, methods, and applications. As a curated collection of cutting-edge studies, it is intended to serve as a comprehensive resource for researchers and practitioners seeking an introduction to machine unlearning and its formulations, design criteria, removal requests, algorithms, and applications.
Summary
This survey provides the most comprehensive treatment of machine unlearning as of its publication, covering the full pipeline from motivation and formal definitions to algorithms, evaluation metrics, and applications. The work is motivated by privacy regulations (GDPR, CCPA) that establish the “right to be forgotten,” which requires not just deleting data from databases but ensuring that machine learning models no longer retain information about deleted data.
The survey organises the field around several key axes. First, it presents the unlearning framework: the learning component (data, algorithm, model) and the unlearning component (unlearning algorithm, unlearned model, verification). Design requirements include completeness (indistinguishability from a model retrained from scratch), timeliness (speed relative to retraining), accuracy, lightweight operation, provable guarantees, model-agnosticism, and verifiability. The authors distinguish several types of unlearning requests: item removal, feature removal, class removal, task removal, and stream removal.
The formal definitions section is particularly rigorous. Exact unlearning requires that the distribution of the unlearned model exactly matches that of a model retrained from scratch on the remaining data. Approximate unlearning relaxes this to (epsilon, delta)-bounded divergence, connecting to differential privacy. The survey also covers zero-glance, zero-shot, and few-shot unlearning scenarios where access to the data to be forgotten is restricted.
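These two central definitions can be written out explicitly. The following is a sketch in standard notation (A is the learning algorithm, U the unlearning mechanism, D the training set, D_f the forget set, and T ranges over measurable sets of models); the symbols are conventional in this literature rather than copied verbatim from the survey:

```latex
% Exact unlearning: distributional equality with retraining from scratch
\Pr\big[\, U(A(D),\, D,\, D_f) \in \mathcal{T} \,\big]
  \;=\; \Pr\big[\, A(D \setminus D_f) \in \mathcal{T} \,\big]
  \qquad \forall\, \mathcal{T}

% (\epsilon,\delta)-certified approximate unlearning: a two-sided,
% differential-privacy-style bound on the divergence
\Pr\big[\, U(A(D),\, D,\, D_f) \in \mathcal{T} \,\big]
  \;\le\; e^{\epsilon}\,\Pr\big[\, A(D \setminus D_f) \in \mathcal{T} \,\big] + \delta
\quad \text{and symmetrically with the two sides swapped.}
```

Setting \(\delta = 0\) recovers \(\epsilon\)-certified unlearning, and letting \(\epsilon, \delta \to 0\) recovers the exact definition, which is how the survey connects approximate unlearning to differential privacy.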
The taxonomy of unlearning algorithms divides methods into three categories: model-agnostic (differential privacy, certified removal, statistical query learning, decremental learning, knowledge adaptation, MCMC sampling), model-intrinsic (softmax classifiers, linear models, tree-based, Bayesian, DNN-based), and data-driven (data partitioning/SISA, data augmentation, data influence). Applications span recommender systems, federated learning, graph embedding, lifelong learning, LLMs, and generative models.
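The SISA (Sharded, Isolated, Sliced, Aggregated) idea in the data-driven category can be sketched in a few lines. This is a toy illustration, not the original implementation: `MeanModel` and the averaging aggregator are illustrative stand-ins for real constituent models, and the data is synthetic. The key point is that deleting an item retrains only the shard that contained it.

```python
# Minimal SISA-style sketch: shard the data, train one independent
# constituent model per shard, aggregate predictions. Unlearning a
# point retrains only its shard, which matches retraining that shard
# from scratch (the basis of SISA's exact-unlearning guarantee).
from statistics import mean

class MeanModel:
    """Toy constituent 'model': predicts the mean label of its shard."""
    def fit(self, data):
        self.value = mean(y for _, y in data) if data else 0.0
        return self

def train_sisa(dataset, n_shards):
    shards = [dataset[i::n_shards] for i in range(n_shards)]
    models = [MeanModel().fit(s) for s in shards]
    return shards, models

def predict(models):
    # Aggregate constituent predictions (here: simple averaging).
    return mean(m.value for m in models)

def unlearn(shards, models, item):
    # Locate the shard holding the item, delete it, retrain that shard only.
    for i, shard in enumerate(shards):
        if item in shard:
            shard.remove(item)
            models[i] = MeanModel().fit(shard)
            return  # all other shards (and their models) are untouched

data = [(x, float(x % 2)) for x in range(12)]
shards, models = train_sisa(data, n_shards=3)
unlearn(shards, models, (4, 0.0))
```

The cost of one deletion is one shard's retraining rather than the full dataset's, which is the timeliness argument the survey makes for data partitioning.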
Key Contributions
- Unified framework: Presents the complete unlearning pipeline covering learning component, removal requests, unlearning algorithms, evaluation metrics, and verification mechanisms
- Formal taxonomy: Organises all unlearning methods into model-agnostic, model-intrinsic, and data-driven categories with systematic comparison (Table 3)
- Comprehensive definitions: Formalises exact unlearning (special and general case), approximate unlearning (epsilon-certified and (epsilon,delta)-certified), and weak unlearning (output-space indistinguishability)
- Evaluation metrics catalogue: Documents accuracy, completeness, unlearn time, relearn time, layer-wise distance, activation distance, JS-Divergence, membership inference, ZRF score, anamnesis index, epistemic uncertainty, and model inversion attacks (Table 6)
- Application survey: Covers unlearning in recommender systems, federated learning, graph embedding, lifelong learning, LLMs, and generative models
- Open research questions: Identifies unified design requirements, unified benchmarking, adversarial machine unlearning, interpretable unlearning, and evolving data streams as key open problems
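Several of the catalogued metrics compare the unlearned model's outputs against those of a model retrained from scratch. As a minimal illustration of one of them, the JS-Divergence metric, here is a self-contained sketch; the probability vectors are made-up softmax outputs, not figures from the survey:

```python
# Jensen-Shannon divergence between the predictive distribution of an
# unlearned model and that of a model retrained without the deleted
# data. A score near 0 indicates the unlearned model behaves like the
# retrained one on this input.
from math import log

def kl(p, q):
    # Kullback-Leibler divergence, skipping zero-probability terms.
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

unlearned = [0.70, 0.20, 0.10]   # hypothetical softmax output, unlearned model
retrained = [0.68, 0.22, 0.10]   # hypothetical softmax output, retrained model
score = js_divergence(unlearned, retrained)
```

Unlike KL divergence, the JS divergence is symmetric and bounded, which makes it a convenient pairwise score for this comparison.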
Methodology
This is a survey paper. The methodology consists of a systematic literature review and taxonomic organisation of the machine unlearning field.
Key Findings
- Influence functions are dominant: Understanding a data item’s impact on model parameters is the key to efficient unlearning
- Reachability of model parameters: Whether the original and unlearned models can share parameters given different data is an open question
- Unlearning verification needs independent auditing: Current verification methods (feature injection, membership inference, backdoor attacks) each have limitations
- Federated unlearning is emerging: The federated setting introduces unique challenges (aggregation obscures individual contributions, non-IID data)
- Model repair via unlearning: Unlearning can be used to remove adversarial poisoning, bias, and catastrophic forgetting
- No universal winner: Table 3 shows no single method satisfies all design requirements across all scenarios
- SISA (data partitioning): The only approach that fully supports exact unlearning for item, feature, and class removal across all design requirements
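The influence-function finding above can be made concrete for a model with a closed-form solution. The sketch below uses ridge regression, where removing one point's contribution via a Sherman-Morrison downdate of the cached inverse reproduces retraining exactly; for deep networks, the influence-based methods the survey covers only approximate this. Function names and the toy data are illustrative, not from the survey.

```python
# Influence-style one-point removal for ridge regression:
#   w = (X^T X + lam*I)^{-1} X^T y
# Deleting (x, y) downdates the cached inverse with Sherman-Morrison
# instead of re-inverting, so unlearning costs O(d^2) per deletion.
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))
    b = X.T @ y
    return A_inv, b, A_inv @ b  # cached inverse, moment vector, weights

def unlearn_point(A_inv, b, x, y_val):
    # Sherman-Morrison: (A - x x^T)^{-1} = A^{-1} + (A^{-1}x)(A^{-1}x)^T / (1 - x^T A^{-1} x)
    Ax = A_inv @ x
    A_inv_new = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)
    b_new = b - y_val * x
    return A_inv_new, b_new, A_inv_new @ b_new

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)

A_inv, b, w = fit_ridge(X, y)
_, _, w_unlearned = unlearn_point(A_inv, b, X[0], y[0])
w_retrained = fit_ridge(X[1:], y[1:])[2]  # ground truth: retrain from scratch
```

Here `w_unlearned` coincides with `w_retrained` up to floating-point error, which is exactly the "completeness" criterion; the open question the survey raises is how far such parameter-level reachability extends beyond models with closed forms.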
Important References
- Reducing Reliance on Spurious Features in Medical Image Classification with Spatial Specificity — applies spatial specificity concepts relevant to the feature removal unlearning request
- Towards Certified Shortcut Unlearning in Medical Imaging — extends approximate unlearning to pixel-level segmentation via weak unlearning definition from this survey
- Certified Unlearning for Neural Networks — key certified removal mechanism discussed in the model-agnostic section