Exact unlearning (also called perfect unlearning) is the strongest formal definition of machine unlearning, requiring that the distribution of the unlearned model is identical to that of a model retrained from scratch on the remaining data. It is formalised in Nguyen et al. (2025) as follows.
Special case (Definition 1): Given a learning algorithm A(.), a dataset D, and a forget set D_f, the process U(.) is an exact unlearning process iff:
Pr(A(D \ D_f)) = Pr(U(D, D_f, A(D)))
General case (Definition 2): The process U(.) is exact iff for all hypothesis sets T:
Pr(A(D \ D_f) ∈ T) = Pr(U(D, D_f, A(D)) ∈ T)
This definition leads to two notions depending on which space indistinguishability is measured in:
- Distribution of weights: Pr(w_r) = Pr(w_u), where w_r and w_u are the retrained and unlearned model parameters
- Distribution of outputs: Pr(M(X; w_r)) = Pr(M(X; w_u)) for all inputs X (sometimes called “weak unlearning”)
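Retraining from scratch satisfies the definition by construction: the "unlearned" model simply is A(D \ D_f). A minimal sketch, where A, U, and the toy mean "model" are illustrative stand-ins (with a deterministic learner the distributions reduce to point masses, so exact equality of weights can be checked directly):

```python
def A(dataset):
    # Toy deterministic "learning algorithm": the sample mean stands in
    # for real training. Determinism makes Pr(w) a point mass.
    return sum(dataset) / len(dataset)

def U(D, D_f, trained_model):
    # Exact unlearning via full retraining: discard the trained model
    # entirely and rerun A on the retained data D \ D_f.
    retained = [x for x in D if x not in D_f]
    return A(retained)

D = [1.0, 2.0, 3.0, 4.0]
D_f = [4.0]

w_u = U(D, D_f, A(D))                      # unlearned weights
w_r = A([x for x in D if x not in D_f])    # retrained-from-scratch weights
assert w_u == w_r  # the two "distributions" (point masses) coincide
```

For a stochastic learner the same argument holds at the level of distributions rather than individual weight vectors, which is exactly what Definitions 1 and 2 require.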
Key Details
- Retraining from scratch is the only method that trivially satisfies the distributional equality, since the unlearned model is literally A(D \ D_f)
- Retraining is computationally expensive and cannot efficiently serve frequent or batched removal requests
- The SISA framework (Bourtoule et al.) approximates exact unlearning via data partitioning into shards and slices, enabling efficient partial retraining
- Statistical query learning supports exact unlearning by recomputing statistical queries over the remaining data, but this approach does not scale to complex models such as deep networks
- Exact unlearning is contrasted with approximate unlearning, which relaxes the distributional equality to bounded divergence
- Certified pixel-level unlearning extends the approximate unlearning framework (not the exact one) to the pixel-level segmentation setting
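The SISA idea from the list above can be sketched as follows. This is a toy illustration under stated assumptions, not the reference implementation: each shard's "model" is just the mean of its data, `sisa_train`/`sisa_unlearn` are hypothetical names, and slicing within shards is omitted. The key property shown is that a deletion only triggers retraining of the one shard containing the forgotten point:

```python
NUM_SHARDS = 4  # illustrative; the real framework tunes shard count

def train_shard(shard_data):
    # Toy per-shard "model": the mean of the shard's values.
    return sum(shard_data) / len(shard_data) if shard_data else 0.0

def sisa_train(dataset):
    # Partition the data into disjoint shards; each trains independently.
    shards = [dataset[i::NUM_SHARDS] for i in range(NUM_SHARDS)]
    models = [train_shard(s) for s in shards]
    return shards, models

def sisa_unlearn(shards, models, forget_point):
    # Only the shard holding the forgotten point is retrained; all
    # other shard models are untouched.
    for i, shard in enumerate(shards):
        if forget_point in shard:
            shard.remove(forget_point)
            models[i] = train_shard(shard)
    return shards, models

def predict(models, x):
    # Aggregate the shard models (the toy model ignores the input x;
    # real SISA aggregates per-shard predictions, e.g. by voting).
    return sum(models) / len(models)

shards, models = sisa_train([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
shards, models = sisa_unlearn(shards, models, 5.0)
```

Because the retrained shard never saw the forgotten point afterwards, its model equals one trained from scratch on the retained shard data, which is what makes the scheme exact at the shard level while avoiding a full retrain.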