Exact unlearning (also called perfect unlearning) is the strongest formal definition of machine unlearning, requiring that the distribution of the unlearned model is identical to that of a model retrained from scratch on the remaining data. It is formalised in Nguyen et al. (2025) as follows.
Special case (Definition 1): Given a learning algorithm A(.), a dataset D, and a forget set D_f, the process U(.) is an exact unlearning process iff:
Pr(A(D \ D_f)) = Pr(U(D, D_f, A(D)))
General case (Definition 2): The process U(.) is exact iff for all hypothesis sets T:
Pr(A(D \ D_f) ∈ T) = Pr(U(D, D_f, A(D)) ∈ T)
This definition leads to two notions depending on which space indistinguishability is measured in:
- Distribution of weights: Pr(w_r) = Pr(w_u), where w_r and w_u are the retrained and unlearned model parameters
- Distribution of outputs: Pr(M(X; w_r)) = Pr(M(X; w_u)) for all inputs X (sometimes called “weak unlearning”)
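Retraining from scratch satisfies the definition by construction: the "unlearned" model simply is A(D \ D_f). A minimal sketch, where A, U, and the toy mean "model" are illustrative stand-ins (with a deterministic learner the distributions reduce to point masses, so exact equality of weights can be checked directly):

```python
def A(dataset):
    # Toy deterministic "learning algorithm": the sample mean stands in
    # for real training. Determinism makes Pr(w) a point mass.
    return sum(dataset) / len(dataset)

def U(D, D_f, trained_model):
    # Exact unlearning via full retraining: discard the trained model
    # entirely and rerun A on the retained data D \ D_f.
    retained = [x for x in D if x not in D_f]
    return A(retained)

D = [1.0, 2.0, 3.0, 4.0]
D_f = [4.0]

w_u = U(D, D_f, A(D))                      # unlearned weights
w_r = A([x for x in D if x not in D_f])    # retrained-from-scratch weights
assert w_u == w_r  # the two "distributions" (point masses) coincide
```

For a stochastic learner the same argument holds at the level of distributions rather than individual weight vectors, which is exactly what Definitions 1 and 2 require.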
Key Details
- Retraining from scratch is the only method that trivially satisfies the distributional equality, since the unlearned model is literally A(D \ D_f)
- Retraining is computationally expensive and cannot efficiently serve frequent or batched removal requests
- The SISA framework (Bourtoule et al.) approximates exact unlearning via data partitioning into shards and slices, enabling efficient partial retraining
- Statistical query learning supports exact unlearning by recomputing statistical queries over the remaining data, but this approach does not scale to complex models such as deep networks
- Exact unlearning is contrasted with approximate unlearning, which relaxes the distributional equality to bounded divergence
- Certified pixel-level unlearning extends the approximate unlearning framework (not the exact one) to the pixel-level segmentation setting
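The SISA idea from the list above can be sketched as follows. This is a toy illustration under stated assumptions, not the reference implementation: each shard's "model" is just the mean of its data, `sisa_train`/`sisa_unlearn` are hypothetical names, and slicing within shards is omitted. The key property shown is that a deletion only triggers retraining of the one shard containing the forgotten point:

```python
NUM_SHARDS = 4  # illustrative; the real framework tunes shard count

def train_shard(shard_data):
    # Toy per-shard "model": the mean of the shard's values.
    return sum(shard_data) / len(shard_data) if shard_data else 0.0

def sisa_train(dataset):
    # Partition the data into disjoint shards; each trains independently.
    shards = [dataset[i::NUM_SHARDS] for i in range(NUM_SHARDS)]
    models = [train_shard(s) for s in shards]
    return shards, models

def sisa_unlearn(shards, models, forget_point):
    # Only the shard holding the forgotten point is retrained; all
    # other shard models are untouched.
    for i, shard in enumerate(shards):
        if forget_point in shard:
            shard.remove(forget_point)
            models[i] = train_shard(shard)
    return shards, models

def predict(models, x):
    # Aggregate the shard models (the toy model ignores the input x;
    # real SISA aggregates per-shard predictions, e.g. by voting).
    return sum(models) / len(models)

shards, models = sisa_train([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
shards, models = sisa_unlearn(shards, models, 5.0)
```

Because the retrained shard never saw the forgotten point afterwards, its model equals one trained from scratch on the retained shard data, which is what makes the scheme exact at the shard level while avoiding a full retrain.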