Exact unlearning (also called perfect unlearning) is the strongest formal definition of machine unlearning, requiring that the distribution of the unlearned model is identical to that of a model retrained from scratch on the remaining data. It is formalised in Nguyen et al. (2025) as follows.

Special case (Definition 1): Given a learning algorithm A(·), a dataset D, and a forget set D_f ⊆ D, the process U(·) is an exact unlearning process iff:

Pr(A(D \ D_f)) = Pr(U(D, D_f, A(D)))

General case (Definition 2): The process U(·) is exact iff, for every hypothesis set T:

Pr(A(D \ D_f) ∈ T) = Pr(U(D, D_f, A(D)) ∈ T)

This definition leads to two notions depending on which space indistinguishability is measured in:

  • Distribution of weights: Pr(w_r) = Pr(w_u), where w_r and w_u are the retrained and unlearned model parameters
  • Distribution of outputs: Pr(M(X; w_r)) = Pr(M(X; w_u)) for all inputs X (sometimes called “weak unlearning”)
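Under the weight-space notion, the trivial (and expensive) exact unlearner simply discards the trained model and reruns A on D \ D_f. A minimal sketch with a deterministic toy learner; the names `learn` and `exact_unlearn` are illustrative, not from the source:

```python
# Toy deterministic learner: the "model" is the per-feature mean of the
# training points (a closed-form fit, so A(D) is a fixed function of D).
def learn(dataset):
    n = len(dataset)
    dim = len(dataset[0])
    return tuple(sum(x[j] for x in dataset) / n for j in range(dim))

# Exact unlearning by definition: ignore the old model and retrain on D \ D_f.
# Pr(A(D \ D_f)) = Pr(U(D, D_f, A(D))) holds trivially because U *is* A(D \ D_f).
def exact_unlearn(dataset, forget, _old_model):
    remaining = [x for x in dataset if x not in forget]
    return learn(remaining)

D = [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0), (7.0, 8.0)]
D_f = [(5.0, 6.0)]

retrained = learn([x for x in D if x not in D_f])
unlearned = exact_unlearn(D, D_f, learn(D))
assert unlearned == retrained  # identical, not merely close
```

Because the learner here is deterministic, the weight-space equality Pr(w_r) = Pr(w_u) degenerates to literal equality of parameters.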

Key Details

  • Retraining from scratch is the only method considered truly exact, since it satisfies the distributional equality by construction
  • Retraining is computationally expensive and cannot handle batch removal requests efficiently
  • The SISA framework (Bourtoule et al.) approximates exact unlearning via data partitioning into shards and slices, enabling efficient partial retraining
  • Statistical query learning supports exact unlearning by recomputing statistical queries over remaining data, but does not scale to complex models (deep networks)
  • Exact unlearning is contrasted with approximate unlearning, which relaxes the distributional equality to bounded divergence
  • Certified pixel-level unlearning extends the approximate unlearning framework (not the exact one) to the pixel-level segmentation setting
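The SISA idea from the bullets above can be sketched as follows. The shard count, the mean "sub-model", and the averaging aggregation are illustrative simplifications, not details from Bourtoule et al.:

```python
# SISA-style sketch: data is split into disjoint shards, one sub-model per
# shard, and predictions are aggregated across shards. Forgetting a point
# only requires retraining the single shard that contained it.
def learn_shard(shard):
    # Toy sub-model: the mean of the shard's (scalar) points.
    return sum(shard) / len(shard) if shard else 0.0

def train_sisa(dataset, num_shards=3):
    shards = [dataset[i::num_shards] for i in range(num_shards)]
    models = [learn_shard(s) for s in shards]
    return shards, models

def unlearn_point(shards, models, point):
    for i, shard in enumerate(shards):
        if point in shard:
            shards[i] = [x for x in shard if x != point]
            models[i] = learn_shard(shards[i])  # retrain only this shard
            break
    return shards, models

def predict(models):
    # Toy aggregation: average the sub-models' outputs.
    return sum(models) / len(models)

D = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
shards, models = train_sisa(D)
shards, models = unlearn_point(shards, models, 4.0)
# Only the shard containing 4.0 was retrained; the other sub-models are untouched.
```

The efficiency gain comes from keeping shard assignments fixed: the retrained shard's sub-model is exactly what training on that shard minus the forgotten point would produce, so the cost of a deletion scales with shard size, not dataset size.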
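For the statistical-query bullet, exact unlearning can avoid a full retraining pass whenever the model depends on the data only through summable statistics; deletion then amounts to subtracting the forgotten points' contributions. A sketch with an illustrative mean-valued "model" (the class name and API are hypothetical):

```python
# A model defined by statistical queries over the data (here: the mean, via a
# running sum and count) supports exact unlearning by updating the sufficient
# statistics directly — no pass over D \ D_f is needed.
class MeanModel:
    def __init__(self, data):
        self.total = sum(data)
        self.count = len(data)

    def forget(self, points):
        # Remove each forgotten point's contribution from the statistics.
        for p in points:
            self.total -= p
            self.count -= 1

    def predict(self):
        return self.total / self.count

D = [2.0, 4.0, 6.0, 8.0]
D_f = [6.0]

m = MeanModel(D)
m.forget(D_f)

retrained = MeanModel([x for x in D if x not in D_f])
assert m.predict() == retrained.predict()  # exact, by construction
```

This is why the statistical-query route is exact yet cheap for simple models, but does not extend to deep networks, whose parameters are not a function of a small set of recomputable queries.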
