Certified approximate unlearning is a formal framework for machine unlearning that provides provable guarantees about the removal of data influence from a trained model. Inspired by (epsilon, delta)-differential privacy, it defines when an unlearning algorithm U has successfully removed the influence of a forget set D_f from a model originally trained on the full dataset D.

Formally, an unlearning algorithm U is (epsilon, delta)-unlearning for a training algorithm A if there exists a certifying algorithm A-bar such that, for any forget set D_f subset of D and any measurable set of outputs O, the output distributions satisfy Pr[U(A(D), D, D_f) in O] <= e^epsilon * Pr[A-bar(D\D_f, D, D_f) in O] + delta, and symmetrically with the two probabilities exchanged. This ensures that the unlearned model is statistically indistinguishable from the output of an algorithm that never had access to the forget data.
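To make the guarantee concrete, a common route is to bound the distance between the unlearned model and the certifying algorithm's output, then add Gaussian noise calibrated exactly as in the Gaussian mechanism of differential privacy. A minimal sketch of that calibration (the function name is illustrative, and the classic formula used here is only valid for epsilon <= 1):

```python
import math

def gaussian_unlearning_epsilon(sensitivity, sigma, delta):
    """Epsilon achieved by adding N(0, sigma^2) noise when the unlearned
    model and the certifying algorithm's output differ by at most
    `sensitivity` in L2 norm (classic Gaussian-mechanism bound,
    valid for the resulting epsilon <= 1)."""
    return sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / sigma

# Illustrative numbers: models differ by at most 0.1,
# noise scale sigma = 1.0, delta = 1e-5.
eps = gaussian_unlearning_epsilon(0.1, 1.0, 1e-5)  # roughly 0.48
```

Note the same trade-off as in differential privacy: halving the noise scale sigma doubles epsilon, so tighter bounds on the model gap buy stronger guarantees at fixed noise.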

The framework unifies several prior definitions. Setting A-bar = A (retrain from scratch) recovers the definitions of Ginart et al. (2019) and Guo et al. (2020). Setting A-bar = U(A(D\D_f), D\D_f, empty set) aligns with Allouah et al. (2025) and Sekhari et al. (2021). The definition naturally supports non-adaptive sequential unlearning, where data points are removed one-by-one without revisiting earlier removals.

The key advantage of this framework over exact unlearning (i.e., (0,0)-unlearning) is computational efficiency: exact unlearning typically requires retraining from scratch for every removal request, while certified approximate unlearning permits efficient post-hoc update procedures that trade a small, controlled amount of privacy leakage for large computational savings.

Key Details

  • The parameters epsilon >= 0 and delta in [0,1] control the tightness of the guarantee; smaller values mean stronger privacy.
  • The certifying algorithm A-bar is purely theoretical and does not need to be computed in practice — only its existence is required.
  • The definition parallels differential privacy by treating the forget set D_f as “private” data and the retain set D\D_f as “public” data.
  • Common techniques for achieving (epsilon, delta)-unlearning include output perturbation, gradient clipping, and model clipping.
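Output perturbation is the simplest of these to sketch: remove the forget points' influence with a single Newton step on the retained objective, then add Gaussian noise to mask the residual gap to exact retraining. Below is a minimal sketch for L2-regularized least squares, loosely in the spirit of the certified-removal recipe of Guo et al. (2020); the helper name and interface are illustrative assumptions, not a fixed API:

```python
import numpy as np

def unlearn_output_perturbation(theta, X, y, forget_idx, lam, sigma, seed=None):
    """Illustrative one-shot unlearning for L2-regularized least squares:
    take a Newton step that removes the forget points' influence, then add
    Gaussian noise to mask any remaining gap to exact retraining."""
    rng = np.random.default_rng(seed)
    keep = np.setdiff1d(np.arange(len(y)), forget_idx)
    Xk, yk = X[keep], y[keep]
    d = X.shape[1]
    # Hessian of the retained objective: X_k^T X_k + lam * I
    H = Xk.T @ Xk + lam * np.eye(d)
    # Gradient of the retained objective at the current model theta
    g = Xk.T @ (Xk @ theta - yk) + lam * theta
    theta_minus = theta - np.linalg.solve(H, g)  # Newton step
    # Gaussian output perturbation; sigma is calibrated from a bound on
    # the gap between theta_minus and the exactly retrained model.
    return theta_minus + rng.normal(0.0, sigma, size=theta.shape)
```

For a quadratic loss the Newton step lands exactly on the retrained solution, so with sigma = 0 the result matches retraining from scratch; for general convex losses the step is only approximate, and the Gaussian noise, calibrated from a bound on the residual gap, is what delivers the (epsilon, delta) guarantee.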

concept