Statistical indistinguishability is the formal guarantee that an unlearned model cannot be distinguished from a model retrained from scratch, analogous to differential privacy but applied to the unlearning setting. It is the gold standard for certified machine unlearning.
An unlearning mechanism M achieves (ε,δ)-statistical indistinguishability with respect to a learning algorithm A if, for all measurable sets of models T ⊆ H_Θ, datasets D, and removal sets R ⊆ D with |R| = 1: Pr(M(A(D), D, R) ∈ T) ≤ e^ε Pr(A(D \ R) ∈ T) + δ, and symmetrically with the roles of the two probabilities swapped.
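To make the definition concrete, the sketch below crudely estimates the worst-case ε between two scalar output distributions standing in for M(A(D), D, R) and A(D \ R), using histogram bins as the events T. Everything here (the Gaussian stand-ins, the `estimate_max_eps` helper, the bin and mass-threshold parameters) is an illustrative assumption, not a method from the literature.

```python
# Hypothetical sketch: estimate max over histogram bins T of
# |log(Pr_p(T) / Pr_q(T))|, ignoring bins whose empirical mass falls
# below a small threshold (playing the role of delta).
import math
import random

random.seed(0)

def estimate_max_eps(samples_p, samples_q, bins=30, mass_floor=1e-3):
    """Crude histogram estimate of the indistinguishability parameter eps."""
    lo = min(samples_p + samples_q)
    hi = max(samples_p + samples_q)
    width = (hi - lo) / bins
    eps = 0.0
    for i in range(bins):
        a, b = lo + i * width, lo + (i + 1) * width
        p = sum(a <= x < b for x in samples_p) / len(samples_p)
        q = sum(a <= x < b for x in samples_q) / len(samples_q)
        if p > mass_floor and q > mass_floor:
            eps = max(eps, abs(math.log(p / q)))
    return eps

# Two nearby Gaussians: the closer the distributions, the smaller eps.
p = [random.gauss(0.0, 1.0) for _ in range(20000)]
q = [random.gauss(0.05, 1.0) for _ in range(20000)]
print(estimate_max_eps(p, q))  # small; grows as the two distributions separate
```

Identical distributions give ε = 0, and a larger mean shift (a worse approximation of retraining) gives a larger estimate, mirroring how the formal bound tightens as the unlearned and retrained model distributions converge.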
Key Details
- ε-certified removal (Guo et al., 2020): Stronger version with δ = 0, requiring e^{-ε} ≤ P(M(…)∈T)/P(A(D\x)∈T) ≤ e^ε for all measurable T
- Relationship to DP: DP of learning algorithm A is sufficient but not necessary for certified removal
- Newton update mechanism: w⁻ = w* + H_{w*}⁻¹ Δ approximates retraining, where Δ is the gradient of the removed point's loss at w*; the norm of the gradient residual on the retained data controls the achievable ε
- Loss perturbation: Adding a noise term b^T θ to the training objective, with b ~ N(0, σ²I_d), masks the residual information, achieving (ε,δ)-CR
- Rényi unlearning: Alternative formulation using Rényi divergence, convertible to (ε,δ) via standard conversion
- Multiple removals: The gradient-residual bound scales linearly in the number T of sequential single removals and quadratically in the size m of a batch removal
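The Newton update and the gradient residual it leaves behind can be sketched for L2-regularized logistic regression, in the spirit of Guo et al. (2020). The toy dataset, the `train` helper, and the regularization strength `lam` are illustrative assumptions; only the update formula w⁻ = w* + H_{w*}⁻¹ Δ and the residual it controls come from the text.

```python
# Hypothetical sketch: one-step Newton removal for regularized logistic
# regression. w* is the trained model, H the Hessian at w* on the retained
# data, Delta the gradient of the removed point's loss at w*.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(w, X, y, lam):
    # Gradient of sum of logistic losses plus (lam/2)||w||^2.
    return X.T @ (sigmoid(X @ w) - y) + lam * w

def hessian(w, X, lam):
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])

def train(X, y, lam, steps=100):
    # Full-batch Newton training to near-optimality (illustrative).
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= np.linalg.solve(hessian(w, X, lam), grad(w, X, y, lam))
    return w

n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = (X @ rng.normal(size=d) + 0.1 * rng.normal(size=n) > 0).astype(float)

w_star = train(X, y, lam)

# Remove the last point: Delta is that point's loss gradient at w*
# (the regularizer stays, so it does not appear in Delta).
Xr, yr = X[:-1], y[:-1]
delta = (sigmoid(X[-1] @ w_star) - y[-1]) * X[-1]
w_minus = w_star + np.linalg.solve(hessian(w_star, Xr, lam), delta)

# The gradient residual on the retained data is what bounds epsilon;
# compare against retraining from scratch on the retained data.
residual = np.linalg.norm(grad(w_minus, Xr, yr, lam))
w_retrain = train(Xr, yr, lam)
print(residual, np.linalg.norm(w_minus - w_retrain))
```

Because the logistic objective is close to quadratic near the optimum, a single Newton step lands near the retrained solution and the leftover residual is small; the loss-perturbation noise b^T θ is then sized to mask exactly this residual.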