Rényi unlearning is a formal definition of approximate machine unlearning that uses Rényi divergence to measure the statistical indistinguishability between the unlearned model and a model retrained from scratch. It provides a natural bridge between differential privacy and certified unlearning.
Given a randomized learning algorithm M and an unlearning algorithm U, the pair (M, U) achieves (α, ε)-Rényi unlearning for a given order α > 1 if, for all adjacent datasets D and D’, the Rényi difference satisfies d_α(ν, ν’) ≤ ε, where ν is the distribution of the unlearned model U(M(D), D’) and ν’ is the distribution of the retrained model M(D’). The Rényi difference is defined as d_α(ν, ν’) = max(D_α(ν||ν’), D_α(ν’||ν)), where D_α is the α-Rényi divergence.
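As a minimal sketch of this definition, suppose the unlearned and retrained models each output a parameter drawn from a Gaussian with shared variance (a common simplification, e.g. for noisy gradient methods); the α-Rényi divergence then has the closed form D_α(N(μ0, σ²)||N(μ1, σ²)) = α(μ0 − μ1)²/(2σ²), and the (α, ε) check reduces to comparing this quantity against ε. The function names and the numbers below are illustrative, not from the source.

```python
def renyi_div_gauss(alpha: float, mu0: float, mu1: float, sigma: float) -> float:
    # Closed form for equal-variance Gaussians:
    # D_alpha(N(mu0, s^2) || N(mu1, s^2)) = alpha * (mu0 - mu1)^2 / (2 s^2)
    return alpha * (mu0 - mu1) ** 2 / (2 * sigma ** 2)

def renyi_difference(alpha: float, mu0: float, mu1: float, sigma: float) -> float:
    # d_alpha(nu, nu') = max of the divergence in both directions;
    # for equal variances the two directions coincide, but we take the max
    # to match the general definition.
    return max(renyi_div_gauss(alpha, mu0, mu1, sigma),
               renyi_div_gauss(alpha, mu1, mu0, sigma))

# Hypothetical example: unlearned model ~ N(0.1, 1), retrained model ~ N(0.0, 1)
alpha, eps = 2.0, 0.05
d = renyi_difference(alpha, 0.1, 0.0, 1.0)
print(d, d <= eps)  # d = 2 * 0.01 / 2 = 0.01, so the (2, 0.05) check passes
```

In practice ν and ν’ are high-dimensional and rarely Gaussian, so certified-unlearning analyses bound d_α analytically rather than computing it from samples; the closed form above is only a toy for building intuition.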
Key Details
- Rényi divergence: D_α(ν||ν’) = (1/(α-1)) log E_{x~ν’}[(ν(x)/ν’(x))^α]
- Rényi difference: d_α(ν, ν’) = max(D_α(ν||ν’), D_α(ν’||ν)) — symmetrized version
- Conversion to (ε,δ)-DP: The standard RDP-to-DP conversion applies — (α, ε)-Rényi unlearning implies (ε + log(1/δ)/(α−1), δ)-unlearning for any δ ∈ (0, 1)
- Weak triangle inequality: d_α(P,R) ≤ ((α − 1/2)/(α − 1)) · (d_{2α}(P,Q) + d_{2α}(Q,R)) — each application doubles the order from α to 2α
- Advantage over (ε,δ): Rényi divergences compose additively across adaptive steps, giving tighter composition bounds and a more natural analysis for iterative unlearning algorithms
- Adjacent datasets D, D’ differ by replacing (not just removing) one data point
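The RDP-to-DP conversion and the weak triangle inequality above can be sketched as one-line helpers. This is a hedged illustration: `rdp_to_dp` implements the standard conversion ε_DP = ε + log(1/δ)/(α−1), and `weak_triangle_bound` evaluates the right-hand side of the inequality given Rényi differences already computed at order 2α (the function names and example values are mine, not from the source).

```python
import math

def rdp_to_dp(alpha: float, eps_rdp: float, delta: float) -> float:
    # Standard RDP-to-DP conversion: (alpha, eps)-RDP implies
    # (eps + log(1/delta) / (alpha - 1), delta)-DP for delta in (0, 1).
    return eps_rdp + math.log(1.0 / delta) / (alpha - 1.0)

def weak_triangle_bound(alpha: float, d_pq_2a: float, d_qr_2a: float) -> float:
    # Weak triangle inequality for the Renyi difference:
    # d_alpha(P, R) <= (alpha - 1/2)/(alpha - 1) * (d_{2a}(P, Q) + d_{2a}(Q, R))
    # where d_pq_2a and d_qr_2a are Renyi differences at the doubled order 2*alpha.
    return (alpha - 0.5) / (alpha - 1.0) * (d_pq_2a + d_qr_2a)

# Hypothetical numbers: an (8, 0.5)-Renyi unlearning guarantee at delta = 1e-5
eps_dp = rdp_to_dp(alpha=8.0, eps_rdp=0.5, delta=1e-5)
print(round(eps_dp, 4))  # 0.5 + ln(1e5)/7 ≈ 2.1447

# Chaining through an intermediate Q at order 2*alpha = 4
bound = weak_triangle_bound(alpha=2.0, d_pq_2a=0.1, d_qr_2a=0.2)
print(bound)  # (1.5 / 1.0) * 0.3 = 0.45
```

One design note: because each use of the weak triangle inequality doubles the order, a chain of k intermediate distributions requires divergences at order 2^k·α, which is why analyses typically keep such chains short.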