Fair unlearning is the problem of efficiently removing training data from a model while simultaneously maintaining or improving fairness guarantees. It addresses the intersection of two regulatory requirements: the right to be forgotten (data deletion) and protection against algorithmic discrimination.

The key challenge is that standard certified unlearning methods (Newton step, influence functions) require the loss function to decompose as a sum of independent per-sample terms: L(θ) = (1/n) Σ ℓ(θ, x_i). However, fairness objectives involve comparisons between samples from different demographic groups, creating cross-sample dependencies that break this decomposition. For example, the Equalized Odds regularizer L_fair involves pairwise differences between logits of samples from groups G_a and G_b, making each sample’s influence entangled with all samples in the opposite group.
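The cross-sample dependency can be made concrete with a small sketch. The function names and the squared-difference form of the penalty below are illustrative assumptions, not the paper's exact regularizer; the point is only that the penalty couples every sample in G_a with every sample in G_b, so it cannot be written as a sum of independent per-sample terms:

```python
import numpy as np

def bce_loss(logits, labels):
    """Standard BCE: decomposes as a mean of independent per-sample terms."""
    p = 1.0 / (1.0 + np.exp(-logits))
    return -np.mean(labels * np.log(p + 1e-12) + (1 - labels) * np.log(1 - p + 1e-12))

def pairwise_fairness_penalty(logits_a, logits_b):
    """Illustrative Equalized-Odds-style regularizer (hypothetical form):
    mean squared difference over all |G_a| x |G_b| cross-group logit pairs.
    Deleting one sample from G_a changes every pair it appears in, so the
    loss does NOT decompose into per-sample terms."""
    diffs = logits_a[:, None] - logits_b[None, :]  # all cross-group pairs
    return np.mean(diffs ** 2)

def fair_loss(logits_a, logits_b, labels_a, labels_b, gamma):
    """Combined objective L = L_BCE + gamma * L_fair, as in the note."""
    logits = np.concatenate([logits_a, logits_b])
    labels = np.concatenate([labels_a, labels_b])
    return bce_loss(logits, labels) + gamma * pairwise_fairness_penalty(logits_a, logits_b)
```

When both groups emit identical logits the penalty vanishes; any cross-group disparity makes it positive, which is what entangles each sample's influence with the opposite group.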

Key Details

  • Incompatibility: Existing unlearning methods (Guo et al., Neel et al., Izzo et al.) cannot handle non-decomposable fairness losses
  • Fair loss: L(θ, D) = L_BCE(θ, D) + γ · L_fair(θ, D), where L_fair penalizes group disparity
  • Modified Newton step: Δ includes terms from both ERM gradient and fairness gradient, accounting for cross-sample interactions via the Cartesian product set N = G_a × G_b × G_a × G_b
  • Fairness guarantee: AEOD(θ⁻_{D'}) ≤ AEOD(θ_{D'}) + c(1 − f(κ)), where κ = ||θ_{D'} − θ⁻_{D'}||₂ → 0 as n → ∞
  • Pareto frontier: Accuracy, fairness (γ), and unlearning strength (σ) form a three-way trade-off
  • Worst case: Deletion requests concentrated in minority subgroups can destroy fairness even at small deletion rates
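The modified Newton step above can be sketched on a toy problem. This is a minimal illustration in the style of Guo et al.'s one-step Newton update, not the paper's full method: on a quadratic loss the update recovers exact retraining, and the comment marks where a fair loss would additionally need the fairness term's cross-group gradient contributions:

```python
import numpy as np

def newton_unlearn(theta, grad_removed, hessian_retained):
    """One-step Newton unlearning sketch:
        theta_minus = theta + H_retained^{-1} @ grad(removed points at theta).
    For the fair loss, grad_removed must also include the fairness
    regularizer's cross-group terms, not just the per-sample BCE gradient."""
    return theta + np.linalg.solve(hessian_retained, grad_removed)

# Toy check: L(theta) = sum_i (theta - x_i)^2, minimizer is the mean.
x = np.array([1.0, 2.0, 3.0, 6.0])
theta = np.array([x.mean()])                    # full-data optimum: 3.0
x_del = 6.0                                     # point to forget
H_retained = np.array([[2.0 * (len(x) - 1)]])   # Hessian of retained loss
grad_removed = 2.0 * (theta - x_del)            # gradient of deleted term
theta_minus = newton_unlearn(theta, grad_removed, H_retained)
# Exact retraining on {1, 2, 3} gives mean 2.0; the Newton step matches.
```

The quadratic case is exact; for the fair loss the residual κ = ||θ_{D'} − θ⁻_{D'}||₂ is only guaranteed to shrink as n grows, which is what drives the fairness bound above.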

concept