The intersection of algorithmic fairness and machine unlearning: how removing the influence of training data (unlearning) can be leveraged to mitigate model bias and achieve fairer outcomes, and conversely how unlearning procedures can be designed to maintain or improve fairness guarantees. Covers fairness-aware unlearning algorithms, debiasing via selective data removal, and the interplay between certified unlearning bounds and group-level disparity metrics.

Papers

Primary Sources

Foundational Works

Key Concepts

Two distinct approaches emerge:

  1. Fairness-preserving unlearning (fair unlearning): Modifying the unlearning algorithm to maintain fairness during data removal. Core challenge: convex fairness regularizers couple examples across groups, creating non-decomposable objectives that break standard Newton-step unlearning, which assumes the loss is a sum over individual samples.

  2. Unlearning-based debiasing (counterfactual debiasing): Using unlearning to remove learned biases by scoring each training sample's influence on a bias metric and selectively forgetting the most harmful ones.
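The second approach can be sketched end to end with influence functions. A minimal illustration, assuming ridge regression, a synthetic binary group label `g`, and a mean-prediction-gap disparity metric (all of these are illustrative assumptions, not the setup of any particular paper): the first-order effect of removing sample i on the disparity is grad_disp^T H^{-1} grad_loss_i, and the samples whose removal most reduces the disparity become forgetting candidates.

```python
import numpy as np

# Illustrative sketch of influence-based debiasing on ridge regression.
# Data, group labels, and the disparity metric are all synthetic assumptions.
rng = np.random.default_rng(1)
n, d, lam = 300, 4, 1.0
X = rng.normal(size=(n, d))
g = rng.integers(0, 2, size=n)                 # hypothetical binary group label
y = X @ np.ones(d) + 0.5 * g + 0.1 * rng.normal(size=n)  # group-correlated bias

H = X.T @ X + lam * np.eye(d)                  # Hessian of the ridge objective
w = np.linalg.solve(H, X.T @ y)                # fitted weights

# Disparity functional: gap in mean model prediction between the two groups.
disparity = (X[g == 1] @ w).mean() - (X[g == 0] @ w).mean()

# Its gradient w.r.t. the weights is the gap between group mean features.
grad_disp = X[g == 1].mean(axis=0) - X[g == 0].mean(axis=0)

# First-order (influence-function) effect of removing sample i on disparity:
#   delta_i ~= grad_disp^T H^{-1} grad_loss_i
grad_loss = X * (X @ w - y)[:, None]           # per-sample loss gradients
infl = grad_loss @ np.linalg.solve(H, grad_disp)

# Candidates to forget: samples whose removal most reduces the disparity.
harmful = np.argsort(infl)[:20]
```

In a full pipeline the `harmful` indices would then be passed to a (certified) unlearning routine rather than simply dropped and retrained.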

Both approaches build on the Newton-update removal mechanism and the statistical-indistinguishability framework for certified unlearning, while the Chien et al. papers provide an alternative route via Langevin unlearning and Rényi-divergence unlearning guarantees.
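The Newton-update removal mechanism is easiest to see on a quadratic objective, where a single Newton step from the full-data optimum lands exactly on the retrained solution. A minimal sketch on ridge regression (synthetic data; certified schemes would additionally perturb the result with calibrated noise to obtain the statistical-indistinguishability guarantee, which is omitted here):

```python
import numpy as np

# Sketch of Newton-update removal on ridge regression. For a quadratic
# objective, one Newton step recovers the exact retrain-from-scratch weights.
rng = np.random.default_rng(0)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def ridge_fit(X, y, lam):
    # argmin_w 0.5*||Xw - y||^2 + 0.5*lam*||w||^2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_full = ridge_fit(X, y, lam)

forget = np.arange(10)                        # indices of the forget set
Xf, yf = X[forget], y[forget]

# Hessian of the retained objective, gradient of the forgotten points' loss.
H_retain = X.T @ X - Xf.T @ Xf + lam * np.eye(d)
g_forget = Xf.T @ (Xf @ w_full - yf)

# One Newton step removes the forgotten points' influence.
w_unlearned = w_full + np.linalg.solve(H_retain, g_forget)

# Sanity check: matches retraining from scratch on the retained data.
w_retrain = ridge_fit(X[10:], y[10:], lam)
assert np.allclose(w_unlearned, w_retrain)
```

The non-decomposability problem from concept 1 shows up here: the per-sample gradient sum `g_forget` has no analogue when a fairness regularizer ties the loss to group statistics of the whole dataset.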

Open Questions

  • Can Langevin/PNSGD unlearning extend to fairness-constrained objectives?
  • How do fairness guarantees compose across sequential unlearning requests?
  • Can fair unlearning extend beyond convex models to deep networks?
  • What is the theoretical characterization of the fairness-accuracy-unlearning Pareto frontier?

topic