The intersection of algorithmic fairness and machine unlearning: how removing the influence of training data (unlearning) can be leveraged to mitigate model bias and achieve fairer outcomes, and, conversely, how unlearning procedures can be designed to maintain or improve fairness guarantees. Covers fairness-aware unlearning algorithms, debiasing via selective data removal, and the interplay between certified unlearning bounds and group-level disparity metrics.
Papers
Primary Sources
- Langevin Unlearning — Chien et al. (NeurIPS 2024 Spotlight). Unifies DP training and certified unlearning via Langevin dynamics. First non-convex unlearning guarantees.
- Certified Machine Unlearning via Noisy Stochastic Gradient Descent — Chien et al. (NeurIPS 2024). First mini-batch PNSGD unlearning guarantees via W∞ distance tracking.
- Fair Machine Unlearning — Oesterling et al. (AISTATS 2024). First fair unlearning method with certified guarantees for Equalized Odds objectives.
- Fast Model Debias with Machine Unlearning — Chen et al. (NeurIPS 2023). Debiasing via counterfactual bias identification, influence ranking, and Newton-step unlearning.
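The group-level disparity metric targeted by Oesterling et al. is Equalized Odds: true-positive and false-positive rates should match across protected groups. A minimal sketch of that gap, assuming binary labels, predictions, and group membership (function and variable names are illustrative, not from any of the papers above):

```python
import numpy as np

def equalized_odds_gap(y_true, y_pred, group):
    """Largest gap between two groups in TPR (y=1) and FPR (y=0)."""
    gaps = []
    for y in (0, 1):  # condition on true label: y=1 gives TPR, y=0 gives FPR
        rates = []
        for g in (0, 1):
            mask = (y_true == y) & (group == g)
            rates.append(y_pred[mask].mean())
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

# Toy example: predictions equal to labels give zero disparity.
y_true = np.array([0, 0, 1, 1, 0, 1])
group = np.array([0, 1, 0, 1, 0, 1])
print(equalized_odds_gap(y_true, y_true, group))  # 0.0
```

An unlearning request that deletes samples from one group shifts these conditional rates, which is exactly why naive deletion can degrade a previously fair model.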
Foundational Works
- Towards Making Systems Forget with Machine Unlearning — Cao & Yang (IEEE S&P 2015). Introduces machine unlearning via summation form.
- Certified Data Removal from Machine Learning Models — Guo et al. (ICML 2020). Newton update mechanism and (ε,δ)-certified removal.
- Approximate Data Deletion from Machine Learning Models — Izzo et al. (AISTATS 2021). Projective residual update with O(k²d) cost.
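The Newton-update mechanism of Guo et al. that the fairness papers build on approximates retraining without a sample by a single Newton step on the remaining data. A minimal sketch for L2-regularized logistic regression (certification noise and the (ε,δ) accounting are omitted for brevity; this is an illustration of the update, not the papers' code):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y, lam):
    # gradient of the regularized logistic loss (labels in {0, 1})
    return X.T @ (sigmoid(X @ theta) - y) + lam * theta

def hessian(theta, X, lam):
    p = sigmoid(X @ theta)
    return X.T @ (X * (p * (1 - p))[:, None]) + lam * np.eye(X.shape[1])

def newton_unlearn(theta, X, y, idx, lam):
    """One Newton step toward the optimum of the remaining data,
    approximating retraining without sample `idx`."""
    keep = np.ones(len(y), dtype=bool)
    keep[idx] = False
    Xr, yr = X[keep], y[keep]
    g = grad(theta, Xr, yr, lam)          # residual gradient after removal
    H = hessian(theta, Xr, lam)           # Hessian on the remaining data
    return theta - np.linalg.solve(H, g)  # theta_minus = theta - H^{-1} g
```

The unlearned parameters land quadratically close to full retraining on the reduced dataset, which is what makes the statistical-indistinguishability certificate achievable with modest noise. The fairness complication flagged under Key Concepts is that a non-decomposable fairness regularizer cannot be split into per-sample losses, so the "drop row `idx`" step above no longer isolates the removed point's contribution.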
Key Concepts
Two distinct approaches emerge:
- Fairness-preserving unlearning (fair unlearning): modifying the unlearning algorithm so that fairness is maintained as data is removed. Core challenge: convex fairness regularizers create non-decomposable objectives that break standard Newton-step unlearning.
- Unlearning-based debiasing (counterfactual debiasing): using unlearning to remove learned biases by identifying harmful samples via their influence on a bias metric and selectively forgetting them.
Both build on the Newton-update removal mechanism and the statistical-indistinguishability framework of Guo et al., while the Chien et al. papers provide an alternative route via Langevin dynamics and Rényi-divergence unlearning bounds.
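The counterfactual-debiasing route can be sketched with a first-order influence function: score each training sample by the predicted change in a bias metric if that sample were removed, then unlearn the top-ranked ones. The sketch below uses ridge regression and a demographic-gap metric as a simple illustrative stand-in for Chen et al.'s identification step (names and the specific bias metric are assumptions, not from the paper):

```python
import numpy as np

def per_sample_grad(theta, x, y):
    # gradient of the squared loss 0.5 * (x^T theta - y)^2 for one sample
    return x * (x @ theta - y)

def influence_on_bias(theta, X, y, group, lam=1e-2):
    """Rank samples by first-order influence on the gap metric
    bias(theta) = mean prediction for group 0 - mean prediction for group 1,
    via I_i = -grad(bias)^T H^{-1} grad(loss_i). A large positive I_i
    predicts that unlearning sample i reduces the gap."""
    H = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of the ridge loss
    bias_grad = X[group == 0].mean(0) - X[group == 1].mean(0)
    v = np.linalg.solve(H, bias_grad)
    return np.array([-per_sample_grad(theta, X[i], y[i]) @ v
                     for i in range(len(y))])
```

Usage would be `harmful = np.argsort(scores)[::-1][:k]`, followed by a Newton-step unlearning of those indices; note that a sample the model already fits perfectly has zero gradient and hence zero influence score.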
Open Questions
- Can Langevin/PNSGD unlearning extend to fairness-constrained objectives?
- How do fairness guarantees compose across sequential unlearning requests?
- Can fair unlearning extend beyond convex models to deep networks?
- What does the fairness-accuracy-unlearning Pareto frontier look like theoretically?