A convex fairness regularizer is a penalty term added to the training loss that encourages group fairness (e.g., Equalized Odds) while maintaining convexity of the overall objective. Convexity is essential for provable unlearning guarantees, as certified removal mechanisms rely on unique optima and bounded gradient residuals.
The Equalized Odds regularizer from Berk et al. (2017), as used in fair machine unlearning, is defined as: L_fair(θ, D) = ((1/(n_a n_b)) Σ_{i∈G_a} Σ_{j∈G_b} 𝟙[y_i = y_j](⟨x_i, θ⟩ − ⟨x_j, θ⟩))², where G_a and G_b are the two demographic subgroups with sizes n_a and n_b. It penalizes the squared average difference in logits between same-label pairs across subgroups, serving as a convex surrogate for the Equalized Odds criterion.
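The definition above can be sketched directly in NumPy for a linear model. This is a minimal illustration, not the paper's implementation; the function name and the array arguments (features and labels split by subgroup) are hypothetical.

```python
import numpy as np

def eq_odds_regularizer(theta, X_a, y_a, X_b, y_b):
    """Sketch of L_fair(θ, D): squared mean pairwise logit gap between
    subgroups G_a and G_b, restricted to pairs with matching labels.

    X_a, X_b: (n_a, d), (n_b, d) feature matrices for the two subgroups.
    y_a, y_b: binary label vectors of lengths n_a, n_b.
    """
    logits_a = X_a @ theta                       # ⟨x_i, θ⟩ for i ∈ G_a
    logits_b = X_b @ theta                       # ⟨x_j, θ⟩ for j ∈ G_b
    diff = logits_a[:, None] - logits_b[None, :]  # all pairwise gaps
    match = (y_a[:, None] == y_b[None, :]).astype(float)  # 𝟙[y_i = y_j]
    n_a, n_b = len(y_a), len(y_b)
    # Average the label-matched gaps, then square the average.
    return (np.sum(match * diff) / (n_a * n_b)) ** 2
```

Note that the square is applied to the averaged sum, not to each pairwise gap, which is what makes the penalty a squared linear function of θ.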
Key Details
- Full fair loss: L(θ, D) = L_BCE(θ, D) + γ · L_fair(θ, D), where γ controls the fairness-accuracy trade-off
- Non-decomposability: squaring the double sum couples pairs of cross-group pairs, expanding over (G_a × G_b) × (G_a × G_b), so L_fair cannot be written as a sum of per-example losses
- Equalized Odds variant: Conditions on label y via indicator 𝟙[y_i = y_j]
- Demographic Parity variant: L_{Demo.Par.}(θ, D) = ((1/(n_a n_b)) Σ_{i∈G_a} Σ_{j∈G_b} (⟨x_i, θ⟩ − ⟨x_j, θ⟩))² — removes label conditioning
- Equality of Opportunity variant: L_{Eq.Opp.}(θ, D) — conditions only on true positives (y=1)
- Convexity preserved: the regularizer is the square of a linear function of θ, hence convex; adding it to the convex L_BCE keeps the full objective convex