Summary

This paper establishes the first formal connection between segmentation mask refinement and certified machine unlearning. The core insight is that correcting a coarse (dilated) segmentation mask to a finer one is set-theoretically isomorphic to “forgetting” the spurious pixels introduced by dilation — the dilation artefacts that cause models to learn shortcuts from background features (e.g., surgical rulers, ink markings, gel bubbles in dermoscopy images).
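
A toy illustration of that set-theoretic picture, assuming (as the summary suggests) that the fine mask sits inside the dilated one; the array names are invented for the example:

```python
import numpy as np

# Toy 4x4 example: a dilated (coarse) box around a 2x2 lesion.
coarse = np.ones((4, 4), dtype=bool)                          # dilated mask (all foreground)
fine = np.zeros((4, 4), dtype=bool); fine[1:3, 1:3] = True    # fine-grained mask

# "Forgetting" the dilation artefacts = removing the spurious pixels from the coarse mask.
artefacts = coarse & ~fine            # pixels only the dilated mask labels as foreground
refined = coarse & ~artefacts         # coarse mask with the artefact pixels forgotten

assert np.array_equal(refined, fine)  # mask refinement == forgetting the artefact pixels
print(artefacts.sum())                # 12 spurious border pixels in this toy example
```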

Building on this isomorphism, the authors define Certified Pixel-Level Unlearning, which projects standard (epsilon, delta)-indistinguishability onto the conditional probability space of pixel-wise predictions. They introduce Global Spurious Mutual Information (S_global) as a rigorous metric that captures worst-case information leakage from spurious features into predictions, and prove that a certified unlearning operator strictly upper-bounds S_global up to additive certification error O(epsilon) + O(delta log(1/delta)).
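
The exact formal statements live in the paper; the sketch below only renders the summary above in symbols, and the notation (the unlearning operator U, the pixel-wise output A(·)_u, theta^retrain, the event S) is assumed rather than quoted:

```latex
% Illustrative notation only -- U is the unlearning operator, D_f the dilation-artefact
% forget set, \theta^{\mathrm{retrain}} a model retrained without D_f, and A(\cdot)_u(x)
% the conditional pixel-wise predictive distribution at pixel u.
\[
  \Pr\!\big[\,A\big(U(\theta, D_f)\big)_u(x) \in S\,\big]
  \;\le\; e^{\epsilon}\,\Pr\!\big[\,A\big(\theta^{\mathrm{retrain}}\big)_u(x) \in S\,\big] + \delta
  \qquad \text{(and symmetrically, for all pixels } u \text{ and events } S\text{)}
\]
\[
  S_{\mathrm{global}}\big(U(\theta, D_f)\big)
  \;\le\; S_{\mathrm{global}}\big(\theta^{\mathrm{retrain}}\big)
  + O(\epsilon) + O\big(\delta \log (1/\delta)\big)
\]
```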

Empirically, the framework is validated on a synthetic dataset and the ISIC 2018 melanoma detection benchmark, using gradient clipping and model clipping from Koloskova et al. (2025) alongside NegGrad-Seg. Results show that unlearning with only 10% fine-grained labels matches or exceeds retrain-from-scratch performance, while being far more sample-efficient.
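
The paper's NegGrad-Seg update is not reproduced here; the following minimal sketch assumes it follows the usual NegGrad recipe adapted to pixel masks, i.e. descent on retained pixels and ascent on the dilation-artefact pixels, with hypothetical function and argument names:

```python
import torch
import torch.nn.functional as F

def neggrad_seg_step(model, optimizer, images, coarse_masks, forget_masks, alpha=1.0):
    """One hypothetical NegGrad-Seg update: gradient descent on retained pixels,
    gradient ascent on the forget set (dilation-artefact pixels).

    coarse_masks, forget_masks: float tensors of shape (B, 1, H, W); forget_masks
    is 1 where the coarse mask says foreground but the fine-grained mask does not.
    """
    logits = model(images)                                   # (B, 1, H, W)
    loss_map = F.binary_cross_entropy_with_logits(logits, coarse_masks, reduction="none")
    retain = 1.0 - forget_masks
    # Keep fitting the retained pixels; negate the loss on artefact pixels to "forget" them.
    loss = (retain * loss_map).mean() - alpha * (forget_masks * loss_map).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```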

Key Contributions

  • Unlearning Isomorphism: Proves correcting dilated masks is set-theoretically isomorphic to forgetting specific training samples (the dilation artefacts), enabling direct application of certified unlearning theory to segmentation refinement
  • Certified Pixel-Level Unlearning: New definition adapting (epsilon, delta)-indistinguishability to the conditional pixel-wise output space, preventing vacuous solutions common in standard definitions
  • Global Spurious Mutual Information (S_global): Task-relevant metric quantifying worst-case shortcut reliance; proven to be strictly upper-bounded under certified unlearning (an illustrative formulation follows this list)
  • Formal guarantees without unrealistic assumptions: Unlike Saab et al. (2022), the analysis does not require conditional independence of spurious features given the mask (an assumption shown in Appendix B to be violated in practice)
  • Sample efficiency: Achieves competitive debiasing with only 10% fine-grained annotations
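
One plausible way to write S_global out for concreteness; the family of spurious features, the conditioning on the true mask Y, and the supremum form are assumptions, not the paper's verbatim definition:

```latex
\[
  S_{\mathrm{global}}(\theta) \;=\; \sup_{s \in \mathcal{S}} \; I\big(\,s(X)\,;\, f_\theta(X) \mid Y\,\big)
\]
% Worst case, over a family S of spurious features s (rulers, ink markings, gel bubbles),
% of the information the pixel-wise prediction f_theta(X) carries about s(X) beyond the true mask Y.
```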

Methodology

The pipeline operates in three stages:

  1. Pre-training: Train segmentation model on coarse (bounding box) annotations
  2. Unlearning: Apply certified unlearning operators to “forget” the dilation artefacts (pixels incorrectly labelled as foreground by coarse masks), using a small set of fine-grained masks to define the forget set D_f
  3. Evaluation: Measure downstream classification performance via average pixel pooling (see the sketch after this list)
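
A minimal sketch of the average pixel pooling used in stage 3, assuming pooling is a plain mean over per-pixel foreground probabilities (function name is hypothetical):

```python
import torch

def image_level_score(seg_logits: torch.Tensor) -> torch.Tensor:
    """Average pixel pooling: collapse a (B, 1, H, W) map of pixel-wise foreground
    logits into one classification score per image by averaging per-pixel probabilities."""
    probs = torch.sigmoid(seg_logits)      # pixel-wise foreground probabilities
    return probs.mean(dim=(1, 2, 3))       # (B,) image-level scores, usable for AUROC

scores = image_level_score(torch.randn(2, 1, 128, 128))
print(scores.shape)  # torch.Size([2])
```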

Two unlearning algorithms are adapted:

  • Gradient clipping (Koloskova et al., 2025): certified, carrying (epsilon, delta) guarantees
  • Model clipping (Koloskova et al., 2025): certified, carrying (epsilon, delta) guarantees

NegGrad-Seg is used alongside them as a non-certified baseline. Transfer learning (a public frozen encoder plus a small trainable decoder) reduces the parameter space that requires certification.
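
As a rough illustration of what a clipping-based certified update on the small decoder could look like, here is a generic clip-then-noise sketch; it is not the actual Koloskova et al. (2025) algorithm or its noise calibration:

```python
import torch

def clipped_noisy_decoder_step(decoder, optimizer, loss, clip_norm=1.0, noise_std=0.1):
    """Generic clip-then-noise update on the small trainable decoder (the public
    encoder stays frozen). Illustrative of the gradient-clipping family of certified
    unlearning methods only; the real procedures and their (epsilon, delta) noise
    calibration are more involved."""
    optimizer.zero_grad()
    loss.backward()
    # Bound the sensitivity of the update by clipping the decoder's total gradient norm ...
    torch.nn.utils.clip_grad_norm_(decoder.parameters(), max_norm=clip_norm)
    # ... then add Gaussian noise (noise_std would be calibrated from epsilon, delta, clip_norm).
    for p in decoder.parameters():
        if p.grad is not None:
            p.grad.add_(noise_std * torch.randn_like(p.grad))
    optimizer.step()
```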

Key Findings

  • Models trained on coarse masks exhibit strong shortcut dependence (AUROC drops on images with spurious features)
  • Unlearning with 10% fine-grained labels yields the best improvements across most spurious feature categories
  • Two-phase unlearning trajectory: initial performance degradation (penalising shortcut weights) followed by rapid recovery (relearning from retain set)
  • Certified operators (Koloskova et al.) achieve stable improvements with low variance (std = 0.022) vs. retrain-from-scratch (std = 0.042)
  • NegGrad-Seg, while not certified, does not destroy model utility and still induces shortcut unlearning
  • Model collapse observed when trying to certify full-scale models (melanoma detection) — a key limitation motivating future work on robust certified unlearning for non-convex losses

Important References

  1. Certified Unlearning for Neural Networks (Koloskova et al., 2025) — provides the (epsilon, delta)-certified unlearning algorithms (model clipping, gradient clipping) used in this work
  2. Reducing Reliance on Spurious Features in Medical Image Classification with Spatial Specificity (Saab et al., 2022) — prior work on spatial specificity for shortcut reduction; this paper relaxes its conditional independence assumption
  3. Remember What You Want to Forget: Algorithms for Machine Unlearning — foundational definition of certified unlearning that this work extends to the pixel level

Atomic Notes


paper