Application of machine unlearning to image segmentation tasks, particularly in medical imaging. Addresses the problem of models learning spurious correlations (shortcuts) from coarse segmentation masks by formally connecting mask refinement to the “forgetting” of dilation artefacts. Extends certified unlearning from classification to pixel-level predictions, enabling provable reduction of shortcut learning without expensive fine-grained annotations.
Papers Analyzed
Key Concepts and Connections
The central theoretical chain is:
- spatial specificity (Saab 2022): Finer annotations reduce I(S;Y|Y_tilde), but they require expensive masks
- unlearning isomorphism (main paper): Mask refinement = forgetting dilation artefacts — no new annotations needed beyond a small retain set
- certified pixel-level unlearning (main paper): Projects (epsilon,delta)-indistinguishability onto the pixel-wise conditional output space
- global spurious mutual information (main paper): Certified unlearning provably upper-bounds this metric, formally guaranteeing shortcut reduction
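To make the "mask refinement = forgetting dilation artefacts" step concrete, the toy sketch below builds a coarse mask by dilating a fine one and then lists the pixels that exist only because of the dilation. It is an illustration under stated assumptions, not the paper's procedure: the 3x3 square structuring element, the 5x5 grid, and the helper names `dilate` and `forget_pixels` are all hypothetical.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                # Switch on every pixel within `radius` of a foreground pixel.
                for rr in range(max(0, r - radius), min(h, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(w, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def forget_pixels(coarse, fine):
    """Pixels labelled foreground in the coarse mask but not in the fine one."""
    return [[int(c and not f) for c, f in zip(crow, frow)]
            for crow, frow in zip(coarse, fine)]


# Hypothetical fine annotation: a single foreground pixel in the centre.
fine = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
coarse = dilate(fine, radius=1)           # simulated coarse mask
artefact = forget_pixels(coarse, fine)    # candidate pixel-level forget set
n_artefact = sum(map(sum, artefact))
```

In this framing the `artefact` pixels play the role of the data to be forgotten, while pixels where the two masks agree stay in the retain set. How the forget set is estimated when no fine mask is available is not covered by this sketch.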
The certified unlearning algorithms form a progression:
- Newton-step unlearning (Sekhari 2021): Exact Hessian-based, convex losses only
- randomized gradient smoothing (Zhang Z. 2022): Hessian-free via noise smoothing
- local convex approximation for certified unlearning (Zhang B. 2024): Extends to non-convex DNNs via l2 regularization
- gradient clipping for unlearning / model clipping for unlearning (Koloskova 2025): Handles non-convex losses with DP-style privacy amplification
- PNSGD for certified unlearning (Chien 2024): Unified SGD-based approach with sequential unlearning
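The first entry in this progression can be sketched in one dimension. The example below assumes a ridge-regularised squared loss, for which a single Newton step on the retain-set objective lands exactly on the retrained solution. It is a minimal sketch in the spirit of Newton-step unlearning, not the algorithm as published: the function names are hypothetical, and `sigma` is a placeholder noise scale that a certified method would calibrate to the target (epsilon, delta).

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Closed-form minimiser of 0.5*sum((w*x - y)^2) + 0.5*lam*w^2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def newton_unlearn_1d(w_full, retain_xs, retain_ys, lam, sigma=0.0, rng=None):
    """One Newton step on the retain-set objective, starting from w_full.

    sigma is an uncalibrated placeholder (assumption); with sigma=0 the
    update is deterministic and carries no certification guarantee.
    """
    # Gradient and Hessian of the retain-set objective evaluated at w_full.
    grad = sum(x * (w_full * x - y) for x, y in zip(retain_xs, retain_ys))
    grad += lam * w_full
    hess = sum(x * x for x in retain_xs) + lam
    w_new = w_full - grad / hess
    if sigma > 0.0:
        # Gaussian perturbation of the output, as certified methods add.
        w_new += (rng or random).gauss(0.0, sigma)
    return w_new


# Hypothetical data: the last point is the sample to forget.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [1.1, 1.9, 3.2, 9.0]
lam = 0.1

w_full = fit_ridge_1d(xs, ys, lam)                  # trained on all points
w_retrained = fit_ridge_1d(xs[:-1], ys[:-1], lam)   # gold standard: retrain
w_unlearned = newton_unlearn_1d(w_full, xs[:-1], ys[:-1], lam)
```

Because the loss here is quadratic, `w_unlearned` matches `w_retrained` up to floating-point error. For general convex losses the Newton step is only approximate, and the added noise is what covers the remaining gap. The later entries in the list replace the exact Hessian with cheaper or more broadly applicable mechanisms.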
Open Questions
- Model collapse: Certified operators can collapse large models (observed on melanoma detection). Can more robust certified algorithms avoid this?
- Assumption relaxation: The disjoint support assumption (spurious features don’t overlap with pathology) fails for some artefacts (e.g., hair overlapping with lesions)
- Multi-class scaling: Performance degrades in the 3-class (background/benign/malignant) setting with aggressive unlearning (90%)