NegGrad-Seg is an adaptation of the NegGrad+ algorithm from Kurmanji et al. (2023) for pixel-level segmentation unlearning, introduced in "Towards Certified Shortcut Unlearning in Medical Imaging".
In the original NegGrad+, the model is fine-tuned on the retain set D_r while gradient ascent is performed on the forget set D_f. NegGrad-Seg carries this over to segmentation by reweighting the per-pixel loss, so that the regions where the fine- and coarse-grained masks do not overlap are treated as the forget set.
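For reference, NegGrad+ can be written as a combination of the retain loss and the negated forget loss with a mixing coefficient α. The second line is a sketch of one way to express the pixel-level reweighting; the pixel index set P, the per-pixel weight w_p and the generic per-pixel loss ℓ are our notation and an assumption about how the reweighting is applied, not the paper's exact formula. With a negative w_{D_f}, minimising the second objective performs gradient ascent on the forget pixels, mirroring the minus sign in the first.

```latex
\mathcal{L}_{\mathrm{NegGrad+}}(\theta) = \alpha \frac{1}{|D_r|} \sum_{(x,y) \in D_r} \ell\big(f_\theta(x), y\big) - (1-\alpha) \frac{1}{|D_f|} \sum_{(x,y) \in D_f} \ell\big(f_\theta(x), y\big)

\mathcal{L}_{\mathrm{seg}}(\theta) = \frac{1}{|P|} \sum_{p \in P} w_p \, \ell\big(f_\theta(x)_p, y_p\big), \qquad w_p = \begin{cases} 1, & p \in D_r \\ w_{D_f}, & p \in D_f \end{cases}
```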
Algorithm
- Define the forget set D_f at the pixel level: pixels labelled as foreground by the coarse mask Y^(r1) but background by the fine mask Y^(r2) (the dilation artefacts)
- Define the retain set D_r: pixels with consistent labels across both mask granularities (D^(r2))
- Reweight the loss: assign a negative weight to the non-overlapping regions (forget set), w_{D_f} = -1, so that minimising the loss performs gradient ascent on those pixels
- Fine-tune the pre-trained model using this reweighted loss
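The four steps above can be sketched in NumPy under several assumptions: binary masks, per-pixel binary cross-entropy as the loss, the coarse mask Y^(r1) used as the training labels (as for the pre-trained model), and a toy "model" whose parameters are simply the per-pixel logits, so the gradient can be written by hand. The helper names (split_pixels, pixel_weights, weighted_bce, finetune) and the learning rate are hypothetical; a real pipeline would use a segmentation network and an autodiff framework.

```python
import numpy as np

# Hypothetical helpers; binary masks and per-pixel BCE are assumptions.


def split_pixels(coarse_mask, fine_mask):
    """Return boolean (forget, retain) masks for one image."""
    coarse = coarse_mask.astype(bool)
    fine = fine_mask.astype(bool)
    forget = coarse & ~fine   # foreground in Y^(r1), background in Y^(r2)
    retain = coarse == fine   # same label under both granularities
    return forget, retain


def pixel_weights(forget, w_forget):
    """Weight 1 on every pixel except the forget pixels, which get w_forget."""
    weights = np.ones(forget.shape, dtype=float)
    weights[forget] = w_forget
    return weights


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def weighted_bce(logits, labels, weights, eps=1e-7):
    """Mean per-pixel binary cross-entropy, each pixel scaled by its weight."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    per_pixel = -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))
    return float(np.mean(weights * per_pixel))


def weighted_bce_grad(logits, labels, weights):
    """Analytic gradient of weighted_bce w.r.t. the logits (ignoring clipping)."""
    return weights * (sigmoid(logits) - labels) / logits.size


def finetune(logits, labels, weights, lr=50.0, steps=20):
    """Toy fine-tuning: the per-pixel logits stand in for model parameters."""
    z = logits.astype(float).copy()
    for _ in range(steps):
        z -= lr * weighted_bce_grad(z, labels, weights)
    return z


# Usage on a 5x5 image: the coarse mask is a 3x3 dilation of a 1-pixel fine mask.
fine = np.zeros((5, 5), dtype=int)
fine[2, 2] = 1
coarse = np.zeros((5, 5), dtype=int)
coarse[1:4, 1:4] = 1
forget, retain = split_pixels(coarse, fine)

# ASSUMPTION: training labels are the coarse mask the model was pre-trained on.
labels = coarse.astype(float)
init_logits = np.where(coarse == 1, 2.0, -2.0)  # "pre-trained" on coarse masks
weights = pixel_weights(forget, w_forget=-1.0)  # negative weight -> ascent on D_f
z = finetune(init_logits, labels, weights)
```

In this toy run the eight dilation-artefact pixels end up with negative logits (predicted background) while the retain pixels keep being fitted; with w_forget = 0 the forget pixels receive no gradient at all. Because the toy logits are independent, the ascent here is unbounded; in a real network the retain-set term is what is meant to limit drift in the shared weights.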
Key Properties
- Not certified: NegGrad-Seg does not provide (epsilon, delta)-indistinguishability guarantees
- Does not destroy model utility: Unlike certified operators that can cause model collapse, NegGrad-Seg preserves learned representations while inducing shortcut unlearning
- Setting w_{D_f} = 0 removes the forget-set term, reducing the method to standard fine-tuning on the retain set only
- Draws on the concept of catastrophic forgetting from the transfer learning literature, deliberately inducing the model to forget the spurious associations
- Empirically yields consistent improvements over the initial model trained on bounding boxes, in both binary and multi-class segmentation