NegGrad-Seg

NegGrad-Seg is an adaptation of the NegGrad+ algorithm from Kurmanji et al. (2023) for pixel-level segmentation unlearning, introduced in "Towards Certified Shortcut Unlearning in Medical Imaging".

In the original NegGrad+, the model fine-tunes on the retain set D_r while performing gradient ascent on the forget set D_f. NegGrad-Seg translates this to the segmentation setting by reweighting the loss function to account for non-overlapping pixel regions across fine- and coarse-grained masks.
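One way to write the reweighted objective is sketched below. The per-pixel loss \ell_p (for example cross-entropy), the unit weight on retain pixels, and the normalisation by the total pixel count are assumptions made for illustration, not details taken from the paper.

```latex
\mathcal{L}(\theta)
  = \frac{1}{|D_r| + |D_f|}
    \left[
      \sum_{p \in D_r} \ell_p(\theta)
      \;+\; w_{D_f} \sum_{p \in D_f} \ell_p(\theta)
    \right]
```

Under this form, the sign of w_{D_f} decides what a descent step on \mathcal{L} does to the forget pixels: a negative weight raises their loss (gradient ascent on D_f) while the retain-set term is still minimised, mirroring the NegGrad+ recipe.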

Algorithm

Define the forget set D_f at the pixel level: pixels labelled as foreground by the coarse mask Y^(r1) but background by the fine mask Y^(r2) (the dilation artefacts)
Define the retain set D_r: pixels with consistent labels across both mask granularities (D^(r2))
Reweight the loss: set the weight for non-overlapping regions (forget set) to w_{D_f} = -1, which flips the sign of the loss on those pixels and so induces gradient ascent on them
Fine-tune the pre-trained model using this reweighted loss
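The steps above can be sketched in NumPy for the binary case. This is a minimal illustration, not the paper's implementation: the function names `pixel_weights` and `reweighted_bce` are hypothetical, binary cross-entropy is an assumed choice of per-pixel loss, the coarse mask is assumed to be the training target, and pixels that are foreground only in the fine mask (belonging to neither set as defined above) are assumed to get weight 0.

```python
import numpy as np


def pixel_weights(coarse, fine, w_forget):
    """Build a per-pixel weight map from a coarse and a fine binary mask.

    Forget pixels (coarse foreground, fine background) get `w_forget`;
    retain pixels (both masks agree) get weight 1. Any other pixel keeps
    weight 0, which is an assumption of this sketch.
    """
    coarse = np.asarray(coarse, dtype=bool)
    fine = np.asarray(fine, dtype=bool)
    forget = coarse & ~fine   # dilation artefacts
    retain = coarse == fine   # consistent labels at both granularities
    weights = np.zeros(coarse.shape, dtype=float)
    weights[retain] = 1.0
    weights[forget] = w_forget
    return weights, forget, retain


def reweighted_bce(prob, coarse, weights, eps=1e-7):
    """Weighted binary cross-entropy against the coarse mask (assumed target)."""
    prob = np.clip(np.asarray(prob, dtype=float), eps, 1.0 - eps)
    target = np.asarray(coarse, dtype=float)
    per_pixel = -(target * np.log(prob) + (1.0 - target) * np.log(1.0 - prob))
    # Normalising by the pixel count is an assumption of this sketch.
    return float((weights * per_pixel).sum() / weights.size)


# Toy 2x4 example: the coarse mask over-covers the fine mask by three pixels.
coarse = [[0, 1, 1, 1], [0, 1, 1, 1]]
fine = [[0, 0, 1, 1], [0, 0, 1, 0]]
prob = np.full((2, 4), 0.9)  # stand-in for model foreground probabilities

# A negative forget weight flips the sign of the loss on forget pixels.
weights, forget, retain = pixel_weights(coarse, fine, w_forget=-1.0)
loss_ascent = reweighted_bce(prob, coarse, weights)

# A zero forget weight drops the forget pixels from the loss entirely.
weights_zero, _, _ = pixel_weights(coarse, fine, w_forget=0.0)
loss_retain_only = reweighted_bce(prob, coarse, weights_zero)
```

In a real fine-tuning loop the same weight map would multiply the per-pixel loss of an autodiff framework, so that minimising the total loss pushes the model away from predicting foreground on the forget pixels while still fitting the retain pixels.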

Key Properties

Not certified: NegGrad-Seg does not provide (epsilon, delta)-indistinguishability guarantees
Does not destroy model utility: Unlike certified operators that can cause model collapse, NegGrad-Seg preserves learned representations while inducing shortcut unlearning
Setting w_{D_f} = 0 reduces to standard fine-tuning on the retain set only, since the forget pixels then contribute nothing to the loss
Draws on the concept of catastrophic forgetting from the transfer learning literature, deliberately inducing forgetting of the spurious associations
Empirically shows consistent improvement over initial bounding-box training across both binary and multi-class segmentation
