SCRUB+R (SCRUB and Rewind) is an extension of the SCRUB unlearning algorithm of Kurmanji et al. (2023). It adds a “rewinding” procedure that calibrates the forget set error to be “just high enough” rather than maximally high, addressing the vulnerability of standard SCRUB to Membership Inference Attacks (MIAs) in user privacy applications.
The motivation arises from a subtle tension: in removing biases (RB) and resolving confusion (RC) applications, maximally high error on the forget set is desirable. However, for user privacy (UP), an uncharacteristically high error on deleted examples makes them identifiable to an attacker, creating vulnerability to MIAs. The ideal unlearning for UP produces forget set error matching that of a model retrained from scratch without the forget data — high enough to indicate the data was not seen, but not so high as to be suspicious.
SCRUB+R achieves this calibration through checkpoint selection. During SCRUB training, model checkpoints are saved at each epoch. A validation set is constructed to match the distribution of the forget set (e.g., if the forget set contains only class 0 examples, the validation set contains only class 0 examples from held-out data). The error on this validation set serves as a reference point for the desired forget error, since it approximates the error of a model that never trained on the forget data. The checkpoint whose forget set error is closest to this reference is selected as the final unlearned model.
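The validation set construction described above can be sketched as follows. This is a minimal illustration, not the paper's code: `build_matched_validation_set` is a hypothetical helper name, and the held-out pool is assumed to be a list of `(example, label)` pairs never seen during training.

```python
import random
from collections import Counter

def build_matched_validation_set(heldout, forget_labels, seed=0):
    """Sample a validation set from held-out data whose class counts
    match those of the forget set (hypothetical helper; a sketch)."""
    rng = random.Random(seed)
    wanted = Counter(forget_labels)            # class -> count in the forget set
    by_class = {}
    for x, y in heldout:
        by_class.setdefault(y, []).append((x, y))
    matched = []
    for cls, n in wanted.items():
        pool = list(by_class.get(cls, []))
        rng.shuffle(pool)
        matched.extend(pool[:n])               # take up to n held-out examples of this class
    return matched
```

For the class-0 example in the text, passing `forget_labels` consisting only of 0s yields a validation set drawn only from held-out class-0 examples.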
Algorithm
- Train SCRUB as usual, saving a checkpoint at each epoch
- Construct a validation set matching the forget set’s class distribution
- At the end of training, measure the error on the constructed validation set (this is the target forget error)
- Select the checkpoint whose forget set error is closest to the validation set error
- Return this rewound checkpoint as the unlearned model
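The rewind step above reduces to a one-pass search over saved checkpoints. A minimal sketch, assuming the per-epoch checkpoints and their measured forget set errors are available as parallel lists (the function name and signature are illustrative, not from the paper):

```python
def rewind(checkpoints, forget_errors, target_error):
    """Return the saved checkpoint whose forget-set error is closest
    to the target (validation-set) error. `checkpoints` and
    `forget_errors` are parallel lists, one entry per SCRUB epoch."""
    best = min(range(len(checkpoints)),
               key=lambda i: abs(forget_errors[i] - target_error))
    return checkpoints[best]

# Illustrative usage: with forget errors 0.9, 0.4, 0.1 across three
# epochs and a target of 0.35, the second checkpoint is selected.
chosen = rewind(["epoch1", "epoch2", "epoch3"], [0.9, 0.4, 0.1], 0.35)
```

Note that this only evaluates already-saved checkpoints, which is why rewinding adds no training cost.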
Key Properties
- Improved MIA defense: SCRUB+R successfully defends against both basic MIAs and the LiRA attack, performing comparably to the Retrain oracle
- Selective triggering: rewinding matters mainly for selective unlearning (small forget sets), where the retain set still contains other examples from the same class(es) as the forget set, so a model retrained from scratch would achieve low error on the forgotten examples; for class-level unlearning, the entire class is removed, the retrain oracle's error on it is maximally high, and uncalibrated SCRUB already matches it
- No additional training cost: rewinding only requires evaluating saved checkpoints, not additional training
- Validation set construction: requires held-out data matching the forget set distribution, which may not always be available
- Empirically, the forget error of SCRUB+R closely matches that of the Retrain oracle across CIFAR-10 experiments