The feature injection test (FIT) is an evaluation metric for approximate data deletion methods that measures how well a deletion method removes a model’s “knowledge” of sensitive, highly predictive features present in the deleted data. Unlike L² parameter distance, which measures global similarity to the retrained model, FIT specifically tests whether localized correlations learned from the deleted subset have been forgotten.
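The test has four steps: (1) append an extra feature to the data that is 1 for all deleted points and 0 for all retained points; (2) train a model on this augmented dataset, which learns a large weight w* on the injected feature; (3) apply the deletion method to obtain updated parameters θ^{approx}; (4) measure the ratio θ^{approx}[d+1] / w*, where d is the original number of features, so index d+1 is the injected feature. A ratio closer to 0 means better deletion: on the retained data the injected feature is identically 0, so an L²-regularized model retrained from scratch would place zero weight on it.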
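The four steps can be sketched for the linear case as follows. This is a minimal illustration, not the original experimental setup: the synthetic data, the label shift of 5.0 on the deleted points, the ridge penalty `lam`, and the helper names `fit_ridge` and `fit_score` are all assumptions made for the example. In place of an approximate deletion method it uses two reference points, "no deletion" and exact retraining, which should give ratios of 1 and approximately 0.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression without an intercept (hypothetical helper)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def fit_score(theta_approx, w_star):
    """FIT ratio: weight on the injected (last) feature divided by w*."""
    return theta_approx[-1] / w_star

# Synthetic data (assumption): n points, d features, the first k points are deleted.
rng = np.random.default_rng(0)
n, d, k = 200, 5, 20
X = rng.normal(size=(n, d))
deleted = np.zeros(n, dtype=bool)
deleted[:k] = True
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
y[deleted] += 5.0  # assumption: a label shift that makes the injected feature predictive

# (1) Append the injected feature: 1 for deleted points, 0 for retained points.
X_aug = np.hstack([X, deleted[:, None].astype(float)])

# (2) Train on the augmented data; w_star is the weight on the injected feature.
theta_full = fit_ridge(X_aug, y)
w_star = theta_full[-1]

# (3) Reference "deletion methods" (stand-ins for an approximate method).
theta_no_deletion = theta_full.copy()                       # nothing removed
theta_retrained = fit_ridge(X_aug[~deleted], y[~deleted])   # exact retraining

# (4) Compute the FIT ratio for each candidate parameter vector.
print("no deletion:     ", fit_score(theta_no_deletion, w_star))
print("exact retraining:", fit_score(theta_retrained, w_star))
```

An approximate deletion method would be evaluated by passing its output parameters to `fit_score` in place of the two reference vectors.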
Key Details
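- Motivation: L² distance to the retrained model can be small even when the model retains sensitive knowledge; FIT is designed to capture this gap
- Metric: FIT = θ^{approx}[d+1] / w*, the weight on the injected feature after deletion divided by its weight before deletion; 0 means the weight was fully removed (perfect deletion) and 1 means it is unchanged (no deletion)
- Key finding: the projective residual update (PRU) achieves near-zero FIT for large groups and sparse data, while the influence-function method does not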
- Interpretation: Captures the privacy-relevant question of whether the model still encodes group-specific information
- Limitation: Only tests for one specific type of sensitive information (injected feature)
- Scope: Applicable to both linear and logistic regression settings
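The logistic regression setting can be sketched in the same way. The example below is an illustration under stated assumptions rather than the original protocol: the data are synthetic, the deleted points are all given label 1 so that the injected feature is predictive, and `fit_logistic` is a hypothetical helper that runs plain gradient descent on an L²-regularized logistic loss. Exact retraining again serves as the reference deletion method.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam=1e-2, step=0.5, iters=2000):
    """L2-regularized logistic regression (labels in {0, 1}) by gradient descent."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(iters):
        # Gradient of the mean log-loss plus (lam / 2) * ||theta||^2.
        grad = X.T @ (sigmoid(X @ theta) - y) / n + lam * theta
        theta -= step * grad
    return theta

# Synthetic data (assumption): the first k of n points form the deleted group.
rng = np.random.default_rng(0)
n, d, k = 200, 5, 20
X = rng.normal(size=(n, d))
deleted = np.zeros(n, dtype=bool)
deleted[:k] = True
y = (rng.random(n) < sigmoid(X @ rng.normal(size=d))).astype(float)
y[deleted] = 1.0  # assumption: deleted points share a label, so the indicator is predictive

# Inject the indicator feature and train on the full augmented data.
X_aug = np.hstack([X, deleted[:, None].astype(float)])
theta_full = fit_logistic(X_aug, y)
w_star = theta_full[-1]

# Reference deletion method: retrain on the retained points only.
theta_retrained = fit_logistic(X_aug[~deleted], y[~deleted])

fit_no_deletion = theta_full[-1] / w_star
fit_retrained = theta_retrained[-1] / w_star
print("w*:", w_star)
print("FIT, no deletion:     ", fit_no_deletion)
print("FIT, exact retraining:", fit_retrained)
```

In the retrained model the injected column is all zeros, so its gradient is zero at every step and the weight never leaves its initial value of 0.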