feature injection test

The feature injection test (FIT) is an evaluation metric for approximate data deletion methods. It measures how well a deletion method removes a model’s “knowledge” of sensitive, highly predictive features present in the deleted data. Unlike L² parameter distance, which measures global similarity to the model retrained from scratch on the retained data, FIT specifically tests whether localized correlations learned from the deleted subset have been forgotten.
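The contrast between a global and a localized metric can be made concrete with two hypothetical approximate models that sit at the same L² distance from the retrained model but differ sharply in FIT. The parameter vectors below are made-up toy values (99 ordinary weights plus the injected-feature weight in the last position), not outputs of any real deletion method; `l2_distance` and `fit_score` are illustrative helper names.

```python
import numpy as np

def l2_distance(theta_approx, theta_retrain):
    """Global metric: Euclidean distance to the retrained model's parameters."""
    return float(np.linalg.norm(theta_approx - theta_retrain))

def fit_score(theta_approx, w_star):
    """Local metric: injected-feature weight left after deletion, relative to w*."""
    return float(theta_approx[-1] / w_star)

# Hypothetical toy values: 99 ordinary weights, injected weight stored last.
rng = np.random.default_rng(0)
w_star = 0.5                                          # injected weight before deletion
theta_retrain = np.append(rng.normal(size=99), 0.0)  # retraining zeroes it

# Model A: a small error of 0.02 on every one of the 100 coordinates.
theta_a = theta_retrain + 0.02

# Model B: exact on the ordinary weights, but keeps 0.2 of injected weight.
theta_b = theta_retrain.copy()
theta_b[-1] = 0.2

for name, theta in [("A", theta_a), ("B", theta_b)]:
    print(name, l2_distance(theta, theta_retrain), fit_score(theta, w_star))
```

Both models are at L² distance 0.2 from the retrained model (for A, sqrt(100 × 0.02²) = 0.2), yet A has FIT 0.04 while B still carries 40% of the injected weight (FIT 0.4). L² distance alone cannot tell them apart.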

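The test works as follows: (1) append an extra feature (column d+1) that equals 1 for every point to be deleted and 0 for every retained point; (2) train a model on this augmented dataset, which learns a large weight w* on the injected feature; (3) apply the deletion method to remove the flagged points; (4) report the ratio θ^{approx}[d+1] / w*, where θ^{approx} is the model after approximate deletion. Values closer to 0 indicate better deletion: because the injected feature is 0 on every retained point, a model retrained from scratch with L2 regularization assigns it zero weight.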
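The four steps can be sketched as follows, assuming ridge regression (solved in closed form with NumPy) as the model and synthetic data in which the deleted group has a shifted target, so the injected feature is highly predictive. The names `feature_injection_test`, `exact_retrain`, and `no_deletion`, and the `delete_fn` calling convention, are hypothetical choices for this sketch. The two reference methods bracket the metric: retraining from scratch should give FIT ≈ 0, and leaving the model untouched gives FIT = 1.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression: argmin ||y - X theta||^2 + lam ||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def feature_injection_test(X, y, delete_mask, delete_fn, lam=1.0):
    """Return (FIT, w_star) for a deletion method.

    Hypothetical interface:
    delete_fn(theta_full, X_aug, y, delete_mask, lam) -> theta_approx
    """
    # (1) Injected feature: 1 on points to delete, 0 on retained points.
    X_aug = np.hstack([X, delete_mask.astype(float)[:, None]])
    # (2) Train on the augmented data; w_star is the injected feature's weight.
    theta_full = ridge_fit(X_aug, y, lam)
    w_star = theta_full[-1]
    # (3) Apply the deletion method under test.
    theta_approx = delete_fn(theta_full, X_aug, y, delete_mask, lam)
    # (4) FIT: injected weight remaining after deletion, relative to w_star.
    return theta_approx[-1] / w_star, w_star

def exact_retrain(theta_full, X_aug, y, delete_mask, lam):
    """Reference method: retrain from scratch on the retained points only."""
    keep = ~delete_mask
    return ridge_fit(X_aug[keep], y[keep], lam)

def no_deletion(theta_full, X_aug, y, delete_mask, lam):
    """Reference method: return the original model unchanged."""
    return theta_full

# Synthetic data (assumption): the first 20 points form the deleted group
# and have their targets shifted by +3.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
delete_mask = np.zeros(n, dtype=bool)
delete_mask[:20] = True
y = X @ rng.normal(size=d) + 3.0 * delete_mask + 0.1 * rng.normal(size=n)

fit_retrain, w_star = feature_injection_test(X, y, delete_mask, exact_retrain)
fit_noop, _ = feature_injection_test(X, y, delete_mask, no_deletion)
print(w_star, fit_retrain, fit_noop)
```

Any approximate deletion method with the same signature can be passed as `delete_fn` and its FIT compared against these two anchors.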

Key Details

Motivation: L² distance to the retrained model can be small even when the updated model still encodes sensitive information from the deleted data; FIT is designed to expose this gap
Metric: FIT = θ^{approx}[d+1] / w*, where 0 indicates perfect deletion and 1 means the injected weight is unchanged (no deletion)
Key finding: the projective residual update (PRU) achieves near-zero FIT for large deleted groups and sparse data, whereas influence-function-based deletion does not
Interpretation: Captures the privacy-relevant question of whether the model still encodes group-specific information
Limitation: Tests only for one specific kind of sensitive information (the injected indicator feature)
Scope: Applicable to both linear and logistic regression settings
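To illustrate why a one-step update can leave injected weight behind, the sketch below compares two Newton-style updates for ridge regression that add H⁻¹ times the summed loss gradients of the deleted points. This is an assumption-laden illustration: it is not PRU, and the full-Hessian variant is a common first-order influence-function update, not necessarily the exact variant evaluated in the original work. With the Hessian of the retained data the step is exact for a quadratic loss, so FIT is 0; with the full-data Hessian the injected weight is generally not fully removed. The logistic-regression case would need an iterative solver and is not shown. `newton_group_removal` is a hypothetical name.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Minimizer of 0.5*||y - X theta||^2 + 0.5*lam*||theta||^2."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def newton_group_removal(theta_full, X, y, delete_mask, lam, use_full_hessian):
    """One Newton-style step that removes the deleted points' loss terms.

    Per-point loss 0.5*(y_i - x_i^T theta)^2 has gradient -x_i * r_i.
    Retained-data Hessian: exact for ridge (quadratic objective).
    Full-data Hessian: first-order influence-function approximation.
    """
    Xd, yd = X[delete_mask], y[delete_mask]
    residuals = yd - Xd @ theta_full
    grad_deleted = -Xd.T @ residuals  # sum of the deleted points' gradients
    X_h = X if use_full_hessian else X[~delete_mask]
    H = X_h.T @ X_h + lam * np.eye(X.shape[1])
    return theta_full + np.linalg.solve(H, grad_deleted)

# Synthetic data (assumption): first 40 points are deleted, targets shifted.
rng = np.random.default_rng(1)
n, d, lam = 200, 5, 1.0
X = rng.normal(size=(n, d))
delete_mask = np.zeros(n, dtype=bool)
delete_mask[:40] = True
y = X @ rng.normal(size=d) + 3.0 * delete_mask + 0.1 * rng.normal(size=n)
X_aug = np.hstack([X, delete_mask.astype(float)[:, None]])  # injected feature

theta_full = ridge_fit(X_aug, y, lam)
w_star = theta_full[-1]
theta_exact = newton_group_removal(theta_full, X_aug, y, delete_mask, lam, use_full_hessian=False)
theta_inf = newton_group_removal(theta_full, X_aug, y, delete_mask, lam, use_full_hessian=True)
print("FIT (retained-data Hessian):", theta_exact[-1] / w_star)
print("FIT (influence approximation):", theta_inf[-1] / w_star)
```

The retained-data step reproduces retraining from scratch exactly, which makes it a convenient correctness check; the influence step trades that exactness for reusing the full-data Hessian.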
