The feature injection test (FIT) is an evaluation metric for approximate data deletion methods that measures how well a deletion method removes a model’s “knowledge” of sensitive, highly predictive features present in the deleted data. Unlike L² parameter distance, which measures global similarity to the retrained model, FIT specifically tests whether localized correlations learned from the deleted subset have been forgotten.

The test works as follows: (1) append an extra feature to the data that is 1 for all deleted points and 0 for all retained points; (2) train a model on this augmented dataset, so that it learns a large weight w* on the injected feature; (3) apply the deletion method to obtain the approximate parameters θ^{approx}; (4) measure the ratio θ^{approx}[d+1] / w*, where d is the number of original features, so index d+1 is the injected feature's weight. A ratio closer to 0 means better deletion.
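The four steps above can be sketched in a few lines of NumPy for ridge regression. This is a minimal illustration under stated assumptions, not the reference implementation: the names `fit_ridge`, `feature_injection_test`, `no_deletion`, and `exact_retrain` are hypothetical, the `deletion_method` callable signature is invented for this sketch, and the `effect` shift added to the deleted responses is an assumed way of making the injected feature highly predictive.

```python
import numpy as np

def fit_ridge(X, y, lam=1e-3):
    """Closed-form ridge regression: argmin ||X theta - y||^2 + lam * ||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def feature_injection_test(X, y, deleted, deletion_method, lam=1e-3, effect=5.0):
    """Return theta_approx[d+1] / w* for a given deletion method.

    deleted: boolean mask over rows, True for the points to delete.
    deletion_method: hypothetical callable
        (theta_full, X_aug, y_aug, deleted, lam) -> theta_approx
    effect: assumed shift added to deleted responses so that the
        injected feature becomes predictive.
    """
    injected = deleted.astype(float)
    X_aug = np.column_stack([X, injected])      # step 1: append indicator feature
    y_aug = y + effect * injected               # assumption: deleted group is shifted
    theta_full = fit_ridge(X_aug, y_aug, lam)   # step 2: train on augmented data
    w_star = theta_full[-1]
    theta_approx = deletion_method(theta_full, X_aug, y_aug, deleted, lam)  # step 3
    return theta_approx[-1] / w_star            # step 4: the FIT ratio

# Two reference "deletion methods" that bracket the metric.
def no_deletion(theta_full, X_aug, y_aug, deleted, lam):
    return theta_full                           # keeps everything: ratio is 1

def exact_retrain(theta_full, X_aug, y_aug, deleted, lam):
    keep = ~deleted
    return fit_ridge(X_aug[keep], y_aug[keep], lam)  # injected column is all zeros here

# Synthetic data (illustrative only).
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
deleted = np.zeros(n, dtype=bool)
deleted[:20] = True

r_none = feature_injection_test(X, y, deleted, no_deletion)
r_retrain = feature_injection_test(X, y, deleted, exact_retrain)
print(r_none, r_retrain)
```

The two baselines mark the ends of the scale: returning the original parameters gives a ratio of 1 by construction, while exact retraining on the retained points drives the injected weight to 0 because the injected column is identically zero there. Any approximate deletion method can be plugged in through the same callable and compared against these endpoints.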

Key Details

  • Motivation: L² distance can be small even when the model retains sensitive knowledge; FIT captures this gap
  • Metric: FIT = θ^{approx}[d+1] / w*, where 0 means the injected feature's weight has been fully removed and 1 means it is unchanged (no deletion)
  • Key finding: The projective residual update (PRU) achieves near-zero FIT for large groups and sparse data, while influence-function methods do not
  • Interpretation: Captures the privacy-relevant question of whether the model still encodes group-specific information
  • Limitation: Only tests for one specific type of sensitive information (injected feature)
  • Scope: Applicable to both linear and logistic regression settings
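For the logistic regression setting mentioned in the list, the same ratio can be computed with a regularized classifier. The sketch below is illustrative only: `fit_logistic` is a hypothetical helper using plain gradient descent, and assigning label 1 to every deleted point is an assumed way of making the injected indicator highly predictive. Exact retraining stands in for the deletion method so the example stays self-contained.

```python
import numpy as np

def fit_logistic(X, y, lam=1.0, iters=3000):
    """L2-regularized logistic regression (labels in {0, 1}) by gradient descent."""
    # Step size 1/L, where L upper-bounds the curvature of the objective:
    # the Hessian is X^T D X + lam * I with the entries of D at most 1/4.
    step = 1.0 / (0.25 * np.linalg.norm(X, 2) ** 2 + lam)
    theta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))       # predicted probabilities
        theta -= step * (X.T @ (p - y) + lam * theta)
    return theta

rng = np.random.default_rng(0)
n, d = 300, 2
X = rng.normal(size=(n, d))
deleted = np.zeros(n, dtype=bool)
deleted[:30] = True

y = (X @ np.array([1.5, -1.0]) + rng.normal(size=n) > 0).astype(float)
y[deleted] = 1.0  # assumption: the deleted group shares one label

# Append the injected indicator and train on the augmented data.
X_aug = np.column_stack([X, deleted.astype(float)])
theta_full = fit_logistic(X_aug, y)
w_star = theta_full[-1]

# Stand-in deletion method: exact retraining on the retained points.
theta_retrain = fit_logistic(X_aug[~deleted], y[~deleted])
fit_score = theta_retrain[-1] / w_star
print(w_star, fit_score)
```

Regularization matters here: without the L2 penalty, the weight on a perfectly predictive indicator could grow without bound, and w* would not be a stable reference value for the ratio.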
