Global Spurious Mutual Information (S_global) is a metric introduced in Towards Certified Shortcut Unlearning in Medical Imaging that quantifies the worst-case information leakage from spurious features (e.g., surgical markers, rulers, ink markings) into model predictions within segmentation tasks.

The standard conditional mutual information I(S; Y | Y_hat) between spurious features S and pathology Y given predictions Y_hat is inadequate for segmentation because the vast majority of pixels are background, causing the metric to be dominated by trivially independent empty regions. S_global addresses this by explicitly targeting information leakage within predicted foreground regions.

Formally, for a random mask variable Z:

S_global(Z) := max_{k in {1,…,K}} sum_{i,j} P(Z_ij = k) * I(S_ij; Y_ij | Z_ij = k)

where the maximum is taken over the non-background classes k, and each pixelwise conditional mutual information term is weighted by the probability of predicting class k at that location.
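The definition above can be estimated by a plug-in computation when S, Y, and Z are discrete. The sketch below is illustrative only and assumes integer-valued arrays of shape (N, H, W) over N samples, with class 0 as background; the function names `mutual_information` and `s_global` are hypothetical, not from the paper.

```python
import numpy as np

def mutual_information(s, y):
    """Plug-in mutual information estimate (in nats) between
    two discrete 1-D arrays of paired samples."""
    n = len(s)
    if n == 0:
        return 0.0
    mi = 0.0
    for sv in np.unique(s):
        for yv in np.unique(y):
            p_sy = np.mean((s == sv) & (y == yv))  # joint probability
            p_s = np.mean(s == sv)                 # marginal of S
            p_y = np.mean(y == yv)                 # marginal of Y
            if p_sy > 0:
                mi += p_sy * np.log(p_sy / (p_s * p_y))
    return mi

def s_global(S, Y, Z, num_classes):
    """Empirical S_global:
    max over non-background classes k of
    sum_{i,j} P(Z_ij = k) * I(S_ij; Y_ij | Z_ij = k).
    S, Y, Z: integer arrays of shape (N, H, W); class 0 is background."""
    N, H, W = Z.shape
    best = 0.0
    for k in range(1, num_classes):  # skip background class 0
        total = 0.0
        for i in range(H):
            for j in range(W):
                mask = Z[:, i, j] == k
                p_k = mask.mean()  # P(Z_ij = k) over the N samples
                if p_k > 0:
                    # CMI conditioned on the event Z_ij = k
                    total += p_k * mutual_information(S[mask, i, j],
                                                     Y[mask, i, j])
        best = max(best, total)
    return best
```

Because the sum runs only over locations actually predicted as class k (p_k > 0), all-background pixels contribute nothing, which is exactly the dilution problem the metric is designed to avoid.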

Key Details

  • Captures worst-case leakage across non-background classes, reflecting the clinical requirement that pathology detection be independent of spurious correlations
  • Minimising S_global formally corresponds to shortcut removal from positive predictions
  • Certified Reduction Guarantee (Corollary 3.7): For an (epsilon, delta)-certified unlearning operator U, |S_global(U[Y^(r1)]) - S_global(Y^(r2))| ≤ O(epsilon) + O(delta log(1/delta))
  • The bound shows the unlearned model’s shortcut reliance tracks that of the ideal fine-mask model up to additive certification error
  • Empirically validated via Conditional Mutual Information (CMI) evolution plots showing sharp CMI drop during unlearning
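To make the shape of the reduction guarantee concrete, the sketch below evaluates the right-hand side of the bound for given certification parameters. The constants c1 and c2 are hypothetical placeholders for the factors hidden by the O(.) notation; they are not taken from the paper.

```python
import math

def certified_gap_bound(eps, delta, c1=1.0, c2=1.0):
    """Upper bound on |S_global(U[Y]) - S_global(Y')| of the form
    c1*eps + c2*delta*log(1/delta), with illustrative constants c1, c2."""
    if delta == 0.0:
        # delta*log(1/delta) -> 0 as delta -> 0, so only the eps term remains
        return c1 * eps
    return c1 * eps + c2 * delta * math.log(1.0 / delta)
```

Note that the delta term vanishes as delta goes to 0, so tightening the certification parameters drives the permissible gap in shortcut reliance to zero.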
