Abstract
A common failure mode of neural networks trained to classify abnormalities in medical images is their reliance on spurious features: features that are associated with the class label but do not generalize. In this work, we examine whether supervising models with increased spatial specificity (i.e., information about the location of the abnormality) impacts model reliance on spurious features. We first propose a data model of spurious features and theoretically analyze the impact of increasing spatial specificity. We find that increasing spatial specificity affects two properties of the data: the variance of the positively-labeled input pixels decreases, and the mutual information between abnormal and spurious pixels decreases; both contribute to improved model robustness to spurious features. We empirically examine the impact of supervising models with increased spatial specificity on two medical image datasets known to contain spurious features: pneumothorax classification on chest x-rays and melanoma classification from dermoscopic images. We find that while models supervised with binary labels achieve near-random robust performance on the pneumothorax classification task (robust AUROC of 0.46), increasing spatial specificity to bounding-box detection and image segmentation raises robust AUROC to 0.72 and 0.82, respectively.
Summary
This paper introduces the concept of spatial specificity as a principled framework for understanding and mitigating spurious feature reliance in medical image classification. The core idea is that the granularity of spatial annotations used during training directly controls the degree to which a model can exploit background (spurious) features. Image-level labels provide no spatial constraint, allowing the model to use any pixel in the image; bounding boxes restrict attention to a region around the pathology; and pixel-level segmentation masks provide the tightest constraint, limiting the model to the pathology itself.
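The annotation hierarchy can be pictured as progressively masking the pixels available to the classifier. A minimal NumPy sketch (the `visible_pixels` helper and its `level` names are illustrative, not from the paper):

```python
import numpy as np

def visible_pixels(image, annotation, level):
    """Zero out pixels outside the annotated region, illustrating how
    finer annotations shrink the set of pixels a model can exploit."""
    masked = np.zeros_like(image)
    if level == "image":            # binary label: no spatial constraint
        masked[:] = image
    elif level == "bbox":           # annotation = (row0, row1, col0, col1)
        r0, r1, c0, c1 = annotation
        masked[r0:r1, c0:c1] = image[r0:r1, c0:c1]
    elif level == "mask":           # annotation = boolean pixel mask
        masked[annotation] = image[annotation]
    return masked

img = np.ones((8, 8))
seg = np.zeros((8, 8), dtype=bool)
seg[3:5, 3:5] = True

print(visible_pixels(img, None, "image").sum())         # 64.0 — whole image
print(visible_pixels(img, (2, 6, 2, 6), "bbox").sum())  # 16.0 — box region
print(visible_pixels(img, seg, "mask").sum())           # 4.0 — lesion only
```

A spurious artefact such as a chest tube lies outside the box and mask regions, so under the finer supervision levels it never reaches the model.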
The authors formalise this intuition using a generative data model in which spurious feature masks and true pathology masks are drawn from a joint distribution conditioned on the global disease label. Under this model, they prove that as the spatial annotation becomes finer, the mutual information between spurious features and the predicted label decreases. This theoretical result assumes conditional independence of the spurious feature given the annotation mask — an assumption later relaxed by Towards Certified Shortcut Unlearning in Medical Imaging.
Empirically, the authors validate their framework on chest X-ray classification (pneumothorax detection, where chest tubes are a known spurious correlate) and dermatology (melanoma detection using ISIC, where background artefacts such as rulers, ink markings, and dark corners correlate with diagnosis). Results show consistent improvements in robust AUROC when moving from coarser to finer spatial annotations, with the largest gains appearing in subgroups that do not exhibit the spurious correlation.
Key Contributions
- Introduces the spatial specificity framework: a hierarchy of annotation granularity (image-level, bounding box, segmentation mask) that systematically reduces spurious feature reliance
- Provides theoretical analysis showing that finer spatial annotations reduce mutual information between spurious features and model predictions
- Proposes robust AUROC as an evaluation metric that measures model performance specifically on subgroups lacking the spurious correlation
- Demonstrates empirical gains on chest X-ray (CANDID pneumothorax) and dermatology (ISIC melanoma) benchmarks
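Robust AUROC can be sketched as ordinary AUROC computed only on the counter-correlated subgroup. A minimal reimplementation (the subgroup definition — positives lacking the artefact plus negatives exhibiting it — is one plausible reading of "subgroups lacking the spurious correlation", and the helpers are illustrative, not the authors' code):

```python
import numpy as np

def auroc(labels, scores):
    """Rank-based AUROC: probability that a random positive outscores
    a random negative, counting ties as 0.5."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def robust_auroc(labels, scores, has_artefact):
    """AUROC restricted to samples where the spurious correlation is
    broken: positives without the artefact, negatives with it."""
    keep = ((labels == 1) & ~has_artefact) | ((labels == 0) & has_artefact)
    return auroc(labels[keep], scores[keep])

# Toy scores from a model that mostly keys on the artefact (chest tube):
y = np.array([1, 1, 0, 0, 1, 0])
s = np.array([0.9, 0.2, 0.8, 0.1, 0.7, 0.3])
tube = np.array([True, False, True, False, False, True])

print(auroc(y, s))               # ≈ 0.667 on the full set
print(robust_auroc(y, s, tube))  # 0.25 — collapses off the shortcut
```

The gap between the two numbers is what the robust metric is designed to surface: aggregate AUROC hides shortcut reliance that the counter-correlated subgroup exposes.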
Methodology
- Generative data model with separate spurious feature mask S and pathology mask Y, linked through global label y and spurious label s, drawn from joint distribution P_{y,s}
- Spatial specificity parameter r controls annotation granularity: r=1 is binary labels, r=n is full segmentation
- Two theoretical measures: (1) variance measure — the variance of positively-labelled input pixels decreases with r; (2) MI measure — the mutual information I(S_ij; Y_ij | Ỹ_ij = 1) decreases with r
- Two assumptions: non-overlapping spurious and abnormal regions, and conditional independence of S_ij and Ỹ_ij given Y_ij
- Semi-supervised and contrastive learning methods (Seg-SSL, Seg-SPL) explored to reduce annotation costs
- Evaluated on SIIM-ACR Pneumothorax (12,047 CXRs) and ISIC 2018 melanoma (2,594 dermoscopic images with Bissoto et al. trap splits)
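The data model above can be made concrete with a toy sampler: draw (y, s) from a joint P_{y,s} in which the spurious label correlates with disease, render non-overlapping pathology and spurious patches, and coarsen the pathology mask to granularity r. All probabilities and patch placements here are illustrative, not the paper's parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative joint P_{y,s}: spurious label s co-occurs with disease y.
P = {(1, 1): 0.4, (1, 0): 0.1, (0, 1): 0.1, (0, 0): 0.4}

def sample(n=16):
    """Draw (y, s), then render non-overlapping pathology (Y) and
    spurious (S) patches into an n-by-n image — a toy version of the
    paper's generative data model."""
    keys = list(P)
    y, s = keys[rng.choice(len(keys), p=[P[k] for k in keys])]
    img = rng.normal(0.0, 0.1, (n, n))
    Y = np.zeros((n, n), dtype=bool)
    S = np.zeros((n, n), dtype=bool)
    if y:                              # pathology patch, upper-left
        Y[2:5, 2:5] = True
        img[Y] += 1.0
    if s:                              # spurious patch (e.g. chest tube)
        S[10:13, 10:13] = True
        img[S] += 1.0
    return img, Y, S

def coarsen(Y, r):
    """Pool the n-by-n mask to an r-by-r grid of block labels: r=1
    recovers the binary image label, r=n the full segmentation mask."""
    b = Y.shape[0] // r
    return Y.reshape(r, b, r, b).any(axis=(1, 3))

img, Y, S = sample()
assert not (Y & S).any()               # assumption 1: regions don't overlap
```

Increasing r tightens the supervision target from the global label toward the exact pathology pixels, which is what drives both the variance and MI measures in the analysis.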
Key Findings
- CXR: Binary-ERM robust AUROC 0.46, BBox-ERM 0.72, Segmentation-ERM 0.82 (a 36-point robust AUROC improvement)
- ISIC: Binary-ERM robust AUROC 0.36, Segmentation-ERM 0.73 (a 37-point improvement)
- Spurious features with highest spatial overlap with lesions (hair, gel bubbles) show largest performance gaps
- Bounding boxes do not help much on ISIC because they cover ~50% of the image, providing insufficient spatial specificity
- Semi-supervised (Seg-SSL) and supervised pixel-level similarity (Seg-SPL) methods each provide a ~5-point robust AUROC lift in low-data regimes
- As few as 300 segmentation masks still improve robust AUROC by 20 points over binary training
- The conditional independence assumption (S_ij ⟂ Ỹ_ij | Y_ij) enables clean theory but is shown to be violated in practice by Towards Certified Shortcut Unlearning in Medical Imaging
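The role of that assumption can be checked numerically on a discrete toy model: when P(S, Ỹ | Y) factorizes, the conditional mutual information I(S; Ỹ | Y) is zero, and any extra co-occurrence between S and Ỹ makes it positive. The probability tables below are illustrative, not estimated from either paper's data:

```python
import numpy as np

def cond_mi(joint):
    """I(S; T | Y) in bits for a table joint[s, t, y] over binary
    variables (S = spurious pixel, T = annotation Y-tilde, Y = true pixel)."""
    p_y = joint.sum(axis=(0, 1))
    mi = 0.0
    for y in range(2):
        p_st = joint[:, :, y] / p_y[y]           # P(S, T | Y=y)
        p_s = p_st.sum(axis=1, keepdims=True)
        p_t = p_st.sum(axis=0, keepdims=True)
        mi += p_y[y] * np.sum(p_st * np.log2(p_st / (p_s * p_t)))
    return mi

# Conditionally independent case: P(S,T|Y) = P(S|Y) * P(T|Y).
indep = np.einsum("sy,ty,y->sty",
                  np.array([[0.8, 0.3], [0.2, 0.7]]),   # P(S|Y)
                  np.array([[0.9, 0.2], [0.1, 0.8]]),   # P(T|Y)
                  np.array([0.6, 0.4]))                 # P(Y)

# Violated case: S and T co-occur beyond what Y explains (for Y=1).
dep = indep.copy()
dep[1, 1, 1] += 0.05; dep[0, 0, 1] += 0.05
dep[1, 0, 1] -= 0.05; dep[0, 1, 1] -= 0.05

print(cond_mi(indep))       # ~0: assumption holds
print(cond_mi(dep))         # > 0: assumption violated
```

When the second quantity is nonzero, the paper's MI bound no longer applies cleanly, which is the gap the certified-unlearning follow-up addresses.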
Important References
- A Case for Reframing Automated Medical Image Classification as Segmentation — extends this work’s ideas into a general segmentation-for-classification framework
- Debiasing Skin Lesion Datasets and Models Not So Fast — characterises the artefact-bias problem in skin lesion datasets that motivates this work
- Towards Certified Shortcut Unlearning in Medical Imaging — builds on spatial specificity by connecting it to certified machine unlearning with formal guarantees