Summary

This paper makes a systematic case for reframing medical image classification tasks as segmentation problems. The authors argue that the conventional preference for classification over segmentation was driven by the historically higher labelling cost of segmentation masks. With recent advances in label-efficient training (self-supervised learning, foundation models like Segment Anything, semi-supervised methods), this cost differential is shrinking, making it worthwhile to reconsider the classification-vs-segmentation trade-off.

The theoretical analysis uses an information-theoretic framework. The authors show via KL divergence bounds (Proposition 1) that pixel-level supervision in segmentation creates more separable class distributions than image-level supervision in classification. Because segmentation’s positive and negative class distributions are defined at the pixel level, they share fewer features, making the discrimination function easier to learn. This also explains why segmentation is more robust to spurious correlations: background features are less correlated with pixel-level labels than with image-level labels, so the model has less incentive to exploit them.
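The inverse scaling with target size can be read as a dilution argument (an illustrative sketch in my own notation, not the paper's exact Proposition 1): if a fraction $r$ of a positive image's pixels belong to the target and the remaining $1-r$ are background drawn from the same distribution as negatives, a positive image's pixel distribution is roughly the mixture $r\,p_{+} + (1-r)\,p_{-}$. Convexity of the KL divergence then gives

$$D_{\mathrm{KL}}\big(r\,p_{+} + (1-r)\,p_{-} \,\big\|\, p_{-}\big) \;\le\; r\, D_{\mathrm{KL}}\big(p_{+} \,\big\|\, p_{-}\big),$$

so a small target fraction $r$ makes the image-level class distributions nearly indistinguishable, while pixel-level supervision retains the full divergence $D_{\mathrm{KL}}(p_{+}\,\|\,p_{-})$.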

The experimental evaluation spans three medical imaging datasets: CANDID (chest X-ray pneumothorax detection, n=19,237), ISIC (skin lesion melanoma classification, n=2,750), and SPINE (cervical spine fracture detection from CT, n=2,018). The authors implement multiple “summarizing functions” to convert segmentation masks into classification labels, ranging from simple threshold-based rules to trained neural network classifiers. Results consistently show that even the simplest rule-based summarizing function (threshold + pixel count) outperforms traditional classification, with benefits magnified in limited-data regimes, on rare subgroups, and in the presence of spurious correlations.
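The simplest summarizing function described above (threshold + pixel count) can be sketched in a few lines; the probability threshold and pixel-count cutoff here are illustrative placeholders, not values from the paper:

```python
import numpy as np

def summarize_mask(mask: np.ndarray, prob_threshold: float = 0.5,
                   min_pixels: int = 50) -> int:
    """Rule-based summarizing function g(.): the image is labelled positive
    iff the thresholded predicted mask contains at least `min_pixels`
    foreground pixels. (Threshold values are illustrative assumptions.)"""
    binary = mask >= prob_threshold
    return int(binary.sum() >= min_pixels)

# Toy predicted probability maps
empty = np.zeros((128, 128))
lesion = np.zeros((128, 128))
lesion[40:50, 40:50] = 0.9   # a 100-pixel "lesion"

print(summarize_mask(empty))   # -> 0: no pixels above threshold
print(summarize_mask(lesion))  # -> 1: 100 pixels >= 50
```

In practice the cutoffs would be tuned on a validation set; the paper's trained variants replace this rule with a learned head on the mask or its embedding.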

Key Contributions

  • Information-theoretic analysis: Proves via KL divergence bounds that pixel-level (segmentation) supervision yields more separable class distributions than image-level (classification) supervision, with the benefit scaling inversely with target size (Proposition 1)
  • Segmentation-for-classification framework: Defines and evaluates multiple methods for converting segmentation outputs to classification labels via “summarizing functions” g(.)
  • Comprehensive empirical evaluation: Demonstrates benefits across three medical imaging datasets in both limited and abundant data regimes
  • Semi-supervised segmentation-for-classification: Shows that existing semi-supervised segmentation methods (including a novel “boosted” variant using classification labels) directly improve classification performance
  • Robustness to spurious correlations: Segmentation-for-classification achieves AUROC of 0.84 vs. 0.58 for classification on the pneumothorax-without-chest-tube subgroup

Methodology

  • Summarizing functions: Rule-based (threshold + pixel count) and trained (FC layer, pooling + FC, full classification network on masks or embeddings)
  • Datasets: CANDID (pneumothorax), ISIC (melanoma), SPINE (cervical fracture)
  • Training: Y-Net architecture as a shared backbone for joint segmentation and classification; standard supervised classification and multitask learning as baselines
  • Evaluation: Mean AUROC, robust AUROC on spurious-correlation-absent subgroups, recall on rare subgroups, location-conditioned recall
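The robust-AUROC metric, i.e. AUROC restricted to the subgroup where the spurious feature is absent (e.g. pneumothorax cases without a chest tube, plus all negatives), can be sketched as follows; the toy data and the pairwise-ranking AUROC implementation are my own illustrations, not the paper's code:

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly
    (ties count as 0.5) -- equivalent to the standard definition."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    pairs = [0.5 if p == n else float(p > n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)

def robust_auroc(y_true, scores, subgroup):
    """AUROC computed only on examples where the spurious feature is absent."""
    kept = [(y, s) for y, s, keep in zip(y_true, scores, subgroup) if keep]
    ys, ss = zip(*kept)
    return auroc(ys, ss)

# Toy example: one positive (score 0.9) carries the spurious chest-tube cue,
# so overall AUROC looks better than AUROC on the no-tube subgroup.
y = [0, 0, 0, 1, 1, 1]
s = [0.2, 0.3, 0.1, 0.9, 0.15, 0.8]
no_tube = [True, True, True, False, True, True]

print(round(auroc(y, s), 3))            # -> 0.778 overall
print(round(robust_auroc(y, s, no_tube), 3))  # -> 0.667 on the subgroup
```

The gap between the two numbers is exactly the failure mode the paper measures: a model leaning on the spurious cue scores well overall but degrades on the subgroup where the cue is missing.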

Key Findings

  • Simple rule-based segmentation-for-classification outperforms both classification and multitask networks across all datasets and data regimes
  • Performance benefits are largest with limited training data (10% labelled data): 10.0%, 6.9%, and 2.8% improvement in mean AUROC for CANDID, ISIC, and SPINE respectively
  • Segmentation-for-classification with only 10% segmentation labels exceeds classification trained on 100% image labels
  • Smaller targets yield larger benefits: SPINE (0.6% target-to-image ratio) gets the largest boost; ISIC (26.3% ratio) gets the smallest
  • Location robustness: classification recall drops up to 22.5% across lung regions, while segmentation-for-classification drops at most 7.5%
  • Semi-supervised training further improves segmentation-for-classification, with the “boosted” variant (using classification labels to filter pseudo-masks) performing best
  • Higher per-image labelling cost of segmentation is the primary trade-off

Important References

Atomic Notes


paper