Summary

This paper makes a systematic case for reframing medical image classification tasks as segmentation problems. The authors argue that the conventional preference for classification over segmentation was driven by the historically higher labelling cost of segmentation masks. With recent advances in label-efficient training (self-supervised learning, foundation models like Segment Anything, semi-supervised methods), this cost differential is shrinking, making it worthwhile to reconsider the classification-vs-segmentation trade-off.

The theoretical analysis uses an information-theoretic framework. The authors show via KL divergence bounds (Proposition 1) that pixel-level supervision in segmentation creates more separable class distributions than image-level supervision in classification. Because segmentation’s positive and negative class distributions are defined at the pixel level, they share fewer features, making the discrimination function easier to learn. This also explains why segmentation is more robust to spurious correlations: background features are less correlated with pixel-level labels than with image-level labels, so the model has less incentive to exploit them.
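The inverse scaling with target size can be read as a dilution argument (an illustrative sketch in my own notation, not the paper's exact Proposition 1): if a fraction $r$ of a positive image's pixels belong to the target and the remaining $1-r$ are background drawn from the same distribution as negatives, a positive image's pixel distribution is roughly the mixture $r\,p_{+} + (1-r)\,p_{-}$. Convexity of the KL divergence then gives

$$D_{\mathrm{KL}}\big(r\,p_{+} + (1-r)\,p_{-} \,\big\|\, p_{-}\big) \;\le\; r\, D_{\mathrm{KL}}\big(p_{+} \,\big\|\, p_{-}\big),$$

so a small target fraction $r$ makes the image-level class distributions nearly indistinguishable, while pixel-level supervision retains the full divergence $D_{\mathrm{KL}}(p_{+}\,\|\,p_{-})$.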

The experimental evaluation spans three medical imaging datasets: CANDID (chest X-ray pneumothorax detection, n=19,237), ISIC (skin lesion melanoma classification, n=2,750), and SPINE (cervical spine fracture detection from CT, n=2,018). The authors implement multiple “summarizing functions” to convert segmentation masks into classification labels, ranging from simple threshold-based rules to trained neural network classifiers. Results consistently show that even the simplest rule-based summarizing function (threshold + pixel count) outperforms traditional classification, with benefits magnified in limited-data regimes, on rare subgroups, and in the presence of spurious correlations.
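The simplest summarizing function described above (threshold + pixel count) can be sketched in a few lines; the probability threshold and pixel-count cutoff here are illustrative placeholders, not values from the paper:

```python
import numpy as np

def summarize_mask(mask: np.ndarray, prob_threshold: float = 0.5,
                   min_pixels: int = 50) -> int:
    """Rule-based summarizing function g(.): the image is labelled positive
    iff the thresholded predicted mask contains at least `min_pixels`
    foreground pixels. (Threshold values are illustrative assumptions.)"""
    binary = mask >= prob_threshold
    return int(binary.sum() >= min_pixels)

# Toy predicted probability maps
empty = np.zeros((128, 128))
lesion = np.zeros((128, 128))
lesion[40:50, 40:50] = 0.9   # a 100-pixel "lesion"

print(summarize_mask(empty))   # -> 0: no pixels above threshold
print(summarize_mask(lesion))  # -> 1: 100 pixels >= 50
```

In practice the cutoffs would be tuned on a validation set; the paper's trained variants replace this rule with a learned head on the mask or its embedding.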

Key Contributions

  • Information-theoretic analysis: Proves via KL divergence bounds that pixel-level (segmentation) supervision yields more separable class distributions than image-level (classification) supervision, with the benefit scaling inversely with target size (Proposition 1)
  • Segmentation-for-classification framework: Defines and evaluates multiple methods for converting segmentation outputs to classification labels via “summarizing functions” g(.)
  • Comprehensive empirical evaluation: Demonstrates benefits across three medical imaging datasets in both limited and abundant data regimes
  • Semi-supervised segmentation-for-classification: Shows that existing semi-supervised segmentation methods (including a novel “boosted” variant using classification labels) directly improve classification performance
  • Robustness to spurious correlations: Segmentation-for-classification achieves AUROC of 0.84 vs. 0.58 for classification on the pneumothorax-without-chest-tube subgroup

Methodology

  • Summarizing functions: Rule-based (threshold + pixel count) and trained (FC layer, pooling + FC, full classification network on masks or embeddings)
  • Datasets: CANDID (pneumothorax), ISIC (melanoma), SPINE (cervical fracture)
  • Training: Y-Net architecture as a shared backbone for joint segmentation and classification; standard supervised classification and multitask learning as baselines
  • Evaluation: Mean AUROC, robust AUROC on spurious-correlation-absent subgroups, recall on rare subgroups, location-conditioned recall
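The robust-AUROC metric, i.e. AUROC restricted to the subgroup where the spurious feature is absent (e.g. pneumothorax cases without a chest tube, plus all negatives), can be sketched as follows; the toy data and the pairwise-ranking AUROC implementation are my own illustrations, not the paper's code:

```python
def auroc(y_true, scores):
    """AUROC as the fraction of positive-negative pairs ranked correctly
    (ties count as 0.5) -- equivalent to the standard definition."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    pairs = [0.5 if p == n else float(p > n) for p in pos for n in neg]
    return sum(pairs) / len(pairs)

def robust_auroc(y_true, scores, subgroup):
    """AUROC computed only on examples where the spurious feature is absent."""
    kept = [(y, s) for y, s, keep in zip(y_true, scores, subgroup) if keep]
    ys, ss = zip(*kept)
    return auroc(ys, ss)

# Toy example: one positive (score 0.9) carries the spurious chest-tube cue,
# so overall AUROC looks better than AUROC on the no-tube subgroup.
y = [0, 0, 0, 1, 1, 1]
s = [0.2, 0.3, 0.1, 0.9, 0.15, 0.8]
no_tube = [True, True, True, False, True, True]

print(round(auroc(y, s), 3))            # -> 0.778 overall
print(round(robust_auroc(y, s, no_tube), 3))  # -> 0.667 on the subgroup
```

The gap between the two numbers is exactly the failure mode the paper measures: a model leaning on the spurious cue scores well overall but degrades on the subgroup where the cue is missing.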

Key Findings

  • Simple rule-based segmentation-for-classification outperforms both classification and multitask networks across all datasets and data regimes
  • Performance benefits are largest with limited training data (10% labelled data): 10.0%, 6.9%, and 2.8% improvement in mean AUROC for CANDID, ISIC, and SPINE respectively
  • Segmentation-for-classification with only 10% segmentation labels exceeds classification trained on 100% image labels
  • Smaller targets yield larger benefits: SPINE (0.6% target-to-image ratio) gets the largest boost; ISIC (26.3% ratio) gets the smallest
  • Location robustness: classification recall drops up to 22.5% across lung regions, while segmentation-for-classification drops at most 7.5%
  • Semi-supervised training further improves segmentation-for-classification, with the “boosted” variant (using classification labels to filter pseudo-masks) performing best
  • Higher per-image labelling cost of segmentation is the primary trade-off

Important References

Atomic Notes


paper