Debiasing strategies for skin lesion analysis encompass methods aimed at preventing deep neural networks from exploiting spurious correlations between visual artefacts and diagnostic labels in dermoscopy images. Bissoto et al. (2020) provided the most systematic analysis of this problem, identifying seven artefact types and evaluating the then state-of-the-art debiasing method, Learning Not To Learn (LNTL).

The 7 Artefact Types

  1. Dark corners (vignetting) — shadows at image edges from the dermoscope
  2. Hair — patient hair overlying the lesion
  3. Gel borders — edges of contact gel used in dermoscopy
  4. Gel bubbles — air bubbles in the contact gel
  5. Rulers — measurement rulers placed next to lesions
  6. Ink markings/staining — surgical markings or dye on the skin
  7. Patches — adhesive patches applied to the skin

Debiasing Approaches

Data-level approaches:

  • Normalized background: Replace background pixels with the per-pixel average of the training images, removing background information (Bissoto et al., 2020)
  • Trap sets: Construct train/test splits with amplified and reversed artefact-label correlations to measure how much models rely on the bias
  • Data augmentation: Add or remove artefacts during training to reduce their predictive value
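The trap-set idea can be sketched in a few lines of numpy. The toy labels and the perfectly agreeing/disagreeing split are illustrative assumptions: real trap sets amplify and reverse correlations statistically rather than making them exact.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
label = rng.integers(0, 2, n)      # 0 = benign, 1 = malignant (toy labels)
artefact = rng.integers(0, 2, n)   # is an artefact (e.g. a ruler) present?

# Training split: artefact agrees with the label (amplified correlation).
# Test split: artefact disagrees with the label (reversed correlation).
train_idx = np.flatnonzero(artefact == label)
test_idx = np.flatnonzero(artefact != label)

train_corr = np.corrcoef(artefact[train_idx], label[train_idx])[0, 1]
test_corr = np.corrcoef(artefact[test_idx], label[test_idx])[0, 1]
# A model that leans on the artefact aces the training split and
# collapses on the trap test split.
```

Here the artefact-label correlation is +1 in training and -1 at test time, the extreme version of the amplify/reverse construction.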

Model-level approaches:

  • Learning Not To Learn (LNTL) (Kim et al., 2019): Train the feature extractor with gradient reversal from auxiliary bias-classification heads; Bissoto et al. found it insufficient for artefacts entangled with diagnostic features
  • Domain confusion loss (Alvi et al., 2018): Classifier per known bias domain with confusion loss to discourage bias encoding
  • Spatial specificity (Saab et al., 2022): Use finer spatial annotations to exclude background artefacts from training signal
  • Segmentation-for-classification (Hooper et al., 2023): Train segmentation networks that inherently focus on pathology regions
  • Certified unlearning (Towards Certified Shortcut Unlearning in Medical Imaging): Formally “forget” spurious associations with provable guarantees
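The gradient-reversal mechanism underlying LNTL can be sketched with a minimal linear model. The shapes, the `grad_reverse` helper, and the unit upstream loss gradients are illustrative assumptions, not the paper's actual architecture.

```python
import numpy as np

def grad_reverse(grad, lam=1.0):
    """Identity in the forward pass; flips the gradient sign in backward."""
    return -lam * grad

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))          # input features
W = rng.normal(size=(3, 4))        # shared feature extractor (linear)
w_task = rng.normal(size=(3,))     # diagnosis head
w_bias = rng.normal(size=(3,))     # artefact (bias) head

z = W @ x                          # shared representation
task_out = w_task @ z
bias_out = w_bias @ z

# Assume upstream loss gradients d(loss)/d(out) of 1.0 for both heads.
g_task = 1.0 * w_task              # dL_task / dz
g_bias = 1.0 * w_bias              # dL_bias / dz

# Gradient reversal: the bias head's gradient is flipped before it
# reaches the shared extractor, so W is pushed to *remove* the
# information the bias head needs, while still serving the task head.
g_z = g_task + grad_reverse(g_bias)
dW = np.outer(g_z, x)              # gradient for the shared extractor
```

The failure mode Bissoto et al. observed follows from this picture: when `g_task` and `g_bias` point along the same feature directions (entangled representations), reversing one gradient also degrades the other.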

Key Insights

  • Individual artefact-label correlations are weak, but models exploit cumulative weak correlations
  • Networks detect artefacts with 80-98% AUC even on heavily occluded images, indicating that artefact information is encoded deep in the learned features
  • Background removal alone is insufficient because artefacts overlap with the foreground (e.g., hair or ink overlying the lesion itself)
  • Gradient-reversal approaches (LNTL) fail because artefact features are entangled with diagnostic features in learned representations
  • The most promising direction is combining spatial constraints (segmentation) with formal guarantees (certified unlearning)
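The first insight can be illustrated numerically: several binary "artefact" features, each only weakly aligned with the label, combine into a much stronger predictor. The agreement probability of 0.6 and the sample size are assumed values for the sketch, not numbers from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_feat, p_agree = 4000, 7, 0.6   # illustrative parameters

label = rng.integers(0, 2, n)
# Seven binary features, each agreeing with the label only 60% of the time.
feats = np.where(rng.random((n_feat, n)) < p_agree, label, 1 - label)

def auc(scores, y):
    """AUC as P(score_pos > score_neg), counting ties as 0.5."""
    pos, neg = scores[y == 1], scores[y == 0]
    gt = (pos[:, None] > neg[None, :]).sum()
    eq = (pos[:, None] == eq_neg).sum() if False else (pos[:, None] == neg[None, :]).sum()
    return (gt + 0.5 * eq) / (len(pos) * len(neg))

individual = [auc(f, label) for f in feats]
combined = auc(feats.sum(axis=0), label)
# Each feature alone scores ~0.6 AUC; their sum scores substantially
# higher, showing how models can exploit cumulative weak correlations.
```

This is the statistical core of the trap-set finding: no single artefact needs to be strongly predictive for the ensemble of artefacts to act as a usable shortcut.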
