This repository contains the code for creating and analyzing ISIC MultiAnnot++, a large public multi-annotator skin lesion segmentation dataset built from images in the ISIC Archive. The final dataset contains 17,684 segmentation masks spanning 14,967 dermoscopic images, 2,394 of which have 2 to 5 segmentations each, making it the largest publicly available skin lesion segmentation (SLS) dataset.
Each segmentation comes with metadata on the annotator's skill level and the tool used to produce it. This supports research such as annotator-specific preference modeling for segmentation and analyses of annotator metadata, among others.
- Total Images: 14,967 dermoscopic images
- Total Segmentations: 17,684 segmentation masks
- Multi-annotator Images: 2,394 images with 2-5 segmentations per image
- Annotator Metadata: Skill levels (expert/novice) and segmentation tools
- Source: ISIC Archive
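The counts above can be cross-checked against the released metadata. They also imply that the multi-annotator images carry 17,684 - (14,967 - 2,394) = 5,111 masks, because every other image has exactly one. The sketch below assumes the metadata CSV has the ISIC_id and annotator columns described later in this README, and that consensus rows (annotator ST or MV), if present in the file, should be skipped; summarize and summarize_csv are hypothetical helpers, not part of the repository.

```python
import csv
from collections import Counter

# Annotator codes used for consensus masks; skipping them is an assumption
# about which metadata file is being summarized.
CONSENSUS_IDS = {"ST", "MV"}


def summarize(rows):
    """Summarize mask counts from rows that have ISIC_id and annotator keys."""
    per_image = Counter(
        row["ISIC_id"] for row in rows if row["annotator"] not in CONSENSUS_IDS
    )
    return {
        "images": len(per_image),
        "masks": sum(per_image.values()),
        "multi_annotator_images": sum(1 for n in per_image.values() if n >= 2),
        "max_masks_per_image": max(per_image.values(), default=0),
    }


def summarize_csv(path):
    """Hypothetical helper: summarize a CSV such as IMAplusplus_seg_metadata.csv."""
    with open(path, newline="") as f:
        return summarize(csv.DictReader(f))
```

For example, summarize_csv("output/metadata/IMAplusplus_seg_metadata.csv") should reproduce the totals listed above if that file holds exactly one row per released mask.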
```
IMAplusplus/
├── dataset_creation/ # Dataset creation and preprocessing
│ ├── create_dataset.py # Creates dataset metadata and anonymizes annotators
│ ├── move_dataset.py # Copies images and masks to target directory
│ ├── constants.py # Tool and skill level mappings
│ └── config.yaml # Configuration for dataset creation
│
├── dataset_analysis/ # Dataset quality assurance and visualization
│ ├── mask_qa.py # Validates masks for quality issues
│ ├── other_datasets_overlap.py # Visualizes overlap with other datasets
│ ├── imaplusplus_annotator_overlap.py # Visualizes annotator overlap
│ └── config.yaml # Configuration for analysis
│
├── multiannotator_analysis/ # Multi-annotator analysis and metrics
│ ├── create_multiannotator_subset.py # Creates subset with multiple annotations
│ ├── create_consensus_masks.py # Generates STAPLE and majority voting masks
│ ├── compute_IAA_metrics.py # Computes inter-annotator agreement metrics
│ ├── compute_image_level_metrics.py # Aggregates metrics per image
│ ├── visualization_scripts/ # Scripts for generating visualizations
│ └── config.yaml # Configuration for analysis
│
├── utils/ # Utility functions
│ ├── data.py # Data loading utilities
│ ├── metrics.py # Metric computation functions
│ └── md5.py # MD5 hash utilities
│
├── output/ # Generated outputs
│ ├── metadata/ # CSV metadata files
│ ├── seg_masks/ # Segmentation masks
│ └── visualizations/ # Generated plots and figures
│
└── overall_script.sh # Main pipeline script
```
The codebase requires Python 3.x and the following packages:
```bash
pip install pandas numpy scikit-image SimpleITK medpy omegaconf loguru tqdm matplotlib upsetplot
```

Key dependencies:
- pandas: Data manipulation and CSV handling
- numpy: Numerical operations
- scikit-image: Image processing and mask operations
- SimpleITK: STAPLE consensus mask computation
- medpy: Medical image metrics (Dice, Jaccard, Hausdorff distance)
- omegaconf: Configuration management
- loguru: Logging
- tqdm: Progress bars
- matplotlib: Plotting
- upsetplot: UpSet plots for set visualization
Run the complete pipeline using the provided script:
```bash
bash overall_script.sh
```

This script executes the following steps in order:

1. Dataset Creation (dataset_creation/)
   - Creates anonymized metadata from raw ISIC data
   - Maps annotators to anonymized IDs (A00, A01, ...)
   - Maps tools and skill levels to standardized codes
   - Calculates MD5 hashes for masks
   - Generates standardized filenames
2. Dataset Analysis (dataset_analysis/)
   - Performs quality assurance on masks
   - Checks for missing or corrupted files, empty masks, and other issues
   - Visualizes overlap with other datasets (ISIC 2016-2019, HAM10000, PH2, etc.)
   - Visualizes annotator overlap patterns
3. Multi-annotator Analysis (multiannotator_analysis/)
   - Creates a subset of images with multiple annotations
   - Generates consensus masks (STAPLE and majority voting)
   - Computes inter-annotator agreement (IAA) metrics
You can also run individual scripts separately, starting each block from the repository root:

```bash
cd dataset_creation/
python create_dataset.py # Creates metadata
python move_dataset.py # Copies files to target directory
```

```bash
cd dataset_analysis/
python mask_qa.py # Quality assurance
python other_datasets_overlap.py # Dataset overlap visualization
python imaplusplus_annotator_overlap.py # Annotator overlap visualization
```

```bash
cd multiannotator_analysis/
python create_multiannotator_subset.py # Create multi-annotator subset
python create_consensus_masks.py # Generate consensus masks
python compute_IAA_metrics.py # Compute IAA metrics
python compute_image_level_metrics.py # Aggregate metrics per image
```

Each module uses a config.yaml file for configuration. Key settings include:
- Paths: Source and target directories for images and masks
- Metadata: Paths to input and output metadata CSV files
- Processing options: Verbose logging, parallel processing settings
Example configuration structure:
```yaml
# dataset_creation/config.yaml
orig_imgs_dirs:
  jpg: ["/path/to/images/"]
orig_segs_dir: "/path/to/masks/"
raw_img_metadata_path: "./original_metadata_files/raw_ISIC_images_metadata.csv.gz"
raw_seg_masks_metadata_path: "./original_metadata_files/raw_ISIC_segmasks_metadata.csv"
target_data_dir: "/path/to/output/"
```

All metadata files are saved in output/metadata/:
- IMAplusplus_seg_metadata.csv: Complete segmentation metadata
- IMAplusplus_img_metadata.csv: Image metadata
- IMAplusplus_multiannotator_subset_seg_metadata.csv: Multi-annotator subset metadata
- IMAplusplus_multiannotator_subset_IAA_metrics.csv: Pairwise IAA metrics
- IMAplusplus_multiannotator_subset_IAA_metrics_summary.csv: Summary statistics
- IMAplusplus_multiannotator_subset_image_level_metrics.csv: Per-image aggregated metrics
- IMAplusplus_seg_metadata_qa_results.csv: Quality assurance results
Each segmentation mask has the following metadata:
- ISIC_id: ISIC image identifier
- img_filename: Image filename
- seg_filename: Segmentation mask filename
- annotator: Anonymized annotator ID (A00, A01, ... for human annotators; ST and MV for the STAPLE and majority-voting consensus masks)
- tool: Segmentation tool (T1: manual pointlist, T2: unknown, T3: autofill)
- skill_level: Annotator skill level (S1: expert, S2: novice)
- mskObjectID: Original mask object ID
- mask_md5: MD5 hash of the mask file
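When inspecting the metadata, it can help to decode the tool and skill codes back into readable labels. The sketch below hard-codes the mappings exactly as documented in this README; the names TOOL_LABELS, SKILL_LABELS and describe are hypothetical, and the repository's own constants.py may store these mappings differently.

```python
# Code-to-label mappings as documented in this README (assumed, not imported
# from the repository's constants.py).
TOOL_LABELS = {"T1": "manual pointlist", "T2": "unknown", "T3": "autofill"}
SKILL_LABELS = {"S1": "expert", "S2": "novice"}
CONSENSUS_KINDS = {"ST": "STAPLE", "MV": "majority voting"}


def describe(row):
    """Return a human-readable summary of one segmentation metadata row."""
    if row["annotator"] in CONSENSUS_KINDS:
        # Consensus masks have no human annotator, tool, or skill level.
        return f"{row['ISIC_id']}: {CONSENSUS_KINDS[row['annotator']]} consensus mask"
    tool = TOOL_LABELS.get(row["tool"], "unrecognized tool")
    skill = SKILL_LABELS.get(row["skill_level"], "unrecognized skill level")
    return f"{row['ISIC_id']}: annotator {row['annotator']} ({skill}), tool: {tool}"
```

Passing each row from csv.DictReader over a metadata file through describe gives a quick readable listing.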
For images with multiple annotations, two consensus masks are generated:
- STAPLE (`*_ST_ST_ST_ST.png`): consensus computed with the STAPLE algorithm
- Majority Voting (`*_MV_MV_MV_MV.png`): majority-voting consensus
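Majority voting is simple enough to sketch directly. The function below is a minimal NumPy version for binary masks of identical shape; majority_vote is a hypothetical name, and the strict-majority tie rule is an assumption, since this README does not say how ties between an even number of annotators are resolved. STAPLE, by contrast, estimates each annotator's sensitivity and specificity with expectation-maximization and is computed with SimpleITK here, so it is not reproduced.

```python
import numpy as np


def majority_vote(masks):
    """Fuse binary masks of identical shape by per-pixel majority voting.

    A pixel is foreground when strictly more than half of the masks mark it.
    This tie rule is an assumption: with two annotators who disagree on a
    pixel, that pixel becomes background.
    """
    stack = np.stack([np.asarray(m) > 0 for m in masks])  # (n_masks, H, W), bool
    votes = stack.sum(axis=0)  # number of annotators marking each pixel
    return (votes * 2 > len(masks)).astype(np.uint8)
```

A lower threshold (for example votes * 2 >= len(masks)) would instead keep tied pixels as foreground; which rule the repository uses is not stated here.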
The following metrics are computed for all pairwise mask comparisons:
Overlap Metrics:
- Dice coefficient
- Jaccard coefficient
Boundary Metrics:
- Hausdorff distance (HD)
- 95th percentile Hausdorff distance (HD95)
- Average symmetric surface distance (ASSD)
- Normalized versions (by image diagonal length)
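The overlap metrics and the diagonal normalization can be sketched in a few lines of NumPy. This is an illustration, not the repository's utils/metrics.py (which, per the dependency list, relies on medpy); the function names are hypothetical, and returning 1.0 for two empty masks is a convention chosen here that other libraries may not share. The boundary distances themselves (HD, HD95, ASSD) are left to medpy.

```python
from itertools import combinations

import numpy as np


def dice(a, b):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for binary masks."""
    a, b = np.asarray(a) > 0, np.asarray(b) > 0
    total = a.sum() + b.sum()
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if total == 0 else float(2.0 * np.logical_and(a, b).sum() / total)


def jaccard(a, b):
    """Jaccard coefficient |A∩B| / |A∪B| for binary masks."""
    a, b = np.asarray(a) > 0, np.asarray(b) > 0
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else float(np.logical_and(a, b).sum() / union)


def normalize_by_diagonal(distance, shape):
    """Divide a pixel distance (e.g. HD, HD95, ASSD) by the image diagonal."""
    height, width = shape[:2]
    return float(distance / np.hypot(height, width))


def pairwise_overlap(masks_by_annotator):
    """Dice and Jaccard for every unordered pair of annotators of one image."""
    results = []
    pairs = combinations(sorted(masks_by_annotator.items()), 2)
    for (id_a, mask_a), (id_b, mask_b) in pairs:
        results.append({
            "pair": (id_a, id_b),
            "dice": dice(mask_a, mask_b),
            "jaccard": jaccard(mask_a, mask_b),
        })
    return results
```

An image with k annotators yields k(k-1)/2 pairs, so the images with 5 segmentations each contribute 10 comparisons.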
Annotator IDs (A00, A01, ...) are assigned in decreasing order of the number of segmentations each annotator produced, so A00 is the most prolific annotator.
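The ranking rule can be expressed with collections.Counter. In the sketch below, anonymize_annotators is a hypothetical helper that takes one raw annotator name per segmentation; ties are broken by first appearance, because Counter.most_common() orders equal counts that way, but the repository's actual tie-breaking rule is not documented in this README.

```python
from collections import Counter


def anonymize_annotators(annotator_names):
    """Map raw annotator names to IDs A00, A01, ... by decreasing mask count.

    Equal counts keep first-appearance order (an assumed tie-breaking rule).
    """
    counts = Counter(annotator_names)
    return {name: f"A{rank:02d}" for rank, (name, _) in enumerate(counts.most_common())}
```

For instance, if "alice" produced three masks, "bob" two and "carol" one, they become A00, A01 and A02 respectively.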
- Tools:
  - T1: Manual pointlist
  - T2: Unknown/unspecified
  - T3: Autofill
- Skill Levels:
  - S1: Expert
  - S2: Novice
The mask_qa.py script validates masks for:
- Missing or corrupted files (high severity)
- Empty masks (high severity)
- Masks covering entire image (high severity)
- Disconnected regions (medium severity)
- Masks touching image borders (low severity)
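The mask-level checks in this list can be sketched with NumPy on an already-loaded 2-D mask (missing or corrupted files would surface earlier, when loading fails). The names qa_flags and count_components are hypothetical, and using 8-connectivity for "disconnected regions" is an assumption chosen to match the default of skimage.measure.label for 2-D images, which would be the usual library call in practice.

```python
from collections import deque

import numpy as np

# 8-connectivity: diagonal neighbours count as connected (an assumption).
NEIGHBOURS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]


def count_components(mask):
    """Count 8-connected foreground regions with a breadth-first flood fill."""
    fg = np.asarray(mask) > 0
    seen = np.zeros(fg.shape, dtype=bool)
    height, width = fg.shape
    n_regions = 0
    for r, c in zip(*np.nonzero(fg)):
        if seen[r, c]:
            continue
        n_regions += 1  # unvisited foreground pixel starts a new region
        seen[r, c] = True
        queue = deque([(r, c)])
        while queue:
            y, x = queue.popleft()
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < height and 0 <= nx < width and fg[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    queue.append((ny, nx))
    return n_regions


def qa_flags(mask):
    """Return (issue, severity) pairs for one loaded 2-D mask."""
    fg = np.asarray(mask) > 0
    if not fg.any():
        return [("empty mask", "high")]
    flags = []
    if fg.all():
        flags.append(("mask covers entire image", "high"))
    if count_components(fg) > 1:
        flags.append(("disconnected regions", "medium"))
    border = np.concatenate([fg[0, :], fg[-1, :], fg[:, 0], fg[:, -1]])
    if border.any():
        flags.append(("touches image border", "low"))
    return flags
```

A full-image mask is reported as both high severity and touching the border, since both conditions hold.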
Many scripts support parallel processing for improved performance:
- MD5 hash calculation
- Mask validation
- IAA metric computation
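MD5 hashing parallelizes well because each file is hashed independently. The sketch below uses hashlib with chunked reads and a thread pool from concurrent.futures; md5_of_file and md5_many are hypothetical names, and the repository's utils/md5.py may instead use process-based parallelism or a different chunk size.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def md5_of_file(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def md5_many(paths, max_workers=8):
    """Hash many files concurrently; returns {path: hex digest}."""
    paths = list(paths)  # materialize so the paths can be zipped with results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(paths, pool.map(md5_of_file, paths)))
```

The resulting digests can be compared against the mask_md5 column to detect masks that were altered or corrupted after the metadata was written.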
The repository includes scripts for generating various visualizations:
- UpSet plots: Annotator and dataset overlap patterns.
- Distribution plots: Dataset statistics.
- Metric visualizations: IAA metrics, agreement patterns.
- Overlaid segmentations: Visualizing segmentations that have zero overlap.
Visualizations are saved in output/visualizations/ in multiple formats (PNG, PDF, SVG).
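An UpSet plot shows how many items fall into each exact combination of sets, for example how many images were segmented by annotators A00 and A01 and by no one else. The upsetplot package draws these plots; the sketch below shows only the underlying counting, with intersection_counts a hypothetical helper and the input layout (a dict from item ID to the set names it belongs to) an assumption.

```python
from collections import Counter


def intersection_counts(memberships):
    """Count items per exact combination of sets, as shown by UpSet bars.

    `memberships` maps each item (e.g. an ISIC_id) to the names of the sets it
    belongs to (e.g. source datasets or annotator IDs).
    """
    return Counter(frozenset(sets) for sets in memberships.values())
```

Calling .most_common() on the result lists the combinations from largest to smallest, which is the bar order UpSet plots commonly use.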