In this task, participants are asked to complete two independent binary image classification tasks that involve three unique diagnoses of skin lesions (melanoma, nevus, and seborrheic keratosis). In the first binary classification task, participants are asked to distinguish between (a) melanoma and (b) nevus and seborrheic keratosis. In the second binary classification task, participants are asked to distinguish between (a) seborrheic keratosis and (b) nevus and melanoma.
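The mapping from the three diagnoses to the two binary targets can be sketched as follows. This is a minimal illustration, not part of the challenge tooling: the function name `binary_targets`, the diagnosis strings, and the choice of melanoma and seborrheic keratosis as the positive class (1) in their respective tasks are all assumptions made for the example.

```python
from typing import Tuple

# Assumed spellings of the three diagnoses named in the task description.
DIAGNOSES = ("melanoma", "nevus", "seborrheic keratosis")


def binary_targets(diagnosis: str) -> Tuple[int, int]:
    """Return (task1, task2) binary labels for one lesion diagnosis.

    task1: melanoma (1) vs. nevus and seborrheic keratosis (0)
    task2: seborrheic keratosis (1) vs. nevus and melanoma (0)
    The positive-class convention is an assumption of this sketch.
    """
    if diagnosis not in DIAGNOSES:
        raise ValueError(f"unknown diagnosis: {diagnosis!r}")
    task1 = int(diagnosis == "melanoma")
    task2 = int(diagnosis == "seborrheic keratosis")
    return task1, task2
```

Under these assumptions, a nevus is a negative example in both tasks, which is why the two tasks are independent rather than a single three-way classification.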
Definitions:
Lesion classification data includes the original image, paired with a gold standard (definitive) diagnosis, referred to as "Ground Truth".
Training Image Data
2000 images are provided as training data: 374 "melanoma", 254 "seborrheic keratosis", and the remaining 1372 benign nevi. The training data is provided as a ZIP file, containing dermoscopic lesion images in JPEG format and a CSV file with some clinical metadata for each image.
All images are named using the scheme ISIC_<image_id>.jpg, where <image_id> is a 7-digit unique identifier. EXIF tags in the images have been removed; any remaining EXIF tags should not be relied upon to provide accurate metadata.
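The naming scheme above lends itself to a simple check when pairing images with CSV rows. The following is a hedged sketch: the helper name `parse_image_id` is hypothetical, and the pattern assumes the `.jpg` extension is always lowercase, as written in the scheme.

```python
import re

# ISIC_<image_id>.jpg, where <image_id> is exactly 7 ASCII digits.
ISIC_NAME = re.compile(r"ISIC_([0-9]{7})\.jpg")


def parse_image_id(filename: str) -> str:
    """Extract the 7-digit identifier from an image file name.

    Raises ValueError if the name does not follow the naming scheme.
    """
    match = ISIC_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"not an ISIC image file name: {filename!r}")
    return match.group(1)
```

The identifier is kept as a string rather than converted to an integer so that leading zeros are preserved.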
The CSV file contains three columns:
image_id, identifying the image that the row corresponds to
age_approximate, containing the age of the lesion patient, rounded to 5 year intervals, or "unknown"
sex, containing the sex of the lesion patient, or "unknown"
Ground Truth Data
The Training Ground Truth file is a single CSV (comma-separated value) file, containing 3 columns:
ISIC_<image_id>, where <image_id> matches the corresponding Training Data image.
Malignancy diagnosis data were obtained from expert consensus and pathology report information. Participants are not strictly required to limit development to the training data, and are free to train their algorithm using external data sources. However, any other sources of data used in system development must be properly cited in the abstract.
This year, there are two phases for result submission:
An optional Validation Phase, with 150 images. Submissions to the Validation Phase are immediately evaluated and made public, allowing participants to test their submission systems and get some feedback on the performance of their submitted algorithm.
An official Test Phase, with 600 images. Submissions to the Test Phase are made against a blind held-out dataset and are immediately evaluated, but not made public until after the final submission date, as they constitute the final evaluation of participants' algorithms.
Participants may make unlimited and independent submissions to each phase, but only the most recent submission to the Test Phase will be used for official judging.
Participants will be ranked according to each category individually, as well as the average performance across both categories (giving rise to the possibility of 3 distinct "winners"). Ranks and awards will be assigned based only on area under the receiver operating characteristic curve (AUC). However, submissions will also be evaluated using a variety of common binary classification metrics, reported for scientific completeness, including:
Some useful resources for metrics computation include:
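Independently of those resources, the ranking rule described above (AUC per category, plus the average across both categories) can be sketched in plain Python. This is an illustrative stand-in, not the official evaluation code: the function names `roc_auc` and `mean_auc` are hypothetical, labels are assumed to be 0 or 1 with 1 as the positive class, and the AUC is computed through its pairwise-ranking interpretation (the probability that a random positive scores above a random negative, with ties counted as one half).

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Quadratic in the number of examples, which is acceptable for
    evaluation sets of a few hundred images.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    # A positive scored above a negative counts 1; a tie counts 0.5.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_auc(auc_melanoma: float, auc_keratosis: float) -> float:
    """Average performance across the two binary categories."""
    return (auc_melanoma + auc_keratosis) / 2.0
```

For example, `roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])` evaluates to 0.75, since three of the four positive/negative pairs are ordered correctly.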