Research Code Competition

UBC Ovarian Cancer Subtype Classification and Outlier Detection (UBC-OCEAN)

Navigating Ovarian Cancer: Unveiling Common Histotypes and Unearthing Rare Variants

$50,000 Prize Money

UBC
269 teams
2 months to go (2 months to go until merger deadline)

Dataset Description

Your challenge in this competition is to classify the type of ovarian cancer from microscopy scans of biopsy samples.

This competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a full-length sample submission) will be made available to your notebook. Because of the size of the dataset, the train images will not be available to your submission notebook.

Files

[train/test]_images A folder containing the relevant images. There are two categories of images: whole slide images (WSI) and tissue microarray (TMA). Whole slide images are at 20x magnification and can be quite large. The TMAs are smaller (roughly 4,000x4,000 pixels) but at 40x magnification.
The test set contains images from different source hospitals than the train set, and the largest images by area are almost 100,000 x 50,000 pixels. We strongly recommend taking an expansive approach to the scenarios your error handling should manage, including differences in image dimensions, quality, slide staining techniques, and more. Expect roughly 2,000 images in the test set, the majority of which are TMAs. The total size is 550 GB, so simply loading the data will be time-consuming. Be warned that the test set was specifically constructed to assess how well models generalize.
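Because the largest slides approach 100,000 x 50,000 pixels, one common way to keep memory bounded is to process an image tile by tile rather than as a whole. The sketch below only computes tile coordinates from the width and height; it does not read any image data. The function name `iter_tiles` and the default tile size of 2,048 pixels are assumptions made for illustration, not part of the competition materials.

```python
from typing import Iterator, Tuple

# (left, top, width, height) of one tile, in pixels.
Box = Tuple[int, int, int, int]


def iter_tiles(image_width: int, image_height: int, tile_size: int = 2048) -> Iterator[Box]:
    """Yield non-overlapping tile boxes that together cover the whole image."""
    if image_width <= 0 or image_height <= 0:
        raise ValueError("image dimensions must be positive")
    if tile_size <= 0:
        raise ValueError("tile_size must be positive")
    for top in range(0, image_height, tile_size):
        for left in range(0, image_width, tile_size):
            # Edge tiles are clipped so they never extend past the image border.
            width = min(tile_size, image_width - left)
            height = min(tile_size, image_height - top)
            yield (left, top, width, height)


if __name__ == "__main__":
    # A slide near the largest size mentioned above, and a typical TMA.
    print(len(list(iter_tiles(100_000, 50_000))))  # 1225 tiles at the default size
    print(len(list(iter_tiles(4_000, 4_000))))     # 4 tiles at the default size
```

The image_width and image_height columns in the CSV files could feed this directly, so the tiling plan for an image can be decided before the file is opened.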

A handful of the largest images in the test set will not currently fit entirely in memory on a notebook with a GPU. We are investigating a solution for this and will post an update the week of October 18th.

[train/test].csv Metadata for each image. Labels are included for the train set only.

image_id - A unique ID code for each image.
label - The target class. One of these subtypes of ovarian cancer: CC, EC, HGSC, LGSC, MC, Other. The Other class is not present in the training set; identifying outliers is one of the challenges of this competition. Only available for the train set.
image_width - The image width in pixels.
image_height - The image height in pixels.
is_tma - True if the slide is a tissue microarray. Only available for the train set.
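As a rough illustration of the train.csv columns listed above, the following sketch parses rows with Python's standard `csv` module and checks each label against the five subtypes present in the training set. The sample rows are invented for the example and are not real competition data; the helper name `parse_train_rows` is hypothetical, and the assumption that is_tma is stored as the text `True` or `False` has not been verified against the actual file.

```python
import csv
import io
from typing import Dict, List

# Subtypes that can appear in train.csv; "Other" is absent from the training set.
KNOWN_LABELS = {"CC", "EC", "HGSC", "LGSC", "MC"}

# Invented rows for illustration only (not real competition data).
SAMPLE_CSV = """image_id,label,image_width,image_height,is_tma
1,HGSC,23785,20008,False
2,CC,4000,4000,True
"""


def parse_train_rows(handle) -> List[Dict[str, object]]:
    """Parse train.csv-style rows, converting types and validating labels."""
    rows = []
    # Data rows start on line 2 because line 1 is the header.
    for line_number, raw in enumerate(csv.DictReader(handle), start=2):
        label = raw["label"].strip()
        if label not in KNOWN_LABELS:
            raise ValueError(f"line {line_number}: unexpected label {label!r}")
        rows.append({
            "image_id": raw["image_id"].strip(),
            "label": label,
            "image_width": int(raw["image_width"]),
            "image_height": int(raw["image_height"]),
            # Assumption: booleans are serialized as the text "True"/"False".
            "is_tma": raw["is_tma"].strip().lower() == "true",
        })
    return rows


if __name__ == "__main__":
    for row in parse_train_rows(io.StringIO(SAMPLE_CSV)):
        print(row)
```

Since label and is_tma are available only for the train set, a reader for test.csv would need to skip those two fields.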

[train/test]_thumbnails A folder containing smaller .png copies of the whole slide images. Thumbnails are not provided for TMAs.

sample_submission.csv A valid sample submission. Only the first row is available for download.

Using the data outside of the competition

We request that participants refrain from utilizing the competition data, either in its entirety or in part, for any external projects, research, or applications until the official publication of the competition paper. We will ensure that participants receive prompt notification when the embargo is lifted.

Files - 1056 files
Size - 774.93 GB
Type - png, csv
License - Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0)

sample_submission.csv (23 B)

Data Explorer

test_images
test_thumbnails
train_images
train_thumbnails
sample_submission.csv
test.csv
train.csv

Summary

1056 files

10 columns

kaggle competitions download -c UBC-OCEAN
