Dataset Description

Your challenge in this competition is to classify the type of ovarian cancer from microscopy scans of biopsy samples.

This competition uses a hidden test set. When your submitted notebook is scored, the actual test data (including a full-length sample submission) will be made available to your notebook. Due to the size of the dataset, the train images will not be available to your submission notebook.

Files

[train/test]_images A folder containing the relevant images. There are two categories of images: whole slide images (WSI) and tissue microarray (TMA). Whole slide images are at 20x magnification and can be quite large. The TMAs are smaller (roughly 4,000x4,000 pixels) but at 40x magnification.
The test set contains images from different source hospitals than the train set, with the largest images measuring almost 100,000 x 50,000 pixels. We strongly recommend taking an expansive approach to the scenarios your error handling should manage, including differences in image dimensions, quality, slide staining techniques, and more. Expect roughly 2,000 images in the test set, the majority of which are TMAs. The total size is 550 GB, so simply loading the data will be time consuming. Be warned that the test set was specifically constructed to assess how well models generalize.

A handful of the largest images in the test set will not currently fit entirely in memory on a notebook with a GPU. We are investigating a solution for this and will post an update the week of October 18th.
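To make the memory concern concrete, the sketch below estimates the decoded size of the largest image and enumerates fixed-size tiles that could be processed one at a time. It assumes 3 channels of 8-bit samples and a 2,048-pixel tile edge; both values, and the helper names `uncompressed_bytes` and `tile_boxes`, are illustrative choices and not part of the competition materials.

```python
from typing import Iterator, Tuple


def uncompressed_bytes(width: int, height: int, channels: int = 3) -> int:
    """Bytes needed to hold the decoded image as 8-bit samples (assumed layout)."""
    return width * height * channels


def tile_boxes(
    width: int, height: int, tile: int = 2048
) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (left, top, right, bottom) boxes covering the image row by row.

    Tiles on the right and bottom edges are clipped to the image bounds.
    """
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            yield left, top, min(left + tile, width), min(top + tile, height)


if __name__ == "__main__":
    # Dimensions taken from the description of the largest test images.
    w, h = 100_000, 50_000
    print(f"{uncompressed_bytes(w, h) / 1e9:.1f} GB when fully decoded")
    print(sum(1 for _ in tile_boxes(w, h)), "tiles of up to 2048 x 2048 pixels")
```

Each box could then be passed to whatever region-reading call your image library provides, so that only one tile is held in memory at a time.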

[train/test].csv Metadata for each image. Labels are included for the train set only.

  • image_id - A unique ID code for each image.
  • label - The target class. One of these subtypes of ovarian cancer: CC, EC, HGSC, LGSC, MC, Other. Only available for the train set. The Other class is not present in the training set; identifying outliers is one of the challenges of this competition.
  • image_width - The image width in pixels.
  • image_height - The image height in pixels.
  • is_tma - True if the slide is a tissue microarray. Only available for the train set.
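As a rough illustration of working with the columns listed above, the following sketch counts images per label and per slide type using only the standard library. The sample rows are made up for demonstration and are not real competition data, and the assumption that `is_tma` is stored as the text `True` or `False` should be checked against the actual file.

```python
import csv
import io
from collections import Counter


def summarize(csv_file):
    """Count images per label and per slide type from a train.csv-style file."""
    labels = Counter()
    slide_types = Counter()
    for row in csv.DictReader(csv_file):
        labels[row["label"]] += 1
        # Assumption: is_tma is serialized as the text "True" or "False".
        is_tma = row["is_tma"].strip().lower() == "true"
        slide_types["TMA" if is_tma else "WSI"] += 1
    return labels, slide_types


# Made-up rows for illustration only; not real competition data.
SAMPLE = """image_id,label,image_width,image_height,is_tma
1,HGSC,40000,30000,False
2,CC,4000,4000,True
3,HGSC,52000,28000,False
"""

labels, slide_types = summarize(io.StringIO(SAMPLE))
print(labels.most_common())
print(dict(slide_types))
```

To run this on the real file, pass an open handle to train.csv in place of the in-memory sample.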

[train/test]_thumbnails A folder containing smaller .png copies of the whole slide images. Thumbnails are not provided for TMAs.

sample_submission.csv A valid sample submission. Only the first row is available for download.
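Because the Other class never appears in the training labels, one simple (and by no means the only) way to emit it is to fall back to Other when a model's top probability is low. The sketch below shows that idea together with writing a submission file. The `image_id` and `label` column names, the 0.5 threshold, and the helper names `pick_label` and `write_submission` are all assumptions for illustration; check the downloaded sample_submission.csv for the actual format.

```python
import csv
import io

# The five subtypes that appear in the training labels.
KNOWN_LABELS = ["CC", "EC", "HGSC", "LGSC", "MC"]


def pick_label(probs, threshold=0.5):
    """Return the most probable known label, or "Other" when confidence is low.

    The threshold is an arbitrary illustrative value, not a tuned one.
    """
    best = max(KNOWN_LABELS, key=lambda name: probs.get(name, 0.0))
    return best if probs.get(best, 0.0) >= threshold else "Other"


def write_submission(predictions, out_file):
    """Write one row per image; the column names here are an assumption."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["image_id", "label"])
    for image_id, probs in predictions:
        writer.writerow([image_id, pick_label(probs)])


# Hypothetical model outputs for two images.
buffer = io.StringIO()
write_submission(
    [
        (101, {"HGSC": 0.9, "CC": 0.1}),
        (102, {"EC": 0.3, "MC": 0.3, "CC": 0.4}),
    ],
    buffer,
)
print(buffer.getvalue())
```

Thresholding the top probability is a crude outlier heuristic; it is shown here only to make the role of the Other class concrete.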

Using the data outside of the competition

We request that participants refrain from utilizing the competition data, either in its entirety or in part, for any external projects, research, or applications until the official publication of the competition paper. We will ensure that participants receive prompt notification when the embargo is lifted.

Data Explorer

774.93 GB

  • test_images (folder)
  • test_thumbnails (folder)
  • train_images (folder)
  • train_thumbnails (folder)
  • sample_submission.csv
  • test.csv
  • train.csv

Summary

1056 files, 10 columns

kaggle competitions download -c UBC-OCEAN