Challenges: Detections | Captions | Keypoints
Begin by learning about the individual challenges. Compete to earn prizes and opportunities to present your work. Come to the workshops to learn about the state of the art!
Download the dataset, including the dataset tools, images, and annotations. Learn about the annotation format. See cocoDemo in either the Matlab or Python code.
Develop your algorithm. Run your algorithm on COCO and save the results using the format described on this page. See evalDemo in either the Matlab or Python code.
Evaluate: Detections | Captions
Evaluate results of your system on the validation set. The same code will run on the evaluation server when you upload your results. See evalDemo in either the Matlab or Python code and evalCapDemo in the Python code for detection and caption demo code, respectively.
Upload: Detections | Captions
Upload your results to the evaluation server.
Leaderboard: Detections | Captions
Check out the state-of-the-art! See what algorithms are best at the various tasks.
Tools
Matlab+Python+Lua APIs [Version 2.0]
V2.0 of the API was completed 07/2015 and includes detection evaluation code. The Lua API, added 05/2016, supports only load and view functionality (no eval code).
Images
2014 Training images [80K/13GB]
2014 Val. images [40K/6.2GB]
2014 Testing images [40K/6.2GB]
2015 Testing images [80K/12.4GB]
Annotations
2014 Train/Val object instances [158MB]
2014 Train/Val person keypoints [70MB]
2014 Train/Val image captions [18.8MB]
2014 Testing Image info [0.74MB]
2015 Testing Image info [1.83MB]
Note: annotations updated on 07/23/2015 with the addition of a "coco_url" field (allowing direct download of individual images).
1. Overview
The 2014 Testing Images are for the COCO Captioning Challenge, while the 2015 Testing Images are for the Detection and Keypoint Challenges. The train and val data are common to all challenges. Note also that as an alternative to downloading the large image zip files, individual images may be downloaded from the COCO website using the "coco_url" field specified in the image info struct (see details below).
Please follow the instructions in the README to download and setup the COCO data (annotations and images). By downloading this dataset, you agree to our Terms of Use.
2. COCO API
The COCO API assists in loading, parsing, and visualizing annotations in COCO. The API supports object instance, object keypoint, and image caption annotations (for captions not all functionality is defined). For additional details see: CocoApi.m, coco.py, and CocoApi.lua for Matlab, Python, and Lua code, respectively, and also the Python API demo.
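As a minimal sketch of typical usage (Python API shown; the annotation file path and category name below are illustrative assumptions), loading and querying annotations looks roughly like this:

```python
# Sketch: load instance annotations and fetch the annotations for one image.
# The path 'annotations/instances_val2014.json' is an assumed local location.
from pycocotools.coco import COCO

coco = COCO('annotations/instances_val2014.json')   # load and index the annotations
catIds = coco.getCatIds(catNms=['person'])          # look up category ids by name
imgIds = coco.getImgIds(catIds=catIds)              # images containing that category
img = coco.loadImgs(imgIds[0])[0]                   # image info struct (file_name, coco_url, ...)
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)                        # annotation structs for this image
print(len(anns), 'annotations for', img['file_name'])
```

The Python API demo additionally shows visualization of the loaded annotations (e.g. via showAnns()).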
3. MASK API
COCO provides segmentation masks for every object instance. This creates two challenges: storing masks compactly and performing mask computations efficiently. We solve both challenges using a custom Run Length Encoding (RLE) scheme. The size of the RLE representation is proportional to the number of boundary pixels of a mask, and operations such as area, union, or intersection can be computed efficiently directly on the RLE. Specifically, assuming fairly simple shapes, the RLE representation is O(√n) where n is the number of pixels in the object, and common computations are likewise O(√n). Naively computing the same operations on the decoded masks (stored as an array) would be O(n).
The MASK API provides an interface for manipulating masks stored in RLE format. The API is defined below; for additional details see: MaskApi.m, mask.py, or MaskApi.lua. Finally, we note that the majority of ground truth masks are stored as polygons (which are quite compact); these polygons are converted to RLE when needed.
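As a minimal sketch of the Python MASK API (imported here as maskUtils; the random mask is just a stand-in for a real binary segmentation):

```python
# Sketch: round-trip a binary mask through RLE and compute area/bbox directly on the RLE.
import numpy as np
from pycocotools import mask as maskUtils

m = (np.random.rand(240, 320) > 0.5).astype(np.uint8)  # placeholder binary mask
rle = maskUtils.encode(np.asfortranarray(m))            # compress to RLE
print(maskUtils.area(rle), maskUtils.toBbox(rle))       # computed without decoding the mask
m2 = maskUtils.decode(rle)                              # decode back to a binary mask
assert (m == m2).all()
```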
4. Annotation format
COCO currently has three annotation types: object instances, object keypoints, and image captions. The annotations are stored using the JSON file format. All annotations share the basic data structure below:
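A hedged sketch of that shared layout (field names per the COCO format; sub-struct contents abbreviated and shown as a Python-style literal):

```python
dataset = {
    "info":        dict,    # year, version, description, contributor, url, date_created
    "images":      [dict],  # per image: id, width, height, file_name, license,
                            #            flickr_url, coco_url, date_captured
    "licenses":    [dict],  # per license: id, name, url
    "annotations": [dict],  # type-specific structs, described in 4.1-4.3 below
}
```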
The data structures specific to the various annotation types are described below.
4.1. Object Instance Annotations
Each instance annotation contains a series of fields, including the category id and segmentation mask of the object. The segmentation format depends on whether the instance represents a single object (iscrowd=0 in which case polygons are used) or a collection of objects (iscrowd=1 in which case RLE is used). Note that a single object (iscrowd=0) may require multiple polygons, for example if occluded. Crowd annotations (iscrowd=1) are used to label large groups of objects (e.g. a crowd of people). In addition, an enclosing bounding box is provided for each object (box coordinates are measured from the top left image corner and are 0-indexed). Finally, the categories field of the annotation structure stores the mapping of category id to category and supercategory names. See also the Detection Challenge.
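A hedged sketch of the instance annotation fields just described, plus a categories entry (placeholder types, not an exhaustive schema):

```python
annotation = {
    "id": int,
    "image_id": int,
    "category_id": int,
    "segmentation": "polygons if iscrowd=0, RLE if iscrowd=1",
    "area": float,
    "bbox": [float, float, float, float],  # [x, y, width, height], 0-indexed from the top left
    "iscrowd": 0,                          # 0 for single objects, 1 for crowd regions
}
category = {
    "id": int,
    "name": str,
    "supercategory": str,
}
```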
4.2. Object Keypoint Annotations
A keypoint annotation contains all the data of the object annotation (including id, bbox, etc.) and two additional fields. First, "keypoints" is a length 3k array where k is the total number of keypoints defined for the category. Each keypoint has a 0-indexed location x,y and a visibility flag v defined as v=0: not labeled (in which case x=y=0), v=1: labeled but not visible, and v=2: labeled and visible. A keypoint is considered visible if it falls inside the object segment. "num_keypoints" indicates the number of labeled keypoints (v>0) for a given object (many objects, e.g. crowds and small objects, will have num_keypoints=0). Finally, for each category, the categories struct has two additional fields: "keypoints," which is a length k array of keypoint names, and "skeleton", which defines connectivity via a list of keypoint edge pairs and is used for visualization. Currently keypoints are only labeled for the person category (for most medium/large non-crowd person instances). See also the Keypoint Challenge.
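A hedged sketch of the keypoint-specific additions (on top of the instance annotation fields above):

```python
annotation = {
    # ...all object instance fields (id, image_id, category_id, bbox, segmentation, ...), plus:
    "keypoints": [int],       # length 3k: [x1, y1, v1, ..., xk, yk, vk]
    "num_keypoints": int,     # number of keypoints with v > 0
}
category = {
    # ...id, name, supercategory, plus:
    "keypoints": [str],       # length k: keypoint names
    "skeleton": [[int, int]]  # keypoint index pairs defining edges (used for visualization)
}
```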
4.3. Image Caption Annotations
These annotations are used to store image captions. Each caption describes the specified image and each image has at least 5 captions (some images have more). See also the Captioning Challenge.
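A hedged sketch of a single caption annotation:

```python
annotation = {
    "id": int,
    "image_id": int,
    "caption": str,
}
```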
1. Results Format Overview
This page describes the results format used by COCO. The general structure of the results format is similar for all annotation types, covering both object detection (using either bounding boxes or object segments) and image caption generation. Submitting algorithm results on COCO for evaluation requires using the formats described below.
2. Results Format
The results format used by COCO closely mimics the format of the ground truth as described on the download page. We suggest reviewing the ground truth format before proceeding.
Each algorithmically produced result, such as an object bounding box, object segment, or image caption, is stored separately in its own result struct. This singleton result struct must contain the id of the image from which the result was generated (note that a single image will typically have multiple associated results). Results across the whole dataset are aggregated in an array of such result structs. Finally, the entire result struct array is stored to disk as a single JSON file (saved via gason in Matlab or json.dump in Python).
The data struct for each of the three result types is described below. The format of the individual fields below (category_id, bbox, segmentation, etc.) is the same as for the ground truth (for details see the download page).
2.1. Object detection (bounding boxes)
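A hedged sketch of one bounding-box detection result (results for the whole dataset form a flat array of such entries):

```python
result = {
    "image_id": int,
    "category_id": int,
    "bbox": [float, float, float, float],  # [x, y, width, height]
    "score": float,
}
```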
Note: box coordinates are floats measured from the top left image corner (and are 0-indexed). We recommend rounding coordinates to the nearest tenth of a pixel to reduce resulting JSON file size.
2.2. Object detection (segmentation)
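A hedged sketch of one segmentation result; the "segmentation" field holds the RLE produced by encode() (a dict with "size" and "counts"):

```python
result = {
    "image_id": int,
    "category_id": int,
    "segmentation": {"size": [int, int], "counts": str},  # RLE: [height, width] plus counts string
    "score": float,
}
```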
Note: a binary mask containing an object segment should be encoded to RLE using the MaskApi function encode(). For additional details see either MaskApi.m or mask.py. Note that the core RLE code is written in C (see maskApi.h), so it is possible to perform encoding without using Matlab or Python, but we do not provide support for this case.
2.3. Keypoint detection
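A hedged sketch of one keypoint detection result; "keypoints" uses the same [x1,y1,v1,...,xk,yk,vk] layout as the ground truth:

```python
result = {
    "image_id": int,
    "category_id": int,
    "keypoints": [float],  # length 3k; set each vi = 1 (see the note below)
    "score": float,
}
```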
Note: keypoint coordinates are floats measured from the top left image corner (and are 0-indexed). We recommend rounding coordinates to the nearest pixel to reduce file size. Note also that the visibility flags vi are not currently used (except for controlling visualization), we recommend simply setting vi=1.
2.4. Caption generation
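A hedged sketch of one caption generation result:

```python
result = {
    "image_id": int,
    "caption": str,
}
```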
3. Storing and Browsing Results
Example result JSON files are available in coco/results/ as part of the github package. Because the results format is similar to the ground truth annotation format, the CocoApi for accessing the ground truth can also be used to visualize and browse algorithm results. For details please see evalDemo (demo) and also loadRes() in the CocoApi.
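As a minimal sketch of that workflow (Python shown; both file names below are illustrative assumptions):

```python
# Sketch: load ground truth, attach algorithm results, and browse them via the same API.
from pycocotools.coco import COCO

cocoGt = COCO('annotations/instances_val2014.json')         # ground truth (assumed path)
cocoDt = cocoGt.loadRes('detections_val2014_results.json')  # results wrapped as a COCO object
imgId = cocoDt.getImgIds()[0]
anns = cocoDt.loadAnns(cocoDt.getAnnIds(imgIds=imgId))
print(len(anns), 'results for image', imgId)                # browse results just like annotations
```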
1. Detection Evaluation
This page describes the detection evaluation code used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple metrics described below. To obtain results on the COCO test set, for which ground truth annotations are hidden, generated results must be submitted to the evaluation server. For instructions on submitting results to the evaluation server please see the upload page. The exact same evaluation code, described below, is used to evaluate detections on the test set.
2. Metrics
The following 12 metrics are used for characterizing the performance of an object detector on COCO:
3. Results Format
The results format used for storing generated detections is described on the results format page. For reference, here is a summary of the detection results format for boxes and segments, respectively:
Note: box coordinates are floats measured from the top left image corner (and are 0-indexed). We recommend rounding coordinates to the nearest tenth of a pixel to reduce resulting JSON file size.
Note: binary masks should be encoded via RLE using the MaskApi function encode().
4. Evaluation Code
Evaluation code is available on the COCO github. Specifically, see either CocoEval.m or cocoeval.py in the Matlab or Python code, respectively. Also see evalDemo in either the Matlab or Python code (demo).
The evaluation parameters are as follows (defaults in brackets, in general no need to change):
Running the evaluation code via calls to evaluate() and accumulate() produces two data structures that measure detection quality. The two structs are evalImgs and eval, which measure quality per-image and aggregated across the entire dataset, respectively. The evalImgs struct has KxA entries, one per evaluation setting, while the eval struct combines this information into precision and recall arrays. Details for the two structs are below (see also CocoEval.m or cocoeval.py):
Finally summarize() computes the 12 detection metrics defined earlier based on the eval struct.
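A minimal sketch of this evaluation loop in Python (the file paths and the iouType choice are assumptions to adapt to your setup):

```python
# Sketch: run the COCO detection evaluation on the validation set.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

cocoGt = COCO('annotations/instances_val2014.json')         # ground truth (assumed path)
cocoDt = cocoGt.loadRes('detections_val2014_results.json')  # your results (assumed path)
cocoEval = COCOeval(cocoGt, cocoDt, iouType='bbox')         # or iouType='segm' for segments
cocoEval.params.imgIds = sorted(cocoGt.getImgIds())         # optionally restrict the image set
cocoEval.evaluate()     # per-image, per-category evaluation -> cocoEval.evalImgs
cocoEval.accumulate()   # aggregate into precision/recall arrays -> cocoEval.eval
cocoEval.summarize()    # print the 12 detection metrics
```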
5. Analysis Code
In addition to the evaluation code, we also provide a function analyze() for performing a detailed breakdown of false positives. This was inspired by Diagnosing Error in Object Detectors by Derek Hoiem et al., but is quite different in implementation and details. The code generates plots like this:
Both plots show analysis of the ResNet (bbox) detector from Kaiming He et al., winner of the 2015 Detection Challenge. The first plot shows a breakdown of errors of ResNet for the person class; the second plot is an overall analysis of ResNet averaged over all categories.
Each plot is a series of precision recall curves where each PR curve is guaranteed to be strictly higher than the previous as the evaluation setting becomes more permissive. The curves are as follows:
The area under each curve is shown in brackets in the legend. In the case of the ResNet detector, overall AP at IoU=.75 is .399 and perfect localization would increase AP to .682. Interestingly, removing all class confusions (both within supercategory and across supercategories) would only raise AP slightly, to .713. Removing background false positives would bump performance to .870 AP; the remaining errors are missed detections (although presumably adding more detections would also introduce many false positives). In summary, ResNet's errors are dominated by imperfect localization and background confusions.
For a given detector, the code generates a total of 372 plots! There are 80 categories, 12 supercategories, and 1 overall result, for a total of 93 different settings, and the analysis is performed at 4 scales (all, small, medium, large, so 93*4=372 plots). The file naming is [supercategory]-[category]-[size].pdf for the 80*4 per-category results, overall-[supercategory]-[size].pdf for the 12*4 per supercategory results, and overall-all-[size].pdf for the 1*4 overall results. Of all the plots, typically the overall and supercategory results are of the most interest.
Note: analyze() can take significant time to run, please be patient. As such, we typically do not run this code on the evaluation server; you must run the code locally using the validation set. Finally, currently analyze() is only part of the Matlab API; Python code coming soon.
1. Detections Upload
This page describes the upload instructions for submitting results to the detection evaluation server. Before proceeding, please review the results format and evaluation details. Submitting results allows you to participate in the COCO Detection Challenge and compare results to the state-of-the-art on the detection leaderboard.
2. Competition Details
The COCO 2015 Test Set can be obtained on the download page. The recommended training data consists of the COCO 2014 Training and Validation sets. External data of any form is allowed (except of course any form of annotation on the COCO Test set is forbidden). Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
There are two distinct detection challenges and associated leaderboards: for detectors that output bounding boxes and for detectors that output object segments. The bounding box challenge provides continuity with past challenges such as the PASCAL VOC; the detection by segmentation challenge encourages higher accuracy object localization. The evaluation code in the two cases computes IoU using boxes or segments, respectively, but is otherwise identical. Please see the evaluation details.
Please limit the number of entries to the evaluation server to a reasonable number, e.g. one entry per paper. To avoid overfitting, the number of submissions per user is limited to 2 uploads per day and a maximum of 5 submissions per user. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The exception to this is if a group publishes two papers describing unrelated methods; in this case both sets of results can be submitted for evaluation.
2.1. Test Set Splits
The 2015 COCO Test set consists of ~80K test images. To limit overfitting while giving researchers more flexibility to test their systems, we have divided the test set into four roughly equally sized splits of ~20K images each: test-dev, test-standard, test-challenge, and test-reserve. Submission to the test set automatically results in submission on each split (the identities of the splits are not publicly revealed). In addition, to allow for debugging and validation experiments, we allow researchers unlimited submissions to test-dev. Each test split serves a distinct role; details below.
split | #imgs | submission | scores reported |
---|---|---|---|
Test-Dev | ~20K | unlimited | immediately |
Test-Standard | ~20K | limited | immediately |
Test-Challenge | ~20K | limited | challenge |
Test-Reserve | ~20K | limited | never |
Test-Dev: We place no limit on the number of submissions allowed to test-dev. In fact, we encourage use of the test-dev for performing validation experiments. Use test-dev to debug and finalize your method before submitting to the full test set.
Test-Standard: The test-standard split is the default test data for the detection competition. When comparing to the state of the art, results should be reported on test-standard.
Test-Challenge: The test-challenge split is used for the COCO Detection Challenge. Results will be revealed during the ImageNet and COCO Visual Recognition Challenges Workshop.
Test-Reserve: The test-reserve split is used to protect against possible overfitting. If there are substantial differences between a method's scores on test-standard and test-reserve this will raise a red-flag and prompt further investigation. Results on test-reserve will not be publicly revealed.
We emphasize that except for test-dev, results cannot be submitted to a single split and must instead be submitted on the full test set. A submission to the test set populates three leaderboards: test-dev, test-standard and test-challenge (the updated test-challenge leaderboard will not be revealed until the ECCV 2016 Workshop). It is not possible to submit to test-standard without submitting to test-challenge or vice-versa (however, it is possible to submit to the test set without making results public, see below). The identity of the images in each split is not revealed, except for test-dev.
2.2. Test-Dev Best Practices
The test-dev 2015 set is a subset of the 2015 Testing set. The specific images belonging to test-dev are listed in the "image_info_test-dev2015.json" file available on the download page as part of the "2015 Testing Image info" download. As discussed, we place no limit on the number of submissions allowed on test-dev. Note that while submitting to test-dev will produce evaluation results, doing so will not populate the public test-dev leaderboard. Instead, submitting to the full test set populates the test-dev leaderboard. This limits the number of results displayed on the test-dev leaderboard.
Test-dev should be used only for validation and debugging: in a publication it is not acceptable to report results on test-dev only. However, for validation experiments it is acceptable to report results of competing methods on test-dev (obtained from the public test-dev leaderboard). While test-dev is prone to some overfitting, we expect this may still be useful in practice. We emphasize that final comparisons should always be performed on test-standard.
The differences between the validation and test-dev sets are threefold: test-dev is evaluated consistently via the evaluation server, test-dev cannot be used for training (its annotations are private), and a leaderboard is provided for test-dev, allowing for comparison with the state of the art. We note that the continued popularity of the outdated PASCAL VOC 2007 dataset partially stems from the fact that it allows for simultaneous validation experiments and comparisons to the state of the art. Our goal with test-dev is to provide similar functionality (while keeping annotations private).
3. Enter The Competition
First you need to create an account on CodaLab. From your account you will be able to participate in all COCO challenges.
Before uploading your results to the evaluation server, you will need to create a JSON file containing your results in the correct format. The file should be named "detections_[testset]_[alg]_results.json". Replace [alg] with your algorithm name and [testset] with either "test-dev2015" or "test2015" depending on the test split you are using. Place the JSON file into a zip file named "detections_[testset]_[alg]_results.zip".
To submit your zipped result file to the COCO Detection Challenge, click on the “Participate” tab on the CodaLab evaluation server. Select the challenge type (bbox or segm) and test split (test-dev or test). When you select “Submit / View Results” you will be given the option to submit new results. Please fill in the required fields and click “Submit”. A pop-up will prompt you to select the results zip file for upload. After the file is uploaded the evaluation server will begin processing. To view the status of your submission please select “Refresh Status”. Please be patient; the evaluation may take quite some time to complete (~20min on test-dev and ~80min on the full test set). If the status of your submission is “Failed” please check that your file is named correctly and has the right format.
After you submit your results to the evaluation server, you can control whether your results are publicly posted to the CodaLab leaderboard. To toggle the public visibility of your results please select either “post to leaderboard” or “remove from leaderboard”. For now only one result can be published to the leaderboard at any time; we may change this in the future.
In addition to the CodaLab leaderboard, we also host our own more detailed leaderboard that includes additional results and method information (such as paper references). Note that the CodaLab leaderboard may contain results not yet migrated to our own leaderboard.
4. Download Evaluation Results
After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Download evaluation output from scoring step.” The zip file will contain three files:
The format of the eval file is described on the detection evaluation page.
1. Keypoint Evaluation
Note: Evaluation metrics were updated 09/05/2016. They are likely finalized, but are still subject to change if we discover any issues before the competition deadline. If you discover any flaws or pitfalls in the proposed metrics please contact us asap.
This page describes the keypoint evaluation metric used by COCO. The COCO keypoint task requires simultaneously detecting objects and localizing their keypoints (object locations are not given at test time). As the task of simultaneous detection and keypoint estimation is relatively new, we chose to adopt a novel metric inspired by object detection metrics. For simplicity, we refer to this task as keypoint detection and the prediction algorithm as the keypoint detector.
We suggest reviewing the evaluation metrics for object detection before proceeding. As in the other COCO tasks, the evaluation code can be used to evaluate results on the publicly available validation set. To obtain results on the test set, for which ground truth annotations are hidden, generated results must be submitted to the evaluation server. For instructions on submitting results to the evaluation server please see the upload page.
1.1. Evaluation Overview
The core idea behind evaluating keypoint detection is to mimic the evaluation metrics used for object detection, namely average precision (AP) and average recall (AR) and their variants. At the heart of these metrics is a similarity measure between ground truth objects and predicted objects. In the case of object detection, the IoU serves as this similarity measure (for both boxes and segments). Thresholding the IoU defines matches between the ground truth and predicted objects and allows computing precision-recall curves. To adopt AP/AR for keypoint detection, we thus only need to define an analogous similarity measure. We do so next by defining an object keypoint similarity (OKS) which plays the same role as the IoU.
1.2. Object Keypoint Similarity
For each object, ground truth keypoints have the form [x1,y1,v1,...,xk,yk,vk], where x,y are the keypoint locations and v is a visibility flag defined as v=0: not labeled, v=1: labeled but not visible, and v=2: labeled and visible. Each ground truth object also has a scale s which we define as the square root of the object segment area. For details on the ground truth format please see the download page.
For each object, the keypoint detector must output keypoint locations and an object-level confidence. Predicted keypoints for an object should have the same form as the ground truth: [x1,y1,v1,...,xk,yk,vk]. However, the detector's predicted vi are not currently used during evaluation; that is, the keypoint detector is not required to predict per-keypoint visibilities or confidences.
We define the object keypoint similarity (OKS) as:
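(Reconstructed in LaTeX from the verbal definition in the next paragraph; δ(vi>0) selects the labeled keypoints.)

$$ \mathrm{OKS} \;=\; \frac{\sum_i \exp\!\left(-d_i^2 / (2 s^2 \kappa_i^2)\right)\,\delta(v_i > 0)}{\sum_i \delta(v_i > 0)} $$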
The di are the Euclidean distances between each corresponding ground truth and detected keypoint and the vi are the visibility flags of the ground truth (the detector's predicted vi are not used). To compute the OKS, we pass the di through an unnormalized Gaussian with standard deviation sκi, where s is the object scale and κi is a per-keypoint constant that controls falloff. For each keypoint this yields a keypoint similarity that ranges between 0 and 1. These similarities are averaged over all labeled keypoints (keypoints for which vi>0). Predicted keypoints that are not labeled (vi=0) do not affect the OKS. Perfect predictions will have OKS=1 and predictions for which all keypoints are off by more than a few standard deviations sκi will have OKS~0. The OKS is analogous to the IoU. Given the OKS, we can compute AP and AR just as the IoU allows us to compute these metrics for box/segment detection.
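A hedged Python sketch of this computation (illustrative only; the function name, argument layout, and handling of labeled keypoints are assumptions, not the official cocoeval implementation):

```python
import numpy as np

def oks(gt_kpts, dt_kpts, area, kappas):
    """Object keypoint similarity between one ground truth and one detection.

    gt_kpts, dt_kpts: flat [x1, y1, v1, ..., xk, yk, vk] lists; area: ground truth
    segment area (so s**2 == area); kappas: per-keypoint constants controlling falloff.
    """
    gt = np.asarray(gt_kpts, dtype=float).reshape(-1, 3)
    dt = np.asarray(dt_kpts, dtype=float).reshape(-1, 3)
    labeled = gt[:, 2] > 0                          # only labeled ground truth keypoints count
    if not labeled.any():
        return 0.0
    d2 = (gt[:, 0] - dt[:, 0])**2 + (gt[:, 1] - dt[:, 1])**2
    ks = np.exp(-d2 / (2 * area * np.asarray(kappas, dtype=float)**2))  # per-keypoint similarity
    return float(ks[labeled].mean())                # average over labeled keypoints
```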
1.3. Tuning OKS
We tune the κi such that the OKS is a perceptually meaningful and easy to interpret similarity measure. First, using 5000 redundantly annotated images in val, for each keypoint type i we measured the per-keypoint standard deviation σi with respect to object scale s. That is, we compute σi² = E[di²/s²]. σi varies substantially across keypoints: keypoints on a person's body (shoulders, knees, hips, etc.) tend to have a much larger σ than keypoints on a person's head (eyes, nose, ears).
To obtain a perceptually meaningful and interpretable similarity metric we set κi=2σi. With this setting of κi, at one, two, and three standard deviations of di/s the keypoint similarity exp(-di²/(2s²κi²)) takes on values of e^(-1/8)=.88, e^(-4/8)=.61, and e^(-9/8)=.32. As expected, human annotated keypoints are normally distributed (ignoring occasional outliers). Thus, recalling the 68–95–99.7 rule, setting κi=2σi means that 68%, 95%, and 99.7% of human annotated keypoints should have a keypoint similarity of .88, .61, or .32 or higher, respectively (in practice the percentages are 75%, 95% and 98.7%).
The OKS is the average keypoint similarity across all (labeled) object keypoints. Below we plot the predicted OKS distribution with κi=2σi assuming 10 independent keypoints per object (blue curve) and the actual distribution of human OKS scores on the dually annotated data (green curve):
The curves don't match exactly for a few reasons: (1) object keypoints are not independent, (2) the number of labeled keypoints per object varies, and (3) the real data contains 1-2% outliers (most of which are caused by annotators mistaking left for right or annotating the wrong person when two people are nearby). Nevertheless, the behavior is roughly as expected. We conclude with a few observations about human performance: (1) at an OKS of .50, human performance is nearly perfect (95%), (2) median human OKS is ~.91, (3) human performance drops rapidly after an OKS of .95. Note that this OKS distribution can be used to predict human AR (as AR doesn't depend on false positives).
2. Metrics
The following 10 metrics are used for characterizing the performance of a keypoint detector on COCO:
3. Results Format
The results format used for storing generated keypoints is described on the results format page. For reference, here is a summary of the keypoint results:
Note: keypoint coordinates are floats measured from the top left image corner (and are 0-indexed). We recommend rounding coordinates to the nearest pixel to reduce file size. Note also that the visibility flags vi are not currently used (except for controlling visualization), we recommend simply setting vi=1.
4. Evaluation Code
Evaluation code is available on the COCO github. Specifically, see either CocoEval.m or cocoeval.py in the Matlab or Python code, respectively. Also see evalDemo in either the Matlab or Python code (demo).
1. Keypoints Upload
This page describes the upload instructions for submitting results to the keypoint evaluation server. Before proceeding, please review the results format and evaluation details. Submitting results allows you to participate in the COCO Keypoints Challenge and compare results to the state-of-the-art on the keypoints leaderboard.
2. Competition Details
The COCO 2015 Test Set can be obtained on the download page. The recommended training data consists of the COCO 2014 Training and Validation sets. External data of any form is allowed (except of course any form of annotation on the COCO Test set is forbidden). Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
Please limit the number of entries to the evaluation server to a reasonable number, e.g. one entry per paper. To avoid overfitting, the number of submissions per user is limited to 2 uploads per day and a maximum of 5 submissions per user. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The exception to this is if a group publishes two papers describing unrelated methods; in this case both sets of results can be submitted for evaluation.
2.1. Test Set Splits
The 2015 COCO Test set consists of ~80K test images. To limit overfitting while giving researchers more flexibility to test their systems, we have divided the test set into four roughly equally sized splits of ~20K images each: test-dev, test-standard, test-challenge, and test-reserve. Submission to the test set automatically results in submission on each split (the identities of the splits are not publicly revealed). In addition, to allow for debugging and validation experiments, we allow researchers unlimited submissions to test-dev. Each test split serves a distinct role; details below.
split | #imgs | submission | scores reported |
---|---|---|---|
Test-Dev | ~20K | unlimited | immediately |
Test-Standard | ~20K | limited | immediately |
Test-Challenge | ~20K | limited | challenge |
Test-Reserve | ~20K | limited | never |
These are identical to the test splits used for the object detection challenge. To understand their role in more detail, and for best practices, please see the detection upload page (section 2).
3. Enter The Competition
First you need to create an account on CodaLab. From your account you will be able to participate in all COCO challenges.
Before uploading your results to the evaluation server, you will need to create a JSON file containing your results in the correct format. The file should be named "person_keypoints_[testset]_[alg]_results.json". Replace [alg] with your algorithm name and [testset] with either "test-dev2015" or "test2015" depending on the test split you are using. Place the JSON file into a zip file named "person_keypoints_[testset]_[alg]_results.zip".
To submit your zipped result file to the COCO Keypoint Challenge, click on the “Participate” tab on the CodaLab evaluation server. Select the test split (test-dev or test). When you select “Submit / View Results” you will be given the option to submit new results. Please fill in the required fields and click “Submit”. A pop-up will prompt you to select the results zip file for upload. After the file is uploaded the evaluation server will begin processing. To view the status of your submission please select “Refresh Status”. If the status of your submission is “Failed” please check that your file is named correctly and has the right format.
After you submit your results to the evaluation server, you can control whether your results are publicly posted to the CodaLab leaderboard. To toggle the public visibility of your results please select either “post to leaderboard” or “remove from leaderboard”. For now only one result can be published to the leaderboard at any time; we may change this in the future.
In addition to the CodaLab leaderboard, we also host our own more detailed leaderboard that includes additional results and method information (such as paper references). Note that the CodaLab leaderboard may contain results not yet migrated to our own leaderboard.
4. Download Evaluation Results
After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Download evaluation output from scoring step.” The zip file will contain three files:
The format of the eval file is described on the keypoints evaluation page.
1. Caption Evaluation
This page describes the caption evaluation code used by COCO. The evaluation code provided here can be used to obtain results on the publicly available COCO validation set. It computes multiple common metrics, including BLEU, METEOR, ROUGE-L, and CIDEr (the writeup below contains references and descriptions of each metric). If you use the captions, evaluation code, or server, we ask that you cite Microsoft COCO Captions: Data Collection and Evaluation Server:
To obtain results on the COCO test set, for which ground truth annotations are hidden, generated results must be submitted to the evaluation server. For instructions on submitting results to the evaluation server please see the upload page. The exact same evaluation code, described below, is used to evaluate generated captions on the test set.
2. Results Format
The results format used for storing generated captions is described on the results format page. For reference, here is a summary of the caption results format:
3. Evaluation Code
Evaluation code can be obtained on the coco-captions github page. Unlike the general COCO API, the COCO caption evaluation code is only available under Python.
Running the evaluation code produces two data structures that summarize caption quality. The two structs are evalImgs and eval, which summarize caption quality per-image and aggregated across the entire test set, respectively. Details for the two data structures are given below. We recommend running the Python caption evaluation demo for more details.
1. Captions Upload
This page describes the upload instructions for submitting results to the caption evaluation server. Before proceeding, please review the results format and evaluation details. Submitting results allows you to participate in the COCO Captioning Challenge 2015 and compare results to the state-of-the-art on the captioning leaderboard.
2. Competition Details
Training Data: The recommended training set for the captioning challenge is the COCO 2014 Training Set. The COCO 2014 Validation Set may also be used for training when submitting results on the test set. External data of any form is allowed (except any form of annotation on the COCO Testing set is forbidden). Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
Please limit the number of entries to the captioning challenge to a reasonable number, e.g. one entry per paper. To avoid overfitting to the test data, the number of submissions per user is limited to 1 upload per day and a maximum of 5 submissions per user. It is not acceptable to create multiple accounts for a single project to circumvent this limit. The exception to this is if a group publishes two papers describing unrelated methods; in this case both sets of results can be submitted for evaluation.
3. Enter The Competition
First you need to create an account on CodaLab. From your account you will be able to participate in all COCO challenges.
Before uploading your results to the evaluation server, you will need to create two JSON files containing your captioning results in the correct results format. One file should correspond to your results on the 2014 validation dataset, and the other to the 2014 test dataset. Both sets of results are required for submission. Your files should be named as follows:
Replace [alg] with your algorithm name and place both files into a single zip file named "results.zip".
To submit your zipped result file to the COCO Captioning Challenge, click on the “Participate” tab on the CodaLab webpage. When you select “Submit / View Results” you will be given the option to submit new results. Please fill in the required fields and click “Submit”. A pop-up will prompt you to select the results zip file for upload. After the file is uploaded the evaluation server will begin processing. To view the status of your submission please select “Refresh Status”. Please be patient; the evaluation may take quite some time to complete. If the status of your submission is “Failed” please check that your files are named correctly, have the right format, and that your zip file contains the two files corresponding to the validation and test datasets.
After you submit your results to the evaluation server, you can control whether your results are publicly posted to the CodaLab leaderboard. To toggle the public visibility of your results please select either “post to leaderboard” or “remove from leaderboard”. For now only one result can be published to the leaderboard at any time; we may change this in the future. After your results are posted to the CodaLab leaderboard, your captions on the validation dataset will be publicly available. Your captions on the test set will not be publicly released.
In addition to the CodaLab leaderboard, we also host our own more detailed leaderboard that includes additional results and method information (such as paper references). Note that the CodaLab leaderboard may contain results not yet migrated to our own leaderboard.
4. Download Evaluation Results
After evaluation is complete and the server shows a status of “Finished”, you will have the option to download your evaluation results by selecting “Download evaluation output from scoring step.” The zip file will contain five files:
The format of the evaluation files is described on the caption evaluation page. Please note that the *_evalImgs.json file is only available for download on the validation dataset, and not the test set.
Welcome to the COCO Captioning Challenge!
Winners were announced at CVPR 2015
Caption evaluation server remains open!
1. Introduction
Update: The COCO caption evaluation server remains open. Please submit new results to compare to state-of-the-art methods using several automatic evaluation metrics. The COCO 2015 Captioning Challenge is now, however, complete. Results were presented as part of the CVPR 2015 Large-scale Scene Understanding (LSUN) workshop and are available to view on the leaderboard.
The COCO Captioning Challenge is designed to spur the development of algorithms producing image captions that are informative and accurate. Teams will be competing by training their algorithms on the COCO 2014 dataset and having their results scored by human judges.
2. Dates
This captioning challenge is part of the Large-scale Scene Understanding (LSUN) CVPR 2015 workshop organized by Princeton University. For further details please visit the LSUN website.
3. Organizers
Yin Cui (Cornell)
Matteo Ruggero Ronchi (Caltech)
Tsung-Yi Lin (Cornell)
Piotr Dollár (Facebook AI Research)
Larry Zitnick (Microsoft Research)
4. Challenge Guidelines
Participants are recommended but not restricted to train their algorithms on the COCO 2014 dataset. The results should contain a single caption for each validation and test image, and they must be submitted to and publicly posted on the CodaLab leaderboard. Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
By the challenge deadline, results on both the validation and test sets must be submitted to the evaluation server. The results on validation will be public and used for performance diagnosis and visualization. The competitors' algorithms will be evaluated based on feedback from human judges, and the top performing teams will be awarded prizes. Two or three teams will also be invited to present at the LSUN workshop.
Please follow the instructions in the format, evaluate, and upload tabs, which describe the results format, evaluation code, and upload instructions, respectively. The COCO Caption Evaluation Toolkit is also available. The toolkit provides evaluation code for common caption analysis metrics, including BLEU, METEOR, ROUGE-L, and CIDEr. Note that for the competition, instead of automated metrics, human judges will evaluate algorithm results.
Welcome to the COCO 2015 Detection Challenge!
1st Place Detection and Segmentation: Team MSRA
2nd Place Detection and Segmentation: Team FAIRCNN
Best Student Entry: Team ION
Detection results and winners' methods were presented at the ICCV 2015 ImageNet and COCO Visual Recognition Challenges Joint Workshop (slides and recording of all talks are now available). Challenge winners along with up-to-date results are available to view on the leaderboard. The evaluation server remains open for upload of new results.
1. Overview
We are pleased to announce the COCO 2015 Detection Challenge. This competition is designed to push the state of the art in object detection forward. Teams are encouraged to compete in either (or both) of two object detection challenges: using bounding box output or object segmentation output.
The COCO train, validation, and test sets, containing more than 200,000 images and 80 object categories, are available on the download page. All object instances are annotated with a detailed segmentation mask. Annotations on the training and validation sets (with over 500,000 object instances segmented) are publicly available.
2. Dates
3. Organizers
4. Award Committee
5. Challenge Guidelines
The detection evaluation page lists detailed information regarding how submissions will be scored. Instructions for submitting results are available on the detection upload page.
To limit overfitting while giving researchers more flexibility to test their system, we have divided the test set into a number of splits, including test-dev, test-standard, and test-challenge. Test-dev is used for debugging and validation experiments and allows for unlimited submission to the evaluation server. Test-standard is used to maintain a public leaderboard that is updated upon submission. Finally, test-challenge is used for the workshop competition; results will be revealed during the workshop at ICCV 2015. A more thorough explanation is available on the upload page.
Competitors are recommended but not restricted to train their algorithms on COCO 2014 train and val sets. The download page contains links to all COCO 2014 train+val images and associated annotations as well as the 2015 test images. Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
By the challenge deadline, results must be submitted to the evaluation server. Competitors' algorithms will be evaluated according to the rules described on the evaluation page. Challenge participants with the most successful and innovative methods will be invited to present.
After careful consideration, this challenge uses a more comprehensive comparison metric than the traditional AP at Intersection over Union (IoU) threshold of 0.5. Specifically, AP is averaged over multiple IoU values between 0.5 and 1.0; this rewards detectors with better localization. Please refer to the "Metrics" section of the evaluation page for a detailed explanation of the competition metrics.
6. Tools and Instructions
We provide extensive API support for the COCO images, annotations, and evaluation code. To download the COCO API, please visit our GitHub repository. For an overview of how to use the API, please visit the download page and consult the sections entitled COCO API and MASK API.
Due to the large size of the COCO dataset and the complexity of this challenge, the process of competing in this challenge may not seem simple. To help guide competitors to victory, we provide explanations and instructions for each step of the process on the download, format, evaluation, and upload pages. For additional questions, please contact cocodataset@outlook.com.
Welcome to the COCO 2016 Detection Challenge!
1. Overview
The COCO 2016 Detection Challenge is designed to push the state of the art in object detection forward. Teams are encouraged to compete in either (or both) of two object detection challenges: using bounding box output or object segmentation output.
This challenge is part of the ImageNet and COCO Visual Recognition workshop at ECCV 2016. For further details about the joint workshop please visit the workshop website. Participants are encouraged to participate in both the COCO and ImageNet detection challenges. Please also see the concurrent COCO 2016 Keypoint Challenge.
The COCO train, validation, and test sets, containing more than 200,000 images and 80 object categories, are available on the download page. All object instances are annotated with a detailed segmentation mask. Annotations on the training and validation sets (with over 500,000 object instances segmented) are publicly available.
This is the second COCO detection challenge and it closely follows the COCO 2015 Detection Challenge. In particular, the same data and metrics are being used for this year's challenge.
2. Dates
3. Organizers
4. Award Committee
5. Challenge Guidelines
The detection evaluation page lists detailed information regarding how submissions will be scored. Instructions for submitting results are available on the detection upload page.
To limit overfitting while giving researchers more flexibility to test their system, we have divided the test set into a number of splits, including test-dev, test-standard, and test-challenge. Test-dev is used for debugging and validation experiments and allows for unlimited submission to the evaluation server. Test-standard is used to maintain a public leaderboard that is updated upon submission. Finally, test-challenge is used for the workshop competition; results will be revealed during the workshop at ECCV 2016. A more thorough explanation is available on the upload page.
Competitors are recommended but not restricted to train their algorithms on COCO 2014 train and val sets. The download page contains links to all COCO 2014 train+val images and associated annotations as well as the 2015 test images. Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
By the challenge deadline, results must be submitted to the evaluation server. Competitors' algorithms will be evaluated according to the rules described on the evaluation page. Challenge participants with the most successful and innovative methods will be invited to present.
After careful consideration, this challenge uses a more comprehensive comparison metric than the traditional AP at Intersection over Union (IoU) threshold of 0.5. Specifically, AP is averaged over multiple IoU values between 0.5 and 1.0; this rewards detectors with better localization. Please refer to the "Metrics" section of the evaluation page for a detailed explanation of the competition metrics.
6. Tools and Instructions
We provide extensive API support for the COCO images, annotations, and evaluation code. To download the COCO API, please visit our GitHub repository. For an overview of how to use the API, please visit the download page and consult the sections entitled COCO API and MASK API.
Due to the large size of the COCO dataset and the complexity of this challenge, the process of competing in this challenge may not seem simple. To help, we provide explanations and instructions for each step of the process on the download, format, evaluation, and upload pages. For additional questions, please contact cocodataset@outlook.com.
Welcome to the COCO 2016 Keypoint Challenge!
1. Overview
The deadline has been extended to 09/16. We apologize for the delay in releasing the evaluation code. The keypoint evaluation metrics are finalized and the keypoint evaluation server is open for test-dev evaluation. The full test set evaluation will open shortly. Thank you for your patience!
The COCO 2016 Keypoint Challenge requires localization of person keypoints in challenging, uncontrolled conditions. The keypoint challenge involves simultaneously detecting people and localizing their keypoints (person locations are not given at test time). For full details of this task please see the keypoint evaluation page.
This challenge is part of the ImageNet and COCO Visual Recognition workshop at ECCV 2016. For further details about the joint workshop please visit the workshop website. Please also see the concurrent COCO 2016 Detection Challenge.
Training and val data have now been released. The training set for this task consists of over 100K person instances labeled with keypoints (the majority of people in COCO at medium and large scales) and over 1 million total labeled keypoints. The val set has an additional 50K annotated people.
2. Dates
3. Organizers
4. Award Committee
5. Challenge Guidelines
The keypoint evaluation page lists detailed information regarding how submissions will be scored. Instructions for submitting results are available on the keypoint upload page. Note that the keypoint challenge follows the detection challenge quite closely. Specifically, the same challenge rules apply and the same COCO images sets are used. Details follow below.
To limit overfitting while giving researchers more flexibility to test their system, we have divided the test set into a number of splits, including test-dev, test-standard, and test-challenge. Test-dev is used for debugging and validation experiments and allows for unlimited submission to the evaluation server. Test-standard is used to maintain a public leaderboard that is updated upon submission. Finally, test-challenge is used for the workshop competition; results will be revealed during the workshop at ECCV 2016. A more thorough explanation is available on the upload page.
Competitors are recommended but not restricted to train their algorithms on COCO 2014 train and val sets. The download page contains links to all COCO 2014 train+val images and associated annotations as well as the 2015 test images. Please specify any and all external data used for training in the "method description" when uploading results to the evaluation server.
By the challenge deadline, results must be submitted to the evaluation server. Competitors' algorithms will be evaluated according to the rules described on the evaluation page. Challenge participants with the most successful and innovative methods will be invited to present.
As noted earlier, the keypoint challenge involves simultaneously detecting people and localizing their keypoints (person locations are not given at test time). As this is a fairly under-explored setting, we have carefully designed a new set of metrics for this task. Please refer to the "Metrics" section of the evaluation page for a detailed explanation of the competition metrics.
6. Tools and Instructions
We provide extensive API support for the COCO images, annotations, and evaluation code. To download the COCO API, please visit our GitHub repository. For an overview of how to use the API, please visit the download page.
Due to the large size of the COCO dataset and the complexity of this challenge, the process of competing in this challenge may not seem simple. To help, we provide explanations and instructions for each step of the process on the download, format, evaluation, and upload pages. For additional questions, please contact cocodataset@outlook.com.
Leaderboards (table contents are not reproduced here; only the reported columns are listed):
- Captioning leaderboard (automatic metrics): CIDEr-D, Meteor, ROUGE-L, BLEU-1, BLEU-2, BLEU-3, BLEU-4, SPICE (also displayed as SPICE x10), date.
- Captioning challenge (human judgments): M1, M2, M3, M4, M5, date; with a ranking table over M1, M2, TOTAL, and Ranking.
- Keypoint leaderboard: AP, AP50, AP75, APM, APL, AR, AR50, AR75, ARM, ARL, date. Please see the keypoint evaluation page for more detailed information about the metrics.
- Detection leaderboard: AP, AP50, AP75, APS, APM, APL, AR1, AR10, AR100, ARS, ARM, ARL, date. Please see the detection evaluation page for more detailed information about the metrics.