In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with labels spanning thousands of object categories. Since its initial release, we've been hard at work updating and refining the dataset to provide a useful resource for the computer vision community to develop new models.

Today, we are happy to announce Open Images V4, containing 15.4M bounding boxes for 600 categories on 1.9M images, making it the largest existing dataset with object location annotations. Most of the boxes were drawn manually by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8 per image on average; see the visualizer).
Annotated images from the Open Images dataset. Left: Mark Paul Gosselaar plays the guitar by Rhys A. Right: Civilization by Paul Downey. Both images used under CC BY 2.0 license.
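As a quick sanity check on the "8 per image on average" figure, the arithmetic can be sketched in a few lines of Python. This is only an illustration built from the rounded totals quoted above (15.4M boxes, 1.9M images, 600 categories); the constant names and the helper `average_per` are hypothetical and not part of any Open Images tooling.

```python
# Rounded totals quoted in the announcement; results are therefore approximate.
TOTAL_BOXES = 15.4e6     # bounding boxes in Open Images V4
TOTAL_IMAGES = 1.9e6     # images that carry box annotations
NUM_CATEGORIES = 600     # boxable object categories


def average_per(total: float, count: float) -> float:
    """Return total / count, rejecting a non-positive denominator."""
    if count <= 0:
        raise ValueError("count must be positive")
    return total / count


boxes_per_image = average_per(TOTAL_BOXES, TOTAL_IMAGES)
boxes_per_category = average_per(TOTAL_BOXES, NUM_CATEGORIES)

print(f"boxes per image:    {boxes_per_image:.1f}")
print(f"boxes per category: {boxes_per_category:,.0f}")
```

The per-image value comes out at roughly 8, matching the stated average. The per-category value is only a mean: it says nothing about how evenly the boxes are spread across the 600 categories.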
In conjunction with this release, we are also introducing the Open Images Challenge, a new object detection challenge to be held at the 2018 European Conference on Computer Vision (ECCV 2018). The Open Images Challenge follows in the tradition of PASCAL VOC, ImageNet and COCO, but at an unprecedented scale.

This challenge is unique in several ways:
  • 12.2M bounding-box annotations for 500 categories on 1.7M training images.
  • A broader range of categories than previous detection challenges, including new objects such as “fedora” and “snowman”.
  • In addition to the main object detection track, the challenge includes a Visual Relationship Detection track on detecting pairs of objects in particular relations, e.g. “woman playing guitar”.
The training set is available now. A test set of 100k images will be released on July 1st, 2018 by Kaggle. The deadline for submitting results is September 1st, 2018. We hope that the very large training set will stimulate research into more sophisticated detection models that will exceed current state-of-the-art performance, and that the 500 categories will enable a more precise assessment of where different detectors perform best. Furthermore, having a large set of images with many annotated objects makes it possible to explore Visual Relationship Detection, which is a hot emerging topic with a growing sub-community.

In addition to the above, Open Images V4 also contains 30.1M human-verified image-level labels for 19,794 categories, which are not part of the Challenge. The dataset includes 5.5M image-level labels generated by tens of thousands of users from all over the world at crowdsource.google.com.