Open Images (Krasin, Duerig, Alldrin et al. 2017, with V4 in Kuznetsova et al., IJCV 2020) is Google's large-scale, openly licensed image dataset, intended as a successor to ImageNet at greater scale and richer annotation. The current release is Open Images V7 (October 2022).
Composition (V7)
- 9.0 million images in the full set, of which 1.7 million have detection annotations.
- 19,995 image-level label classes (machine-generated, human-verified for high-confidence subset).
- 600 object-detection classes with 15.8 million bounding boxes.
- 2.7 million instance segmentation masks across 350 classes.
- 3.3 million visual-relationship annotations ("woman holding cat", "person on bicycle").
- 66.4 million human-verified point-level segmentation labels (V7's distinctive contribution).
- 391,073 narrative localisation annotations linking spoken descriptions to image regions.
Images are sourced from Flickr under Creative Commons licences, predominantly CC-BY-2.0.
Licensing
Annotations are released under CC-BY-4.0 by Google; images retain their Flickr CC licences (CC-BY, CC-BY-SA, CC0). Open Images is one of the few large vision datasets whose image content is unambiguously redistributable.
Models trained on Open Images
Open Images has been the substrate for several Google detection systems including a generation of Inception-ResNet detection models, the EfficientDet family, SAM (Segment Anything Model) pre-training augmentations, and Owl-ViT open-vocabulary detection. The Open Images Challenge (2018-2019) drove a step-function in long-tail detection performance.
Issues and reception
Open Images' scale of human verification is its great strength; 66 M point labels is more annotation effort than any prior open dataset. However, the 600-class detection vocabulary still under-covers fine-grained distinctions, and the visual-relationship labels are sparse relative to image complexity. The dataset has been criticised for the photographic skew of Flickr CC content (over-represented: weddings, vacations, sunsets; under-represented: workplace and domestic scenes).
Open Images is the closest open analogue to internal Google detection corpora and has shaped the practical training of every open detection model since 2017, even where it does not appear on the headline benchmark line (which is usually COCO or LVIS).
Discussed in:
- Chapter 9: Neural Networks, Computer Vision