Open Source Datasets
Kinetics
A large-scale, high-quality dataset of URL links to approximately 300,000 video clips covering 400 human action classes, including human-object interactions such as playing instruments and human-human interactions such as shaking hands and hugging. Each action class has at least 400 video clips. Each clip is human-annotated with a single action class and lasts around 10 seconds.
dSprites - Disentanglement testing Sprites dataset
This dataset consists of 737,280 images of 2D shapes, procedurally generated from 5 ground truth independent latent factors, controlling the shape, scale, rotation and position of a sprite. This data can be used to assess the disentanglement properties of unsupervised learning methods.
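The 737,280 figure is the size of the full Cartesian product of the latent factors, so every combination of factor values appears exactly once. The sketch below assumes the factor sizes given in the public dSprites README (3 shapes, 6 scales, 40 orientations, 32 x-positions and 32 y-positions, with colour held fixed), which this page does not list. It also assumes the row-major ordering in which the last factor varies fastest. `latent_bases` and `latents_to_index` are hypothetical helper names, not part of any released API.

```python
from math import prod

# Assumed factor sizes (from the dSprites README, not this page):
# shape, scale, orientation, posX, posY. Colour is constant, so it is omitted.
LATENT_SIZES = [3, 6, 40, 32, 32]

def latent_bases(sizes):
    """Mixed-radix bases for a row-major layout: the last factor varies fastest."""
    bases = [1] * len(sizes)
    for i in range(len(sizes) - 2, -1, -1):
        bases[i] = bases[i + 1] * sizes[i + 1]
    return bases

def latents_to_index(latents, sizes=LATENT_SIZES):
    """Map a tuple of per-factor class indices to a flat image index."""
    for value, size in zip(latents, sizes):
        if not 0 <= value < size:
            raise ValueError(f"latent value {value} out of range [0, {size})")
    return sum(v * b for v, b in zip(latents, latent_bases(sizes)))

# The number of images is the product of the factor sizes.
total = prod(LATENT_SIZES)
print(total)  # 737280
```

Because the dataset is exhaustive, an index computed this way can select the image for any chosen latent configuration, which is what makes controlled disentanglement evaluation possible.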
DeepMind CNN/Daily Mail Reading Comprehension Corpus
This dataset contains over 1.5 million question and answer pairs for a reading comprehension task based on articles from CNN and the Daily Mail. Questions, answers and context are anonymised with random entity markers, forcing systems to answer questions purely from the context provided. This dataset accompanies the paper 'Teaching Machines to Read and Comprehend'.
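The anonymisation step can be illustrated with a minimal sketch. It assumes the entity mentions have already been identified by some upstream step (not implemented here) and that markers take the form `@entityN`; the function name `anonymise` and its parameters are hypothetical, and this is not the code used to build the corpus. The key idea is that marker numbers are re-shuffled for every example, so a model cannot learn that a given marker always stands for the same real-world entity.

```python
import random
import re

def anonymise(context, question, answer, entities, seed=None):
    """Replace entity strings with randomly numbered @entityN markers.

    `entities` is assumed to be a non-empty list of entity strings found in the
    example, and `answer` must be one of them. Numbering is a fresh random
    permutation per call, so markers carry no information across examples.
    """
    if not entities:
        raise ValueError("at least one entity is required")
    rng = random.Random(seed)
    ids = list(range(len(entities)))
    rng.shuffle(ids)
    mapping = {ent: f"@entity{i}" for ent, i in zip(entities, ids)}
    # Longest names first, so "Daily Mail" wins over a shorter entity "Daily".
    # A single regex pass avoids re-substituting inside inserted markers.
    pattern = re.compile(
        "|".join(re.escape(e) for e in sorted(mapping, key=len, reverse=True))
    )

    def sub(text):
        return pattern.sub(lambda m: mapping[m.group(0)], text)

    return sub(context), sub(question), mapping[answer]

ctx, q, ans = anonymise(
    "Alice met Bob in Paris. Bob left.", "Who met Bob?", "Alice",
    ["Alice", "Bob", "Paris"], seed=0,
)
print(ctx, "|", q, "|", ans)
```

Since the surface form of each entity is hidden, answering requires locating the marker in the context rather than relying on world knowledge about the named entity.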
Metacontrol for Adaptive Imagination-Based Optimization task
An artificially generated dataset for the spaceship task from 'Metacontrol for Adaptive Imagination-Based Optimization'. We generated five datasets, each containing scenes with a different number of planets (ranging from a single planet to five planets). Each dataset consisted of 100,000 training scenes and 1,000 testing scenes.
Collectible Card Game to Code
This dataset contains the language-to-code datasets described in our paper 'Latent Predictor Networks for Code Generation'.
Unsupervised Data Generated for GeoQuery and SAIL
This dataset contains the generated unsupervised data for GeoQuery and SAIL semantic parsing tasks in our paper 'Semantic Parsing with Semi-Supervised Sequential Autoencoders'.