COIN the currently largest dataset for comprehensive instruction video analysis.

Last update: Dec 28, 2022

Related tags

Deep Learning annotations

Overview

COIN Dataset

COIN is currently the largest dataset for comprehensive instructional video analysis. It contains 11,827 videos of 180 different tasks (e.g., car polishing, making French fries) related to 12 domains (e.g., vehicle, dish). All videos are collected from YouTube and annotated with an efficient toolbox.

Authors and Contributors

Yansong Tang^*, Dajun Ding^†, Yongming Rao^*, Yu Zheng^*, Danyang Zhang^*, Lili Zhao^†, Jiwen Lu^*, Jie Zhou^*, Yongxiang Lian^*, Yao Li^†, Jiali Sun^†, Chang Liu^†, Dongge You^†, Zirun Yang^†, Jiaojiao Ge^†, Jiayun Wang^*

^*Tsinghua University
^†Meitu Inc.

Contact: coin.dataset@gmail.com

License

You may use the code and files for research only, including sharing and modifying the material. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

Dataset and Annotation

Taxonomy

COIN is organized in a hierarchical structure with three levels: domain, task, and step. The correspondence between the levels can be found in the taxonomy [link]. We provide the taxonomy file of COIN in csv format. Below, we show a small part of the taxonomy stored in taxonomy.xlsx:

domain_target_mapping

Domains	Targets
...	...
Vehicle	ChangeCarTire
Vehicle	InstallLicensePlateFrame
...	...
Gadgets	ReplaceCDDriveWithSSD

target_action_mapping

Target Id	Target Label	Action Id	Action Label
...	...	...	...
13	ChangeCarTire	259	unscrew the screw
13	ChangeCarTire	260	jack up the car
13	ChangeCarTire	261	remove the tire
13	ChangeCarTire	262	put on the tire
13	ChangeCarTire	263	tighten the screws
...	...	...	...
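The two mappings can be joined into the three-level hierarchy (domain, task, step). The following is a minimal sketch, assuming the rows have already been read from the taxonomy file into tuples; the function name `build_taxonomy` and the in-memory row lists are illustrative and not part of the COIN release. Only the rows shown in the excerpt above are used.

```python
from collections import defaultdict

# Rows copied from the excerpt above; the real sheets contain many more.
domain_target_rows = [
    ("Vehicle", "ChangeCarTire"),
    ("Vehicle", "InstallLicensePlateFrame"),
    ("Gadgets", "ReplaceCDDriveWithSSD"),
]
target_action_rows = [
    (13, "ChangeCarTire", 259, "unscrew the screw"),
    (13, "ChangeCarTire", 260, "jack up the car"),
    (13, "ChangeCarTire", 261, "remove the tire"),
    (13, "ChangeCarTire", 262, "put on the tire"),
    (13, "ChangeCarTire", 263, "tighten the screws"),
]


def build_taxonomy(domain_rows, action_rows):
    """Return {domain: {task: [(step_id, step_label), ...]}}."""
    # Group the steps (actions) under their task (target) first.
    steps_by_task = defaultdict(list)
    for _task_id, task, step_id, step_label in action_rows:
        steps_by_task[task].append((step_id, step_label))

    # Then attach each task, with its steps, to its domain.
    taxonomy = defaultdict(dict)
    for domain, task in domain_rows:
        taxonomy[domain][task] = sorted(steps_by_task.get(task, []))
    return {domain: dict(tasks) for domain, tasks in taxonomy.items()}


taxonomy = build_taxonomy(domain_target_rows, target_action_rows)
for step_id, label in taxonomy["Vehicle"]["ChangeCarTire"]:
    print(step_id, label)
```

Tasks whose steps are not in the excerpt (here `InstallLicensePlateFrame` and `ReplaceCDDriveWithSSD`) simply get an empty step list in this sketch.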

We store the URLs of the videos and their annotations in JSON format, which can be accessed with the link [COIN](Project link page). The JSON file is similar to that of ActivityNet. Below, we show an example entry from the key field "database":

"LtRSn-ntcLY": {
			"duration": 131.0309,
			"class": "ReplaceCDDriveWithSSD",
			"video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
			"start": 56.640895694775196,
			"annotation": [
				{
					"id": "212",
					"segment": [
						60.0,
						69.0
					],
					"label": "take out the laptop CD drive"
				},
				{
					"id": "216",
					"segment": [
						71.0,
						82.0
					],
					"label": "insert the hard disk tray into the position of the CD drive"
				}
			],
			"subset": "training",
			"end": 85.714362947023,
			"recipe_type": 131
		}

From the entry, we can easily retrieve the YouTube ID, duration, ROI, and procedure information of the video. The field "annotation" comprises a list of all annotated procedures within the video. The field "class" and the sub-field "id" correspond to the "task" and "step" levels of the taxonomy, respectively.
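As a rough illustration of how these fields can be read, the sketch below parses the example entry and extracts the YouTube ID, task, ROI, and annotated steps. It assumes the entry sits under a top-level "database" key, as described above; the helper name `summarize_videos` and the shape of the returned dictionaries are choices made for this example only.

```python
import json

# The example entry from above, wrapped in the top-level "database" key.
coin_json = """
{
  "database": {
    "LtRSn-ntcLY": {
      "duration": 131.0309,
      "class": "ReplaceCDDriveWithSSD",
      "video_url": "https://www.youtube.com/embed/LtRSn-ntcLY",
      "start": 56.640895694775196,
      "annotation": [
        {"id": "212", "segment": [60.0, 69.0],
         "label": "take out the laptop CD drive"},
        {"id": "216", "segment": [71.0, 82.0],
         "label": "insert the hard disk tray into the position of the CD drive"}
      ],
      "subset": "training",
      "end": 85.714362947023,
      "recipe_type": 131
    }
  }
}
"""


def summarize_videos(coin):
    """Yield one flat summary dict per video in coin["database"]."""
    for youtube_id, entry in coin["database"].items():
        # Step ids are stored as strings in the example entry; convert to int.
        steps = [
            (int(a["id"]), a["segment"][0], a["segment"][1], a["label"])
            for a in entry["annotation"]
        ]
        yield {
            "youtube_id": youtube_id,
            "task": entry["class"],
            "duration": entry["duration"],
            "roi": (entry["start"], entry["end"]),
            "steps": steps,
        }


summaries = list(summarize_videos(json.loads(coin_json)))
for s in summaries:
    print(s["youtube_id"], s["task"], f"ROI={s['roi'][0]:.1f}-{s['roi'][1]:.1f}s")
    for step_id, start, end, label in s["steps"]:
        print(f"  step {step_id}: {start:.0f}-{end:.0f}s  {label}")
```

For the full file, the string literal would be replaced by reading COIN.json from disk with `json.load`.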

File Structure

The annotation information is saved in COIN.json.

Field Name	Type	Example	Description
`database`	string	-	Key field of the annotation file.
-	string	`LtRSn-ntcLY`	YouTube ID of the video.
`duration`	float	131.0309	Duration of the video in seconds.
`class`	string	`ReplaceCDDriveWithSSD`	Name of the task in the video.
`video_url`	string	`https://www.youtube.com/embed/LtRSn-ntcLY`	URL of the video.
`start`	float	56.640895694775196	Start time of the ROI of the video.
`end`	float	85.714362947023	End time of the ROI of the video.
`subset`	string	`training` or `validation`	Subset of the video.
`recipe_type`	int	131	ID number of the task.
`annotation`	list	-	Annotation information of the video.
`annotation`:`id`	string	`"212"`	ID number of the procedure.
`annotation`:`label`	string	`take out the laptop CD drive`	Name of the procedure.
`annotation`:`segment`	list of float (len=2)	`[60.0,69.0]`	Start and end time of the procedure.
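A common first step with this file is splitting the videos by `subset` and counting videos per task. The sketch below is one possible way to do that, assuming the layout in the table above; `index_by_subset` is a hypothetical helper, and the second sample entry (`hypothetical_id`) is a made-up placeholder used only to make the example runnable.

```python
from collections import Counter, defaultdict


def index_by_subset(coin):
    """Group YouTube IDs by subset and count videos per task (class)."""
    by_subset = defaultdict(list)
    task_counts = Counter()
    for youtube_id, entry in coin["database"].items():
        by_subset[entry["subset"]].append(youtube_id)
        task_counts[entry["class"]] += 1
    return dict(by_subset), task_counts


# In practice the dict would come from the annotation file, e.g.:
#   with open("COIN.json") as f:
#       coin = json.load(f)
# Here a tiny in-memory sample stands in for it.
sample = {
    "database": {
        "LtRSn-ntcLY": {"class": "ReplaceCDDriveWithSSD", "subset": "training"},
        # Placeholder entry, not a real COIN video.
        "hypothetical_id": {"class": "ChangeCarTire", "subset": "validation"},
    }
}

by_subset, task_counts = index_by_subset(sample)
print({subset: len(ids) for subset, ids in by_subset.items()})
print(task_counts.most_common())
```

Keeping the YouTube IDs per subset, not only the counts, makes it easy to skip videos that are no longer available when downloading.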

Comments

ERROR VideoUnavailable A lot of links are unavailable

HI, Thank you for your contribution! There are actually many videos unavailable since some video may be deleted by the author. Actually, I only downloaded 10927 videos. So we could not download them all, right?

opened by Aanonymity3930 2

Invalid link for S3D features

Hi, The link for S3D features is invalid now, could you check it? https://drive.google.com/file/d/1zI4sxtWCccmcZ3alMUVpsgszTMmB0Bc5/view?usp=sharing

Thanks!!

opened by FOXamber 0

Is any narration set available?

Hi,

As I understand, COIN contains only actions to corresponding segments of video such as [12s - 15s] -> cut tomato. Does the datasets contain any narration like [12s - 15s] -> cut tomato / take a chopping board and a tomato, then cut tomato on the board etc..

opened by EmreOzkose 0

Invalid link for the features

Hi,

Thank you for providing this nice dataset. The links for downloading the features are invalid right now, could you please check and fix it?

Thanks!

opened by eitan159 0

Invalid link for the Features

Hi,

Thank you for providing this nice dataset. The link for downloading the features by Mega is invalid right now, could you please check and fix it?

Thanks!

opened by dairui01 5


