Video Visual Relation Detection (VidVRD) tracklets generation. also for ACM MM Visual Relation Understanding Grand Challenge

Last update: Dec 21, 2022

Related tags

Admin Panels VidVRD-tracklets

Overview

VidVRD-tracklets

This repository contains codes for Video Visual Relation Detection (VidVRD) tracklets generation based on MEGA and deepSORT. These tracklets are also suitable for ACM MM Visual Relation Understanding (VRU) Grand Challenge (which is base on the VidOR dataset).

If you are only interested in the generated tracklets, you can ignore these codes and download them directly from here

Download generated tracklets directly

We release the object tracklets for VidOR train/validation/test set. You can download the tracklets here, and put them in the following folder as

├── deepSORT
│   ├── ...
│   ├── tracking_results
│   │   ├── VidORtrain_freq1_m60s0.3_part01
│   │   ├── ...
│   │   ├── VidORtrain_freq1_m60s0.3_part14
│   │   ├── VidORval_freq1_m60s0.3
│   │   ├── VidORtest_freq1_m60s0.3
│   │   ├── readme.md
│   │   └── format_demo.py
│   └── ...
├── MEGA
│   ├── ... 
│   └── ...

Please refer to deepSORT/tracking_results/readme.md for more details

Evaluate the tracklets mAP

Run python deepSORT/eval_traj_mAP.py to evaluate the tracklets mAP. (you might need to change some args in deepSORT/eval_traj_mAP.py)

Generate object tracklets by yourself

The object tracklets generation pipeline mainly consists of two parts: MEGA (for video object detection), and deepSORT (for video object tracking).

Quick Start

Install MEGA as the official instructions MEGA/INSTALL.md (Note that the folder path may be different when installing).
- If you have any trouble when installing MEGA, you can try to clone the official MEGA repository and install it, and then replace the official mega.pytorch/mega_core with our modified MEGA/mega_core. Refer to MEGA/modification_details.md for the details of our modifications.
Download the VidOR dataset and the pre-trained weight of MEGA. Put them in the following folder as

├── deepSORT/
│   ├── ...
├── MEGA/
│   ├── ... 
│   ├── datasets/
│   │   ├── COCOdataset/        # used for MEGA training
│   │   ├── COCOinVidOR/        # used for MEGA training
│   │   ├── vidor-dataset/
│   │   │   ├── annotation/
│   │   │   │   ├── training/
│   │   │   │   └── validation/
│   │   │   ├── img_index/ 
│   │   │   │   ├── VidORval_freq1_0024.txt
│   │   │   │   ├── ...
│   │   │   ├── val_frames/
│   │   │   │   ├── 0001_2793806282/
│   │   │   │   │   ├── 000000.JPEG
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   │   ├── val_videos/
│   │   │   │   ├── 0001/
│   │   │   │   │   ├── 2793806282.mp4
│   │   │   │   │   ├── ...
│   │   │   │   ├── ...
│   │   │   ├── train_frames/
│   │   │   ├── train_videos/
│   │   │   ├── test_frames/
│   │   │   ├── test_videos/
│   │   │   └── video2img_vidor.py
│   │   └── construct_img_idx.py
│   ├── training_dir/
│   │   ├── COCO34ORfreq32_4gpu/
│   │   │   ├── inference/
│   │   │   │   ├── VidORval_freq1_0024/
│   │   │   │   │   ├── predictions.pth
│   │   │   │   │   └── result.txt
│   │   │   │   ├── ...
│   │   │   └── model_0180000.pth
│   │   ├── ...

Run python MEGA/datasets/vidor-dataset/video2img_vidor.py (note that you may need to change some args) to extract frames from videos (This causes a lot of data redundancy, but we have to do this, because MEGA takes image data as input).
Run python MEGA/datasets/construct_img_idx.py (note that you may need to change some args) to generate the img_index used in MEGA inference.
- The generated .txt files will be saved in MEGA/datasets/vidor-dataset/img_index/. You can use VidORval_freq1_0024.txt as a demo for the following commands.
Run the following command to detect frame-level object proposals with bbox features (RoI pooled features).
```
CUDA_VISIBLE_DEVICES=0   python  \
    MEGA/tools/test_net.py \
    --config-file MEGA/configs/MEGA/inference/VidORval_freq1_0024.yaml \
    MODEL.WEIGHT MEGA/training_dir/COCO34ORfreq32_4gpu/model_0180000.pth \
    OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu/inference
```
- The above command will generate a predictions.pth file for this VidORval_freq1_0024 demo. We also release this predictions.pth here.
- the config files for VidOR train set are in MEGA/configs/MEGA/partxx
- The predictions.pth contains frame-level box positions and features (RoI features) for each object. For RoI features, they can be accessed through roifeats = boxlist.get_field("roi_feats"), if you are familiar with MEGA or maskrcnn-benchmark
Run python MEGA/mega_boxfeatures/cvt_proposal_result.py (note that you may need to change some args) to convert predictions.pth to a .pkl file for the following deepSORT stage.
- We also provide VidORval_freq1_0024.pkl here
Run python deepSORT/deepSORT_tracking_v2.py (note that you may need to change some args) to perform deepSORT tracking. The results will be saved in deepSORT/tracking_results/

Train MEGA for VidOR by yourself

Download MS-COCO and put them as shown in above.
Run python MEGA/tools/extract_coco.py to extract annotations for COCO in VidOR, which results in COCO_train_34classes.pkl and COCO_valmini_34classes.pkl
train MEGA by the following commands:

    python -m torch.distributed.launch \
        --nproc_per_node=4 \
        tools/train_net.py \
        --master_port=$((RANDOM + 10000)) \
        --config-file MEGA/configs/MEGA/vidor_R_101_C4_MEGA_1x_4gpu.yaml \
        OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu

More detailed training instructions will be updated soon...

Comments

modification_details.md not done yet?

Hello, I failed to follow the mega.pytorch.INSTALL.md due to the CUDA version being incompatible with the Pytorch version. I exactly followed the INSTALL.md step by step, and I did install pytorch=1.3.0 and cudatoolkit=10.0 successfully. But, I still failed to install the apex.

So, I jumped back to your repo. May I please ask you if you are still trying to release the 'modification_details.md' later on or all we need to do is just to replace the official 'mega.pytorch/mega_core' with your modified 'MEGA/mega_core' ?

Thank yous so much for your time! Looking forward to your response soon! Cheers!

opened by ComeOnComeOnTurnYourRadioOn 1
Error while detecting frame-level object proposals

I followed the steps of MEGA installations and everything went well, but during the detection of frame-level object proposals, I am getting this error:

"raise RuntimeError("{} is a zip archive (did you mean to use torch.jit.load()?)".format(f.name)) RuntimeError: MEGA/training_dir/COCO34ORfreq32_4gpu/model_0180000.pth is a zip archive (did you mean to use torch.jit.load()?)"

Any solution? PS: command used: CUDA_VISIBLE_DEVICES=0 python MEGA/tools/test_net.py --config-file MEGA/configs/MEGA/inference/VidORval_freq1_0024.yaml MODEL.WEIGHT MEGA/training_dir/COCO34ORfreq32_4gpu/model_0180000.pth OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu/inference

opened by yogesh-iitj 1

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

ERICA Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive L

75 Nov 2, 2022

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DeepPanoContext (DPC) [Project Page (with interactive results)][Paper] DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context G

66 Nov 16, 2022

DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

DPT This repo is the official implementation of DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021). We provide code and model

111 Dec 21, 2022

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

PySlowFast PySlowFast is an open source video understanding codebase from FAIR that provides state-of-the-art video classification models with efficie

5.3k Jan 3, 2023

Code for ACM MM 2020 paper "NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination"

NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination The offical implementation for the "NOH-NMS: Improving Pedestrian Detection by

64 Nov 11, 2022

Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

CTDNet The PyTorch code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection" Requirements Python 3.6

28 Oct 20, 2022

The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection (ACM MM'21) By Zhuofan Zong, Qianggang Cao, Biao Leng Introduction F

9 Jul 30, 2022

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

ManiSkill-Learn ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge, a large-scale learning-from-dem

48 Dec 30, 2022

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

README.md shall be finished soon. WSSGG 0 Overview 1 Installation 1.1 Faster-RCNN 1.2 Language Parser 1.3 GloVe Embeddings 2 Settings 2.1 VG-GT-Graph

35 Nov 20, 2022

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation A pytorch-version implementation

11 Oct 8, 2022

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

PLBART Code pre-release of our work, Unified Pre-training for Program Understanding and Generation accepted at NAACL 2021. Note. A detailed documentat

138 Dec 30, 2022

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Cutoff: A Simple Data Augmentation Approach for Natural Language This repository contains source code necessary to reproduce the results presented in

49 Dec 22, 2022

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

CPT This repository contains code and checkpoints for CPT. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Gener

341 Dec 29, 2022

Muzic: Music Understanding and Generation with Artificial Intelligence

Muzic is a research project on AI music that empowers music understanding and generation with deep learning and artificial intelligence.

2.6k Dec 30, 2022

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

11 Aug 26, 2022

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

1.3k Dec 31, 2022

[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

3DVG-Transformer This repository is for the ICCV 2021 paper "3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds" Our method "3DV

22 Dec 11, 2022

A Python tool used to automate the execution of the following tools : Nmap , Nikto and Dirsearch but also to automate the report generation during a Web Penetration Testing

📡 WebMap A Python tool used to automate the execution of the following tools : Nmap , Nikto and Dirsearch but also to automate the report generation

274 Jan 1, 2023

Video Visual Relation Detection (VidVRD) tracklets generation. also for ACM MM Visual Relation Understanding Grand Challenge

Related tags

Overview

VidVRD-tracklets

Download generated tracklets directly

Evaluate the tracklets mAP

Generate object tracklets by yourself

Quick Start

Train MEGA for VidOR by yourself

You might also like...

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

Official PyTorch code of DeepPanoContext: Panoramic 3D Scene Understanding with Holistic Scene Context Graph and Relation-based Optimization (ICCV 2021 Oral).

DPT: Deformable Patch-based Transformer for Visual Recognition (ACM MM2021)

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Code for ACM MM 2020 paper "NOH-NMS: Improving Pedestrian Detection by Nearby Objects Hallucination"

Code for ACM MM2021 paper "Complementary Trilateral Decoder for Fast and Accurate Salient Object Detection"

The code repository for "RCNet: Reverse Feature Pyramid and Cross-scale Shift Network for Object Detection" (ACM MM'21)

ManiSkill-Learn is a framework for training agents on SAPIEN Open-Source Manipulation Skill Challenge (ManiSkill Challenge), a large-scale learning-from-demonstrations benchmark for object manipulation.

A weakly-supervised scene graph generation codebase. The implementation of our CVPR2021 paper ``Linguistic Structures as Weak Supervision for Visual Scene Graph Generation''

A pytorch-version implementation codes of paper: "BSN++: Complementary Boundary Regressor with Scale-Balanced Relation Modeling for Temporal Action Proposal Generation"

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

Muzic: Music Understanding and Generation with Artificial Intelligence

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

[ICCV2021] 3DVG-Transformer: Relation Modeling for Visual Grounding on Point Clouds

A Python tool used to automate the execution of the following tools : Nmap , Nikto and Dirsearch but also to automate the report generation during a Web Penetration Testing

Comments

modification_details.md not done yet?

Error while detecting frame-level object proposals

Owner

Awesome Video Datasets

A curated list of the latest breakthroughs in AI by release date with a clear video explanation, link to a more in-depth article, and code.

object detection; robust detection; ACM MM21 grand challenge; Security AI Challenger Phase VII

AI grand challenge 2020 Repo (Speech Recognition Track)

Motion detector, Full body detection, Upper body detection, Cat face detection, Smile detection, Face detection (haar cascade), Silverware detection, Face detection (lbp), and Sending email notifications

This is the winning solution of the Endocv-2021 grand challange.

Code for the paper "Relation of the Relations: A New Formalization of the Relation Extraction Problem"

Few-shot Relation Extraction via Bayesian Meta-learning on Relation Graphs

A collection of models for image - text generation in ACM MM 2021.

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark