# VidVRD-tracklets
This repository contains the code for Video Visual Relation Detection (VidVRD) tracklet generation, based on MEGA and deepSORT. The tracklets are also suitable for the ACM MM Visual Relation Understanding (VRU) Grand Challenge (which is based on the VidOR dataset).

If you are only interested in the generated tracklets, you can ignore the code and download them directly from here.
## Download generated tracklets directly
We release the object tracklets for the VidOR train/validation/test sets. You can download the tracklets here and put them in the following folder structure:
```
├── deepSORT
│   ├── ...
│   ├── tracking_results
│   │   ├── VidORtrain_freq1_m60s0.3_part01
│   │   ├── ...
│   │   ├── VidORtrain_freq1_m60s0.3_part14
│   │   ├── VidORval_freq1_m60s0.3
│   │   ├── VidORtest_freq1_m60s0.3
│   │   ├── readme.md
│   │   └── format_demo.py
│   └── ...
├── MEGA
│   ├── ...
│   └── ...
```
Please refer to `deepSORT/tracking_results/readme.md` for more details.
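As a quick orientation, the sketch below shows how such a per-video result file might be consumed. Everything about the layout here (pickle files, a list of tracklet dicts, the `category`/`score`/`fstart`/`bboxes` keys) is an assumption for illustration only; the authoritative format is described in `deepSORT/tracking_results/readme.md` and demonstrated by `format_demo.py`.

```python
import pickle

def load_tracklets(path):
    # Hypothetical per-video result file: a pickled list of tracklet dicts.
    with open(path, "rb") as f:
        tracklets = pickle.load(f)
    for trk in tracklets:
        print("category={}, score={:.3f}, starts at frame {}, length {}".format(
            trk["category"],     # assumed: object class name
            trk["score"],        # assumed: tracklet confidence
            trk["fstart"],       # assumed: index of the tracklet's first frame
            len(trk["bboxes"]),  # assumed: one [x1, y1, x2, y2] box per frame
        ))
    return tracklets
```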
## Evaluate the tracklets mAP
Run `python deepSORT/eval_traj_mAP.py` to evaluate the tracklet mAP (you may need to change some args in `deepSORT/eval_traj_mAP.py`).
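For intuition, tracklet mAP follows the usual detection-mAP recipe, except that a predicted tracklet matches a ground-truth one when their voluminal IoU (vIoU, the box overlap accumulated over the frames the two tracklets cover) exceeds a threshold. The helper below is a minimal sketch of that overlap criterion under an assumed `{frame_index: [x1, y1, x2, y2]}` representation, not the exact code of `eval_traj_mAP.py`.

```python
def box_inter(b1, b2):
    """Intersection area of two [x1, y1, x2, y2] boxes."""
    w = min(b1[2], b2[2]) - max(b1[0], b2[0])
    h = min(b1[3], b2[3]) - max(b1[1], b2[1])
    return max(w, 0.0) * max(h, 0.0)

def box_area(b):
    return max(b[2] - b[0], 0.0) * max(b[3] - b[1], 0.0)

def viou(traj_a, traj_b):
    """Voluminal IoU of two tracklets, each a {frame_index: [x1, y1, x2, y2]} dict."""
    inter = union = 0.0
    for f in set(traj_a) | set(traj_b):
        if f in traj_a and f in traj_b:
            i = box_inter(traj_a[f], traj_b[f])
            inter += i
            union += box_area(traj_a[f]) + box_area(traj_b[f]) - i
        else:
            # Frames covered by only one tracklet count toward the union only.
            union += box_area(traj_a[f] if f in traj_a else traj_b[f])
    return inter / union if union > 0 else 0.0
```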
## Generate object tracklets by yourself
The object tracklet generation pipeline mainly consists of two parts: MEGA (for video object detection) and deepSORT (for video object tracking).
### Quick Start
- Install MEGA following the official instructions in `MEGA/INSTALL.md` (note that the folder path may be different when installing).
  - If you have any trouble installing MEGA, you can try cloning the official MEGA repository, installing it, and then replacing the official `mega.pytorch/mega_core` with our modified `MEGA/mega_core`. Refer to `MEGA/modification_details.md` for the details of our modifications.
- Download the VidOR dataset and the pre-trained weights of MEGA, and put them in the following folder structure:
  ```
  ├── deepSORT/
  │   ├── ...
  ├── MEGA/
  │   ├── ...
  │   ├── datasets/
  │   │   ├── COCOdataset/        # used for MEGA training
  │   │   ├── COCOinVidOR/        # used for MEGA training
  │   │   ├── vidor-dataset/
  │   │   │   ├── annotation/
  │   │   │   │   ├── training/
  │   │   │   │   └── validation/
  │   │   │   ├── img_index/
  │   │   │   │   ├── VidORval_freq1_0024.txt
  │   │   │   │   ├── ...
  │   │   │   ├── val_frames/
  │   │   │   │   ├── 0001_2793806282/
  │   │   │   │   │   ├── 000000.JPEG
  │   │   │   │   │   ├── ...
  │   │   │   │   ├── ...
  │   │   │   ├── val_videos/
  │   │   │   │   ├── 0001/
  │   │   │   │   │   ├── 2793806282.mp4
  │   │   │   │   │   ├── ...
  │   │   │   │   ├── ...
  │   │   │   ├── train_frames/
  │   │   │   ├── train_videos/
  │   │   │   ├── test_frames/
  │   │   │   ├── test_videos/
  │   │   │   └── video2img_vidor.py
  │   │   └── construct_img_idx.py
  │   ├── training_dir/
  │   │   ├── COCO34ORfreq32_4gpu/
  │   │   │   ├── inference/
  │   │   │   │   ├── VidORval_freq1_0024/
  │   │   │   │   │   ├── predictions.pth
  │   │   │   │   │   └── result.txt
  │   │   │   │   ├── ...
  │   │   │   └── model_0180000.pth
  │   │   ├── ...
  ```
- Run `python MEGA/datasets/vidor-dataset/video2img_vidor.py` (note that you may need to change some args) to extract frames from the videos. This causes a lot of data redundancy, but it is necessary because MEGA takes image data as input.
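  The extraction itself just decodes each video and writes frames with zero-padded names, as in the folder tree above. A minimal OpenCV sketch of this step (an illustration under the demo paths shown above, not the actual code of `video2img_vidor.py`):

  ```python
  import os
  import cv2

  def video_to_frames(video_path, out_dir):
      # Decode every frame and save it as 000000.JPEG, 000001.JPEG, ...
      os.makedirs(out_dir, exist_ok=True)
      cap = cv2.VideoCapture(video_path)
      idx = 0
      while True:
          ok, frame = cap.read()
          if not ok:
              break
          cv2.imwrite(os.path.join(out_dir, "{:06d}.JPEG".format(idx)), frame)
          idx += 1
      cap.release()

  video_to_frames(
      "MEGA/datasets/vidor-dataset/val_videos/0001/2793806282.mp4",
      "MEGA/datasets/vidor-dataset/val_frames/0001_2793806282",
  )
  ```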
- Run `python MEGA/datasets/construct_img_idx.py` (note that you may need to change some args) to generate the img_index files used in MEGA inference.
  - The generated `.txt` files will be saved in `MEGA/datasets/vidor-dataset/img_index/`. You can use `VidORval_freq1_0024.txt` as a demo for the following commands.
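  Conceptually, such an index just enumerates the extracted frames at a given sampling frequency. The sketch below writes one relative frame path per line, which is a hypothetical format for illustration; the real format is defined by `construct_img_idx.py`.

  ```python
  import os

  def write_img_index(frames_root, out_txt, freq=1):
      # Hypothetical format: one relative frame path (without extension) per
      # line, sampled every `freq` frames; see construct_img_idx.py for the
      # format actually used by MEGA inference.
      with open(out_txt, "w") as f:
          for video_dir in sorted(os.listdir(frames_root)):
              frames = sorted(os.listdir(os.path.join(frames_root, video_dir)))
              for frame in frames[::freq]:
                  f.write("{}/{}\n".format(video_dir, os.path.splitext(frame)[0]))
  ```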
- Run the following command to detect frame-level object proposals with bbox features (RoI pooled features):

  ```
  CUDA_VISIBLE_DEVICES=0 python \
      MEGA/tools/test_net.py \
      --config-file MEGA/configs/MEGA/inference/VidORval_freq1_0024.yaml \
      MODEL.WEIGHT MEGA/training_dir/COCO34ORfreq32_4gpu/model_0180000.pth \
      OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu/inference
  ```
  - The above command will generate a `predictions.pth` file for this `VidORval_freq1_0024` demo. We also release this `predictions.pth` here.
  - The config files for the VidOR train set are in `MEGA/configs/MEGA/partxx`.
  - The `predictions.pth` contains the frame-level box positions and features (RoI features) for each object. If you are familiar with MEGA or maskrcnn-benchmark, the RoI features can be accessed through `roifeats = boxlist.get_field("roi_feats")`.
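  If you want to inspect `predictions.pth` yourself, a minimal sketch is below. The `roi_feats` field name comes from the note above; treating the file as a list of per-frame `BoxList` objects with maskrcnn-benchmark-style `scores`/`labels` fields is our assumption, and the MEGA codebase must be importable so the `BoxList` objects can be unpickled.

  ```python
  import torch

  # Assumed layout: a list of BoxList objects, one per frame in the img_index.
  predictions = torch.load(
      "MEGA/training_dir/COCO34ORfreq32_4gpu/inference/VidORval_freq1_0024/predictions.pth",
      map_location="cpu",
  )

  boxlist = predictions[0]
  print(boxlist.bbox.shape)                    # (num_boxes, 4) box coordinates
  print(boxlist.get_field("scores").shape)     # per-box detection scores (assumed field)
  print(boxlist.get_field("labels").shape)     # per-box class labels (assumed field)
  print(boxlist.get_field("roi_feats").shape)  # RoI-pooled features (see note above)
  ```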
- Run `python MEGA/mega_boxfeatures/cvt_proposal_result.py` (note that you may need to change some args) to convert `predictions.pth` to a `.pkl` file for the following deepSORT stage.
  - We also provide `VidORval_freq1_0024.pkl` here.
- Run `python deepSORT/deepSORT_tracking_v2.py` (note that you may need to change some args) to perform deepSORT tracking. The results will be saved in `deepSORT/tracking_results/`. A sketch of the core association step follows this list.
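For intuition, the core of deepSORT-style tracking is an assignment problem between existing tracks and the detections of the next frame, driven here by the appearance (RoI) features from the previous stage. The snippet below is a minimal sketch of that association step (cosine distance plus Hungarian matching) and is only an illustration; the actual logic, thresholds, and motion gating live in `deepSORT/deepSORT_tracking_v2.py`.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(track_feats, det_feats, max_cosine_dist=0.3):
    """Match existing tracks to new detections by appearance, deepSORT-style.

    track_feats: (T, D) array with one appearance feature per track.
    det_feats:   (N, D) array with one RoI feature per detection.
    Returns a list of (track_idx, det_idx) pairs.
    """
    a = track_feats / np.linalg.norm(track_feats, axis=1, keepdims=True)
    b = det_feats / np.linalg.norm(det_feats, axis=1, keepdims=True)
    cost = 1.0 - a @ b.T                      # cosine distances, shape (T, N)
    rows, cols = linear_sum_assignment(cost)  # Hungarian matching
    # Reject matches whose appearance distance is too large (hypothetical threshold).
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_cosine_dist]
```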
## Train MEGA for VidOR by yourself
- Download MS-COCO and put it in the folder structure shown above.
- Run `python MEGA/tools/extract_coco.py` to extract the annotations of the COCO categories that overlap with VidOR, which results in `COCO_train_34classes.pkl` and `COCO_valmini_34classes.pkl`.
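  Conceptually, this step keeps only the COCO images and annotations whose categories overlap with VidOR's 34 training classes, and pickles the result. A minimal sketch under assumed key names (the class subset below is hypothetical; the real mapping lives in `MEGA/tools/extract_coco.py`):

  ```python
  import json
  import pickle

  # Hypothetical subset; the real 34-class mapping is defined in extract_coco.py.
  VIDOR_OVERLAP_CLASSES = {"person", "dog", "cat", "horse", "bicycle", "car"}

  def extract_overlapping(coco_json, out_pkl):
      with open(coco_json) as f:
          coco = json.load(f)
      keep_ids = {c["id"] for c in coco["categories"]
                  if c["name"] in VIDOR_OVERLAP_CLASSES}
      anns = [a for a in coco["annotations"] if a["category_id"] in keep_ids]
      img_ids = {a["image_id"] for a in anns}
      imgs = [im for im in coco["images"] if im["id"] in img_ids]
      with open(out_pkl, "wb") as f:
          pickle.dump({"images": imgs, "annotations": anns}, f)
  ```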
- Train MEGA with the following command:

  ```
  python -m torch.distributed.launch \
      --nproc_per_node=4 \
      --master_port=$((RANDOM + 10000)) \
      tools/train_net.py \
      --config-file MEGA/configs/MEGA/vidor_R_101_C4_MEGA_1x_4gpu.yaml \
      OUTPUT_DIR MEGA/training_dir/COCO34ORfreq32_4gpu
  ```
More detailed training instructions will be updated soon...