StarterCode for VALUE Benchmark

Overview

This is the starter code for VALUE Benchmark [website], [paper].

[Figure: Overview of VALUE Benchmark]

This repository currently supports all baseline models in the VALUE paper, including training with different video-subtitle fusion methods, different input channels, different visual representations, and multi-task training. You can also perform transfer evaluation between different tasks with our evaluation code.

Before diving into the baseline models mentioned above, please familiarize yourself with the codebase by going through the examples in Quick Start and Single Task Finetuning.

The code in this repo is copied/modified from the open-source implementation made available by HERO.

Updates

  • [7/27/2021] Please re-download violin_test_private.db at this link if you downloaded it via scripts/download_violin.sh prior to 7/27/2021. The previous version is not consistent with our release; we apologize for the inconvenience.
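
    If you used the download script, the simplest fix is to re-run it; the sketch below assumes scripts/download_violin.sh takes the same storage argument as the other download scripts:

    # outside of the container
    bash scripts/download_violin.sh $PATH_TO_STORAGE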

Requirements

We use the Docker image provided by HERO for easier reproduction. Please follow Requirements in HERO to set up the environment.

Quick Start

NOTE: Please run bash scripts/download_pretrained.sh $PATH_TO_STORAGE to get the latest pretrained checkpoints from HERO.

We use TVR as an end-to-end example for single-task finetuning.

  1. Download processed data and pretrained models with the following command.

    bash scripts/download_tvr.sh $PATH_TO_STORAGE

    After downloading, you should see the following folder structure:

    ├── video_db
    │   ├── tv
    ├── pretrained
    │   └── hero-tv-ht100.pt
    └── txt_db
        ├── tv_subtitles.db
        ├── tvr_train.db
        ├── tvr_val.db
        └── tvr_test.db
    
  2. Launch the Docker container for running the experiments.

    # docker image should be automatically pulled
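    # (optional) the launch script respects $CUDA_VISIBLE_DEVICES, e.g. to use only two GPUs:
    # export CUDA_VISIBLE_DEVICES=0,1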
    source launch_container.sh $PATH_TO_STORAGE/txt_db $PATH_TO_STORAGE/video_db \
        $PATH_TO_STORAGE/finetune $PATH_TO_STORAGE/pretrained

    The launch script respects the $CUDA_VISIBLE_DEVICES environment variable. Note that the source code is mounted into the container under /src instead of being built into the image, so that user modifications are reflected without re-building the image. (Data folders are mounted into the container separately for flexibility on folder structures.)

  3. Run finetuning for the TVR task.

    # inside the container
    horovodrun -np 8 python train_retrieval.py --config config/train-tvr-8gpu.json \
        --output_dir $YOUR_TVR_OUTPUT_DIR
    
    # for single gpu
    python train_retrieval.py --config $YOUR_CONFIG_JSON
  4. Run inference for the TVR task.

    # inference, inside the container
    python eval_vcmr.py --query_txt_db /txt/tvr_val.db/ --split val \
        --vfeat_db /video/tv/ --sub_txt_db /txt/tv_subtitles.db/ \
        --output_dir $YOUR_TVR_OUTPUT_DIR --checkpoint $BEST_CKPT_STEP \
        --task tvr
    

    The result file will be written at ${YOUR_TVR_OUTPUT_DIR}/results_val/results_${BEST_CKPT_STEP}_all.json. Change to --query_txt_db /txt/tvr_test.db/ --split test for inference on the test split. Please format the result file as requested in VALUE Evaluation Tools for submission; this repository does not include the formatting step.

  5. Misc. In case you would like to reproduce the whole preprocessing pipeline, follow the steps below.

  • Text annotation and subtitle preprocessing

    # outside of the container
    # make sure you have downloaded/constructed the video dbs for TV dataset
    # the prepro of tv_subtitles.db requires information from video_db/tv
    bash scripts/create_txtdb.sh $PATH_TO_STORAGE/txt_db \
        $PATH_TO_STORAGE/ann $PATH_TO_STORAGE/video_db
  • Video feature extraction

    We follow the feature extraction code at HERO_Video_Feature_Extractor. Please follow the link for instructions on extracting video features with the ResNet, SlowFast, S3D (Mil-NCE) and CLIP-ViT models. These features are saved as separate .npz files per video.

  • Video feature preprocessing and saving to lmdb

    # inside of the container
    
    # Use resnet_slowfast as an example
    # Gather slowfast/resnet feature paths
    python scripts/collect_video_feature_paths.py  \
        --feature_dir $PATH_TO_STORAGE/vis_feat_dir\
        --output $PATH_TO_STORAGE/video_db --dataset $DATASET_NAME \
        --feat_version resnet_slowfast 
    
    # Convert to lmdb
    python scripts/convert_videodb.py \
        --vfeat_info_file $PATH_TO_STORAGE/video_db/$DATASET_NAME/resnet_slowfast_info.pkl \
        --output $PATH_TO_STORAGE/video_db --dataset $DATASET_NAME --frame_length 1.5 \
        --feat_version resnet_slowfast
    • --frame_length: one feature per frame_length seconds; we use 1.5 in our implementation. Set it to be consistent with the value used during feature extraction.
    • --compress: enable compression of the lmdb
    • --feat_version: choose from resnet_slowfast, resnet_mil-nce (ResNet+S3D in paper), clip-vit_slowfast, clip-vit_mil-nce (CLIP-ViT+S3D in paper).

VALUE Single Task Finetuning

Video Retrieval Tasks

All video retrieval tasks can be finetuned with train_retrieval.py. We use YC2R as an additional example to show how to perform single-task finetuning on video retrieval tasks.

  1. download data
    # outside of the container
    bash scripts/download_yc2.sh $PATH_TO_STORAGE
  2. train
    # inside the container
    horovodrun -np 4 python train_retrieval.py --config config/train-yc2r-4gpu.json \
        --output_dir $YC2R_EXP
  3. inference
    # inside the container
    python eval_vr.py --query_txt_db /txt/yc2r_test.db/ --split test \
        --vfeat_db /video/yc2/ --sub_txt_db /txt/yc2_subtitles.db/ \
        --output_dir $YC2R_EXP --checkpoint $ckpt --task yc2r
    The result file will be written at $YC2R_EXP/results_test/results_${ckpt}_all.json, which can be submitted to the evaluation server. Please format the result file as requested in VALUE Evaluation Tools for submission.

Video QA Tasks

All video question answering models can be finetuned with train_qa.py. We use TVQA to demonstrate how to perform single-task finetuning on video question answering tasks.

  1. download data

    # outside of the container
    bash scripts/download_tvqa.sh $PATH_TO_STORAGE
  2. train

    # inside the container
    horovodrun -np 8 python train_qa.py --config config/train-tvqa-8gpu.json \
        --output_dir $TVQA_EXP
  3. inference

    # inside the container
    horovodrun -np 8 python eval_videoQA.py --query_txt_db /txt/tvqa_test.db/ --split test \
        --vfeat_db /video/tv/ --sub_txt_db /txt/tv_subtitles.db/ \
        --output_dir $TVQA_EXP --checkpoint $ckpt --task tvqa

    The result file will be written at $TVQA_EXP/results_test/results_${ckpt}_all.json, which can be submitted to the evaluation server. Please format the result file as requested in VALUE Evaluation Tools for submission.

    Use eval_violin.py for inference on the VIOLIN task.
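
    Below is a hedged sketch of a VIOLIN inference call; it assumes eval_violin.py takes the same arguments as eval_videoQA.py, and the db/feature paths are illustrative rather than verified:

    # inside the container
    # NOTE: flags and paths are assumptions based on eval_videoQA.py; check eval_violin.py --help
    python eval_violin.py --query_txt_db /txt/violin_test_private.db/ --split test \
        --vfeat_db /video/violin/ --sub_txt_db /txt/violin_subtitles.db/ \
        --output_dir $VIOLIN_EXP --checkpoint $ckpt --task violin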

Captioning Tasks

All video captioning models can be finetuned with train_captioning.py. We use TVC to demonstrate how to perform single-task finetuning on video captioning tasks.

  1. download data

    # outside of the container
    bash scripts/download_tvc.sh $PATH_TO_STORAGE
  2. train

    # inside the container
    horovodrun -np 8 python train_captioning.py --config config/train-tvc-8gpu.json \
        --output_dir $TVC_EXP
  3. inference

    # inside the container
    python inf_tvc.py --model_dir $TVC_EXP --ckpt_step $ckpt \
        --target_clip /txt/tvc_val_release.jsonl --output tvc_val_output.jsonl
    • The result file will be written at $TVC_EXP/tvc_val_output.jsonl.
    • Change to --target_clip /txt/tvc_test_release.jsonl for test results.
    • See scripts/prepro_tvc.sh for LMDB preprocessing.

    Use inf_vatex_en_c.py / inf_yc2c.py for inference on the VATEX_EN_C / YC2C tasks.
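
    Below is a hedged sketch of YC2C inference; it assumes inf_yc2c.py mirrors the inf_tvc.py arguments, and the target clip file name is illustrative:

    # inside the container
    # NOTE: flags and file names are assumptions based on inf_tvc.py; check inf_yc2c.py --help
    python inf_yc2c.py --model_dir $YC2C_EXP --ckpt_step $ckpt \
        --target_clip /txt/yc2c_test_release.jsonl --output yc2c_test_output.jsonl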

VALUE Multi-Task Finetuning

  1. download data

    # outside of the container
    bash scripts/download_all.sh $PATH_TO_STORAGE
  2. train

    # inside the container
    horovodrun -np 8 python train_all_multitask.py \
        --config config/train-all-multitask-8gpu.json \
        --output_dir $AT_PT_FT_EXP
    • --config: change config file for different multi-task settings.
      • MT by domain group: config/train-tv_domain-multitask-8gpu.json / config/train-youtube_domain-multitask-8gpu.json
      • MT by task type: config/train-retrieval-multitask-8gpu.json / config/train-qa-multitask-8gpu.json / config/train-caption-multitask-8gpu.json
      • AT: config/train-all-multitask-8gpu.json
    • For multi-task baselines without pre-training, refer to configs under config/FT_only_configs
  3. inference

    Follow the inference instructions above for each task.

Training with Different Input Channels

To reproduce our experiments with different input channels, change the training config via --config. Take TVR as an example:

  1. Video-only
    # inside the container
    horovodrun -np 8 python train_retrieval.py \
        --config config/FT_only_configs/train-tvr_video_only-8gpu.json \
        --output_dir $TVR_V_only_EXP
  2. Subtitle-only
    # inside the container
    
    horovodrun -np 8 python train_retrieval.py \
        --config config/FT_only_configs/train-tvr_sub_only-8gpu.json \
        --output_dir $TVR_S_only_EXP
  3. Video + Subtitle
    # inside the container
    
    horovodrun -np 8 python train_retrieval.py \
        --config config/FT_only_configs/train-tvr-8gpu.json \
        --output_dir $TVR_EXP

Training with Different Video-Subtitle Fusion Methods

To reproduce our experiments with different video-subtitle fusion methods, change the fusion method via --model_config for training. Take TVR as an example:

# Training, inside the container
horovodrun -np 8 python train_retrieval.py --config config/FT_only_configs/train-tvr-8gpu.json \
    --output_dir $TVR_EXP --model_config config/model_config/hero_finetune.json
  • config/model_config/hero_finetune.json: default temporal align + cross-modal transformer
  • config/model_config/video_sub_sequence_finetune.json: sequence concatenation
  • config/model_config/video_sub_feature_add_finetune.json: temporal align + summation
  • config/model_config/video_sub_feature_concat_finetune.json: temporal align + concatenation
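
For example, to train the sequence-concatenation variant, reuse the command above and swap the model config (a sketch; only the output directory name is illustrative):

# inside the container
horovodrun -np 8 python train_retrieval.py --config config/FT_only_configs/train-tvr-8gpu.json \
    --output_dir $TVR_SEQ_EXP --model_config config/model_config/video_sub_sequence_finetune.json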

For the two-stream experiments in our paper, please train video-only and subtitle-only models following the Video-only and Subtitle-only settings in Training with Different Input Channels, and use the evaluation scripts in two_stream_eval. Take TVR as an example:

# Evaluation, inside the container
python eval_vcmr.py --query_txt_db /txt/tvr_val.db/ --split val \
    --vfeat_db /video/tv/ --sub_txt_db /txt/tv_subtitles.db/ \
    --video_only_model_dir $TVR_V_only_EXP --video_only_checkpoint $BEST_V_only_CKPT_STEP \
    --sub_only_model_dir $TVR_S_only_EXP --sub_only_checkpoint $BEST_S_only_CKPT_STEP \
    --task tvr

Training with Different Visual Representations

To reproduce our experiments with different visual representations, change the visual representations via --vfeat_version for training. Take TVR as an example:

# inside the container
horovodrun -np 8 python train_retrieval.py --config config/FT_only_configs/train-tvr-8gpu.json \
    --output_dir $TVR_EXP --vfeat_version resnet

We provide all feature variations used in the paper, including:

  • 2D features: resnet and clip-vit
  • 3D features: mil-nce (S3D in paper) and slowfast
  • 2D+3D features: resnet_slowfast, resnet_mil-nce (ResNet+S3D in paper), clip-vit_mil-nce (CLIP-ViT+S3D in paper), clip-vit_slowfast
  • --vfeat_version: the default is resnet_slowfast
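
For instance, to train with the CLIP-ViT+SlowFast features (a sketch; only the output directory name is illustrative):

# inside the container
horovodrun -np 8 python train_retrieval.py --config config/FT_only_configs/train-tvr-8gpu.json \
    --output_dir $TVR_CLIP_SF_EXP --vfeat_version clip-vit_slowfast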

Task Transferability Evaluation

To reproduce our experiments on task transferability, you will first need a model trained on the source task, and then run evaluation on the target task. Take TVR->How2R as an example:

  1. Train on TVR task
    # inside the container
    horovodrun -np 8 python train_retrieval.py --config config/FT_only_configs/train-tvr-8gpu.json \
        --output_dir $TVR_EXP 
  2. Evaluate the trained model on How2R task:
    # inside the container
    python eval_vcmr.py --query_txt_db /txt/how2r_val_1k.db/ --split val \
        --vfeat_db /video/how2/ --sub_txt_db /txt/how2_subtitles.db/ \
        --output_dir $TVR_EXP --checkpoint $BEST_TVR_CKPT_STEP \
        --task how2r

Pre-training

All VALUE baselines are based on the pre-trained checkpoint released in HERO. The pre-training experiments are not tested in this codebase.

If you wish to perform pre-training, please refer to instructions in HERO.

Citation

If you find this code useful for your research, please consider citing:

@inproceedings{li2021value,
  title={VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation},
  author={Li, Linjie and Lei, Jie and Gan, Zhe and Yu, Licheng and Chen, Yen-Chun and Pillai, Rohit and Cheng, Yu and Zhou, Luowei and Wang, Xin Eric and Wang, William Yang and others},
  booktitle={35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks},
  year={2021}
}

@inproceedings{li2020hero,
  title={HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training},
  author={Li, Linjie and Chen, Yen-Chun and Cheng, Yu and Gan, Zhe and Yu, Licheng and Liu, Jingjing},
  booktitle={EMNLP},
  year={2020}
}

License

MIT

Comments
  • id2nframe.json file not found

    Hi,

    I downloaded the data of a dataset (VLEP) with the corresponding script, but when using train_qa.py with the base config, the dataloading fails as it seems that the file loaded in data.py lines 60-65 {img_dir}/id2nframe_{frame_interval:g}.json or {img_dir}/id2nframe.json is missing (and I couldn't find it in the downloaded folders). Would you mind providing it please?

    PS1: it seems that when replacing name2nframe by None, the _compute_nframe function then fails as the downloaded database seems corrupted (fnames = json.loads(self.txn.get(key=b'keys').decode('utf-8')) --> lmdb.CorruptedError: mdb_get: MDB_CORRUPTED: Located page was wrong type).

    PS2: it seems also that default configs uses "vfeat_version": "resnet_slowfast", but the downloading script only downloads a "slowfast" folder in video_db, not a "resnet_slowfast" one.

    Best, Antoine Yang

    opened by antoyang 8
  • How2QA Preprocessing

    It seems create_txtdb.sh does not process How2 data. In particular, I do not understand what the mapping is between the original YouTube video (11-character unique identifier) and the vid_name in the txtdb of How2QA (which is numerical, e.g. 29752). Do you know where I could find this information?

    opened by antoyang 4
  • About the length of the sample video of the VLEP data set

    Hello, I have noticed that the average video length of the VLEP dataset mentioned in the VALUE paper is 30s. But the paper that introduced the VLEP dataset, "What is More Likely to Happen Next? Video-and-Language Future Event Prediction", mentions in Table 2 that the average Premise Event length is 6s. Why?

    opened by RQsky 3
  • Question: Is there a way to check if the dataset is lossless?

    Hi @linjieli222

    Since the dataset requires transferring large files, file corruption may occur due to network failures. Is there a way to check if the file I have received has been successfully downloaded without loss? (something like md5sum)

    opened by liveseongho 3
  • I follow the tutorial, but it is far from the effect in the paper

    Hello, I followed the tutorial almost exactly, but the model does not perform well on the validation set. Unlike the tutorial, I made 2 changes; I don't know if this is the reason. I hope you can give some suggestions. Thank you.

    1. I deleted the 144th line of code in save.py, because the system kept reporting that the contents of the two files are different
    2. I changed the training batch_size to 8, because with a larger batch_size I get CUDA out of memory

    The attachment is the training log log.txt

    opened by RQsky 2
  • Connection timed out while downloading VATEX dataset

    Hi, I'm trying to download the VATEX dataset, but I failed. It seems the server is not responding. Could you check the status? All downloads were successful except for VATEX.

    HTTP request failed
    
    Head https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/video_db/vatex.tar?timeout=901: dial tcp 20.150.35.196:443: i/o timeout
    
    INFO: trying to copy the source as container/directory/list of files
    
    failed to perform copy command due to error: cannot start job due to error: the destination must be an existing directory in this download scenario.
    
    --2021-07-28 10:05:47--  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Resolving datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)... 20.150.35.196
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    --2021-07-28 10:07:58--  (try: 2)  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    --2021-07-28 10:10:10--  (try: 3)  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    --2021-07-28 10:12:25--  (try: 4)  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    --2021-07-28 10:14:39--  (try: 5)  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    --2021-07-28 10:16:55--  (try: 6)  https://datarelease.blob.core.windows.net/value-leaderboard/starter_code_data/txt_db/vatex_subtitles.db.tar
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... failed: Connection timed out.
    Retrying.
    
    opened by liveseongho 2
  • Found some issues in 'train_all_multitask.py'

    Hi @linjieli222

    I found some issues while I try train_all_multitask.py.

    https://github.com/VALUE-Leaderboard/StarterCode/blob/7f124b4ddef86af887ca593b369d470bb0e3586b/train_all_multitask.py#L52-L53

    I think train_vcmr and train_vr should be changed to train_retrieval according to the current StarterCode. The current train_all_multitask.py reports a ModuleNotFoundError.

    Also, train-all-multitask-8gpu.json loads data from /data/release/txt_db_v2 and /data/release/video_db, but /data is not specified in launch_container.sh.

    Are /txt/tvr_train.db and /data/release/txt_db_v2/tvr_train.db the same? Please explain the _v2 suffix.

    I think train_all_multitask.py is run on a different container or environment. Could you check running train_all_multitask.py? Before I modify the file paths in the config file manually, I want to check if I missed something.

    opened by liveseongho 2
  • Missing TVC files

    Hi @linjieli222,

    Just wanted to let you know that there is a file missing for TVC which is tvc_val_release.jsonl. This is the error message I get:

    $ sh scripts/download_tvc.sh storage 
    azcopy exists, skip downloading
    --2021-07-21 09:29:49--  https://datarelease.blob.core.windows.net/value-leaderboard/tv_tasks/tvc_val_release.jsonl
    Resolving datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)... 20.150.35.196
    Connecting to datarelease.blob.core.windows.net (datarelease.blob.core.windows.net)|20.150.35.196|:443... connected.
    HTTP request sent, awaiting response... 404 The specified blob does not exist.
    2021-07-21 09:29:50 ERROR 404: The specified blob does not exist..
    
    File ‘storage/txt_db/tvc_test_release.jsonl’ already there; not retrieving.
    

    Luckily, I was able to download such files using the HERO scripts:

    RAW_URL=https://raw.githubusercontent.com/jayleicn/TVCaption/66666ec08657d8963b165b18eafabd6427d44261/data/
    for SPLIT in 'train' 'val' 'test_public'; do
        wget $RAW_URL/tvc_${SPLIT}_release.jsonl -P $DOWNLOAD/txt_db
    done
    
    opened by aleSuglia 2
  • Question: loading pretrained model weights for TVQA

    Hi @linjieli222,

    I was trying to load the pretrained weights for TVQA fine-tuning. I'm using the file hero-tv-ht100.pt and when I try to load the checkpoint I get the following warnings:

    07/20/2021 17:16:39 - INFO - model.modeling_utils -   Weights of HeroForVideoQA not initialized from pretrained model: ['v_encoder.fom_output.linear_1.weight', 'v_encoder.fom_output.linear_1.bias', 'v_encoder.fom_output.LayerNorm.weight', 'v_encoder.fom_output.LayerNorm.bias', 'v_encoder.fom_output.linear_2.weight', 'v_encoder.fom_output.linear_2.bias', 'qa_pool.weight', 'qa_pred_head.linear_1.weight', 'qa_pred_head.linear_1.bias', 'qa_pred_head.LayerNorm.weight', 'qa_pred_head.LayerNorm.bias', 'qa_pred_head.linear_2.weight', 'qa_pred_head.linear_2.bias', 'st_ed_pool.weight', 'st_ed_pred_head.linear_1.weight', 'st_ed_pred_head.linear_1.bias', 'st_ed_pred_head.LayerNorm.weight', 'st_ed_pred_head.LayerNorm.bias', 'st_ed_pred_head.linear_2.weight', 'st_ed_pred_head.linear_2.bias']
    
    07/20/2021 17:16:39 - INFO - model.modeling_utils -   Weights from pretrained model not used in HeroForVideoQA: ['q_feat_attn.query_input_proj.LayerNorm.weight', 'q_feat_attn.query_input_proj.LayerNorm.bias', 'q_feat_attn.query_input_proj.net.1.weight', 'q_feat_attn.query_input_proj.net.1.bias', 'q_feat_attn.query_pos_embed.position_embeddings.weight', 'q_feat_attn.query_pos_embed.LayerNorm.weight', 'q_feat_attn.query_pos_embed.LayerNorm.bias', 'q_feat_attn.query_self_attention.self.query.weight', 'q_feat_attn.query_self_attention.self.query.bias', 'q_feat_attn.query_self_attention.self.key.weight', 'q_feat_attn.query_self_attention.self.key.bias', 'q_feat_attn.query_self_attention.self.value.weight', 'q_feat_attn.query_self_attention.self.value.bias', 'q_feat_attn.query_self_attention.output.dense.weight', 'q_feat_attn.query_self_attention.output.dense.bias', 'q_feat_attn.query_self_attention.output.LayerNorm.weight', 'q_feat_attn.query_self_attention.output.LayerNorm.bias', 'q_feat_attn.modular_vector_mapping.weight', 'video_query_linear.weight', 'video_query_linear.bias', 'video_st_predictor.weight', 'video_ed_predictor.weight', 'vocab_padded', 'v_encoder.fr_output.linear_1.weight', 'v_encoder.fr_output.linear_1.bias', 'v_encoder.fr_output.LayerNorm.weight', 'v_encoder.fr_output.LayerNorm.bias', 'v_encoder.fr_output.linear_2.weight', 'v_encoder.fr_output.linear_2.bias', 'v_encoder.itm_clip_transform.linear_1.weight', 'v_encoder.itm_clip_transform.linear_1.bias', 'v_encoder.itm_clip_transform.LayerNorm.weight', 'v_encoder.itm_clip_transform.LayerNorm.bias', 'v_encoder.itm_clip_transform.linear_2.weight', 'v_encoder.itm_clip_transform.linear_2.bias', 'v_encoder.itm_sub_transform.linear_1.weight', 'v_encoder.itm_sub_transform.linear_1.bias', 'v_encoder.itm_sub_transform.LayerNorm.weight', 'v_encoder.itm_sub_transform.LayerNorm.bias', 'v_encoder.itm_sub_transform.linear_2.weight', 'v_encoder.itm_sub_transform.linear_2.bias']
    

    Can you confirm that the model has been loaded successfully and there are no missing weights?

    opened by aleSuglia 2
  • Bug in Video captioning generator

    Hi @linjieli222,

    I was looking at your code for the video captioning generator and I've noticed a potential bug. Specifically, I was looking at the method greedy_decode that is reported here: https://github.com/VALUE-Leaderboard/StarterCode/blob/main/model/videoCap.py#L314

    At the line https://github.com/VALUE-Leaderboard/StarterCode/blob/main/model/videoCap.py#L338, you compute output_ids as the argmax of the logits generated for the current timestep of the decoder. However, it seems that you do not accumulate them in a separate list. Indeed, you reuse the same variable output_ids when computing the outputs here: https://github.com/VALUE-Leaderboard/StarterCode/blob/main/model/videoCap.py#L341

    Am I wrong?

    Thanks!

    opened by aleSuglia 1
  • Retrieval test in How2r and VATEX dataset!!

    When I test on the How2R and VATEX datasets, I hit a bug:

    Computing Video Embeddings:   0%| | 0/3 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "eval_vr.py", line 427, in <module>
        main(args)
      File "eval_vr.py", line 145, in main
        model, eval_dataloader, opts.split, opts, model_opts)
      File "/opt/conda/lib/python3.6/site-packages/torch/autograd/grad_mode.py", line 49, in decorate_no_grad
        return func(*args, **kwargs)
      File "eval_vr.py", line 216, in validate_full_vr
        video_item = val_loader.dataset.video_db[vid]
      File "/src/data/data.py", line 364, in __getitem__
        example = self.txt_db[vid_]
      File "/src/data/data.py", line 210, in __getitem__
        txt_dump = self.db[id_]
      File "/src/data/data.py", line 164, in __getitem__
        return msgpack.loads(decompress(self.txn.get(key.encode('utf-8'))),
    TypeError: a bytes-like object is required, not 'NoneType'

    opened by haoshuai714 0
  • Clarification about validate_full_vcmr

    Hi @linjieli222,

    I was looking at the validation phase of the retrieval setup and I can see you have implemented two different variants: validate and full_validate. From my understanding, validate only computes loss function scores while full_validate also generates predictions for which retrieval-based metrics are used. The validate setup is quite straightforward to me, so no clarification is required there. However, the full_validate looks quite complex and I haven't seen an actual description of this method in the original paper. In particular, could you please report a brief description of the steps that are reported in this function: https://github.com/VALUE-Leaderboard/StarterCode/blob/main/eval_vcmr.py#L172?

    question 
    opened by aleSuglia 1
  • Training hang

    Hello @linjieli222,

    I'm trying to train a model for VideoQA but I obtain the following error:

    [1,0]<stderr>:Stalled ranks:
    [1,0]<stderr>:1: [allgather.noname.1]
    [1,0]<stderr>:[2021-07-20 09:23:22.936726: W horovod/common/stall_inspector.cc:105] One or more tensors were submitted to be reduced, gathered or broadcasted by subset of ranks and are waiting for remainder of ranks for more than 60 seconds. This may indicate that different ranks are trying to submit different tensors or that only subset of ranks is submitting tensors, which will cause deadlock.
    [1,0]<stderr>:07/20/2021 09:31:42 - INFO - __main__ -   122039 samples loaded
    [1,0]<stderr>:A process has executed an operation involving a call
    [1,0]<stderr>:to the fork() system call to create a child process.
    [1,0]<stderr>:
    [1,0]<stderr>:As a result, the libfabric EFA provider is operating in
    [1,0]<stderr>:a condition that could result in memory corruption or
    [1,0]<stderr>:other system errors.
    [1,0]<stderr>:
    [1,0]<stderr>:For the libfabric EFA provider to work safely when fork()
    [1,0]<stderr>:is called, the application must handle memory registrations
    [1,0]<stderr>:(FI_MR_LOCAL) and you will need to set the following environment
    [1,0]<stderr>:variables:
    [1,0]<stderr>:          RDMAV_FORK_SAFE=1
    [1,0]<stderr>:MPI applications do not support this mode.
    [1,0]<stderr>:
    [1,0]<stderr>:However, this setting can result in signficant performance
    [1,0]<stderr>:impact to your application due to increased cost of memory
    [1,0]<stderr>:registration.
    [1,0]<stderr>:
    [1,0]<stderr>:You may want to check with your application vendor to see
    [1,0]<stderr>:if an application-level alternative (of not using fork)
    [1,0]<stderr>:exists.
    [1,0]<stderr>:
    [1,0]<stderr>:Please refer to https://github.com/ofiwg/libfabric/issues/6332
    [1,0]<stderr>:for more information.
    [1,0]<stderr>:
    [1,0]<stderr>:Your job will now abort.
    

    This happens immediately after the script loads the data. I can see the logging info [1,0]<stderr>:07/20/2021 09:31:42 - INFO - __main__ - 122039 samples loaded. Can you please advise?

    Otherwise, would you have a trained model for VideoQA that I can test?

    UPDATE: I've also tried with single GPU (by removing horovodrun) and the same error happens.

    help wanted 
    opened by aleSuglia 4
  • Docker and ease of debugging

    Hello,

    Thanks again for releasing this codebase. I just wanted to ask your opinion about Docker. Docker definitely has its benefits when running code in production, but I believe it's not really well-suited for research prototyping. In particular, I believe that a research codebase should be really easy to use, debug and modify. Docker prevents you from freely debugging in your favourite IDE (PyCharm). I think it would be easy to do so if the codebase were easily installable using Anaconda, just like other frameworks such as AllenNLP.

    Looking at the codebase, it seems that the major problem when it comes to dependencies is Horovod. Is there any way you can refactor your codebase following Horovod-GPU-project? I think this will allow everybody to easily install Horovod without the burden of having to learn Docker internals.

    Another important point is that the codebase currently seems to support only GPU execution. However, most of the time it would be useful to test/debug the model on CPU as well. In this case, the major issue that I see is that you assume that apex is installed. Huggingface decided to deprecate Apex because it has now been outperformed by the native Torch implementation (see https://github.com/huggingface/transformers/issues/9377 for details).

    I think these two points will greatly improve the user experience of your codebase for people that: 1) don't have a GPU; 2) don't have the experience/environment to use Docker (some companies might not like it).

    I would love to hear your thoughts on this. I'm planning to modify the codebase myself to tackle these problems, so please let me know if you would be interested in a PR.

    Thanks, Alessandro

    opened by aleSuglia 2