VidLanKD
Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal.
Setup
# Create python environment (optional)
conda create -n vidlankd python=3.7
conda activate vidlankd
# Install python dependencies
pip install -r requirements.txt
To speed up training, we use mixed-precision training via NVIDIA Apex. Install it from source:
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
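After installing, you may want to confirm that Apex is importable and that its compiled CUDA extensions were built. The sketch below is a minimal, optional check that is not part of this repository. It uses only the standard library's importlib. It assumes that a successful "--cuda_ext" build exposes an extension module named amp_C, which is what Apex source builds produce, but treat that name as an assumption.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module `name` can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


def apex_status() -> dict:
    # "apex" is the Python package itself; "amp_C" is ASSUMED to be the
    # compiled CUDA extension produced by the --cuda_ext build option.
    return {
        "apex": module_available("apex"),
        "cuda_ext": module_available("amp_C"),
    }


if __name__ == "__main__":
    status = apex_status()
    if not status["apex"]:
        print("Apex not found: Apex mixed precision will not be available.")
    elif not status["cuda_ext"]:
        print("Apex found, but its CUDA extensions appear to be missing.")
    else:
        print("Apex with CUDA extensions is available.")
```

Checking with find_spec avoids importing Apex, so the check also works on machines without a GPU.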
Dataset Preparation
Text Dataset
We provide scripts to download two text datasets, "wiki103" and "wiki".
Wiki103, a selected subset of English Wikipedia:
bash data/wiki103/get_data_cased.bash
English Wikipedia. The scripts are adapted from XLM; the trailing "en" argument specifies the language:
bash data/wiki/get_data_cased.bash en
Video Dataset
HowTo100M, which provides official captions and video features for download.
Video Feature Extraction Code
To be updated.
- We extracted our 2D video features with ResNet-152 from torchvision.
- We extracted our 3D video features with 3D-ResNeXt.
Downstream tasks
GLUE dataset
Download dataset
python download_glue_data.py --data_dir data/glue --tasks all
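Before fine-tuning, a quick sanity check can confirm that the download produced every task. The sketch below is a hypothetical helper, not part of the repository. It assumes the per-task directory names that the widely used download_glue_data.py script creates (CoLA, SST-2, MRPC, QQP, STS-B, MNLI, QNLI, RTE, WNLI). It also assumes that each directory contains a train.tsv file. Adjust GLUE_TASK_DIRS if your copy of the script differs.

```python
from pathlib import Path
from typing import List

# ASSUMED task directory names, following the layout produced by the common
# download_glue_data.py script; edit this list if your layout differs.
GLUE_TASK_DIRS = [
    "CoLA", "SST-2", "MRPC", "QQP", "STS-B", "MNLI", "QNLI", "RTE", "WNLI",
]


def missing_glue_tasks(data_dir: str) -> List[str]:
    """Return the tasks whose directory or train.tsv file is missing under data_dir."""
    root = Path(data_dir)
    return [
        task for task in GLUE_TASK_DIRS
        if not (root / task / "train.tsv").is_file()
    ]


if __name__ == "__main__":
    missing = missing_glue_tasks("data/glue")  # same --data_dir as the command above
    if missing:
        print("Missing GLUE tasks: " + ", ".join(missing))
    else:
        print("All GLUE tasks found.")
```

Only train.tsv is checked because dev-set file names vary by task; for example, MNLI uses matched and mismatched dev files.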
Training
Teacher model pre-training
# bash scripts/small_vlm_howto100m.bash $GPUS $teacher_SNAP_PATH
bash scripts/small_vlm_howto100m.bash 0,1,2,3 howto100m_bert_small_vokenhinge
# bash scripts/base_vlm_howto100m.bash $GPUS $teacher_SNAP_PATH
bash scripts/base_vlm_howto100m.bash 0,1,2,3 howto100m_bert_base_vokenhinge
Knowledge transfer to student model
# bash scripts/small_vlm_wiki103.bash $GPUS $teacher_SNAP_PATH $student_SNAP_PATH
bash scripts/small_vlm_wiki103.bash 0,1,2,3 howto100m_bert_small_vokenhinge/checkpoint-epoch0019 wiki103_bert_small_vokenmmd
# bash scripts/base_vlm_wiki.bash $GPUS $teacher_SNAP_PATH $student_SNAP_PATH
bash scripts/base_vlm_wiki.bash 0,1,2,3 howto100m_bert_base_vokenhinge/checkpoint-epoch0019 wiki_bert_base_vokenmmd
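The knowledge-transfer commands pass a specific teacher checkpoint (checkpoint-epoch0019) to the student script. If your teacher ran for a different number of epochs, the sketch below picks the latest checkpoint in a snapshot directory. It is a hypothetical helper. Based only on the paths in these examples, it assumes that checkpoints are subdirectories named checkpoint-epochNNNN with a zero-padded epoch number. The snapshot location (snap/vlm/... in the GLUE fine-tuning example) may differ in your setup.

```python
import re
from pathlib import Path
from typing import Optional

# ASSUMED naming scheme, inferred from example paths such as "checkpoint-epoch0019".
CKPT_PATTERN = re.compile(r"^checkpoint-epoch(\d+)$")


def latest_checkpoint(snap_dir: str) -> Optional[Path]:
    """Return the checkpoint-epochNNNN subdirectory with the highest epoch, or None."""
    root = Path(snap_dir)
    if not root.is_dir():
        return None
    best_epoch, best_path = -1, None
    for child in root.iterdir():
        match = CKPT_PATTERN.match(child.name)
        if match and child.is_dir():
            epoch = int(match.group(1))  # compare numerically, not as strings
            if epoch > best_epoch:
                best_epoch, best_path = epoch, child
    return best_path


if __name__ == "__main__":
    print(latest_checkpoint("snap/vlm/howto100m_bert_small_vokenhinge"))
```

Parsing the epoch as an integer keeps the comparison correct even if the zero-padding width changes.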
Finetuning on GLUE tasks
# bash scripts/run_glue_at_epoch.bash $GPUS $NumTrainEpochs $SNAP_PATH
bash scripts/run_glue_at_epoch.bash 0,1,2,3 3 snap/vlm/wiki103_bert_small_vokenmmd/checkpoint-epoch0019
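The three stages feed into each other. The teacher's snapshot name plus a checkpoint becomes an argument to the knowledge-transfer script, and the student's checkpoint becomes the input to GLUE fine-tuning. The sketch below only builds the small-model command lines shown in this section; it does not run them. The helper name is hypothetical. Two details are copied from the examples above rather than from the scripts: the "snap/vlm/" prefix on the GLUE path and the default checkpoint-epoch0019.

```python
from typing import List


def small_model_pipeline(gpus: str, teacher: str, student: str,
                         ckpt: str = "checkpoint-epoch0019",
                         glue_epochs: int = 3) -> List[List[str]]:
    """HYPOTHETICAL helper: build argv lists for the small-model pipeline."""
    return [
        # 1. Teacher pre-training on HowTo100M.
        ["bash", "scripts/small_vlm_howto100m.bash", gpus, teacher],
        # 2. Knowledge transfer to the student on Wiki103, from a teacher checkpoint.
        ["bash", "scripts/small_vlm_wiki103.bash", gpus,
         teacher + "/" + ckpt, student],
        # 3. GLUE fine-tuning from a student checkpoint (prefix copied from the example).
        ["bash", "scripts/run_glue_at_epoch.bash", gpus, str(glue_epochs),
         "snap/vlm/" + student + "/" + ckpt],
    ]


if __name__ == "__main__":
    for argv in small_model_pipeline("0,1,2,3",
                                     "howto100m_bert_small_vokenhinge",
                                     "wiki103_bert_small_vokenmmd"):
        print(" ".join(argv))
```

To actually run the stages in order, you could pass each list to subprocess.run(argv, check=True). This stops the pipeline if an earlier stage fails.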
Acknowledgements
Parts of the code are built on vokenization, Hugging Face Transformers, and Facebook FAISS.