Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

Zineng Tang

Last update: Dec 20, 2022

Related tags

Deep Learning VidLanKD

Overview

VidLanKD

Implementation of VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer by Zineng Tang, Jaemin Cho, Hao Tan, Mohit Bansal.

Setup

# Create python environment (optional)
conda create -n vidlankd python=3.7

# Install python dependencies
pip install -r requirements.txt

To speed up the training, we use mixed precision with Apex.

git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

Dataset Preparation

Text Dataset

We provide scripts to obtain datasets "wiki103" and "wiki".

Wiki103, a seleted subset of English Wikipedia.

bash data/wiki103/get_data_cased.bash

English Wikipedia. The scripts are modified from XLM.

bash data/wiki/get_data_cased.bash en

Video Dataset

Howto100m where you can download official captions and videos features.

Video Features Extraction Code

To be updated.

We extracted our 2D-level video features with ResNet152 from torchvision.
We extracted our 3D-level video features with 3D-RexNext.

Downstream tasks

GLUE dataset

Download dataset

python download_glue_data.py --data_dir data/glue --tasks all

Training

Teacher model pre-training

# bash scripts/small_vlm_howto100m.bash $GPUS #teacher_SNAP_PATH
bash scripts/small_vlm_howto100m.bash 0,1,2,3 howto100m_bert_small_vokenhinge
# bash scripts/base_vlm_howto100m.bash $GPUS #teacher_SNAP_PATH
bash scripts/base_vlm_howto100m.bash 0,1,2,3 howto100m_bert_base_vokenhinge

Knowledge transfer to student model

# bash scripts/small_vlm_wiki103.bash $GPUS #teacher_SNAP_PATH #student_SNAP_PATH
bash scripts/small_vlm_wiki103.bash 0,1,2,3 howto100m_bert_small_vokenhinge/checkpoint-epoch0019 wiki103_bert_small_vokenmmd
# bash scripts/base_vlm_wiki.bash $GPUS #teacher_SNAP_PATH #student_SNAP_PATH
bash scripts/base_vlm_wiki.bash 0,1,2,3 howto100m_bert_base_vokenhinge/checkpoint-epoch0019 wiki_bert_base_vokenmmd

Finetuning on GLUE tasks

# bash scripts/run_glue_at_epoch.bash $GPUS $NumTrainEpochs $SNAP_PATH                        
bash scripts/run_glue_at_epoch.bash 0,1,2,3 3 snap/vlm/wiki103_bert_small_vokenmmd/checkpoint-epoch0019

Acknowledgements

Part of the code is built based on vokenization, huggingface transformers, and facebook faiss.

Comments

Question about kd1 student head

Hi @zinengtang Interesting work. One question about the kd1 student head implementation. In your paper you mentioned that the vector representations are from the last hidden state of language models

In your code https://github.com/zinengtang/VidLanKD/blob/46bae35e1342293ee0d3f5035b497f752ea267c1/vlm/model.py#L172 There is one extra student head. Seems the sequence_output and kd_pred1 have exact same dimensions. Could you explain why this extra head is necessary?

opened by ylli0218 4
About GLUE data

Hi, I'm interested in your work. I found one thing that is very confusing in download_glue_data.py. I notice that for downloading QQP the address is https://dl.fbaipublicfiles.com/glue/data/STS-B.zip' while for downloading STS-B the address is 'https://dl.fbaipublicfiles.com/glue/data/QQP-clean.zip'. Does this indicate the GLUE results in your paper are reversed for QQP and STS-B? Thanks in advance.

opened by yxgnahz 4
Unable to reproduce the results in the paper

Hi, I used the student model you released and fine-tuned it on GLUE. My experimental sitting is consistent with that in the paper “train 3 epochs with a learning rate of 1e-4 and a batch-size of 32”. Unfortunately, my experimental results are much worse than those in the paper.

My results (The BERT_baseline is the one from Vokenization): Results in paper:

In addition, I also noticed that the random seed has a great impact on the performance of VidLanKD on QNLI and QQP tasks. I wonder why? Do you have any other settings that need to pay attention to during the fine-tuning?

opened by VickiCui 4
CLIP video features

Hi,

I read the paper and was really fascinated by the work. I was trying to use your model for my own project. I read that you use CLIP features on images as well. I was wondering how to extract the CLIP features as I just saw ResNet based features on the video_extractor repository. Am I missing something? Any help would be appreciated. Thanks!

opened by arshiyaaggarwal 1
the release of video features and small preatrained models

Hi, I am very interested in your work and plan to reproduce it. Would you mind releasing the video features extracted by 2D&3D encoder, and small (BERT 6L/512H) teacher & student model? Thanks!

opened by yellow-binary-tree 1

Owner

Zineng Tang

GitHub

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

ERICA Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive L

75 Nov 2, 2022

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

1.3k Dec 31, 2022

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

22 Nov 25, 2022

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

Study-CSRNet-pytorch This is the PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

0 Mar 1, 2022

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

SuperGen The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding. Requirements Before running, you

38 Dec 12, 2022

🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

SGLKT-VisDial Pytorch Implementation for the paper: Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer Gi-Cheon Kang, Junseok P

9 Jul 5, 2022

This repo will contain code to reproduce and build upon understanding transfer learning

What is being transferred in transfer learning? This repo contains the code for the following paper: Behnam Neyshabur*, Hanie Sedghi*, Chiyuan Zhang*.

4 Jun 16, 2021

Improving Convolutional Networks via Attention Transfer (ICLR 2017)

Attention Transfer PyTorch code for "Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Tran

1.4k Dec 23, 2022

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation This repository is the official implementation of CVPR 2021 paper:

9 Nov 14, 2022

A PaddlePaddle version of Neural Renderer, refer to its PyTorch version

Neural 3D Mesh Renderer in PadddlePaddle A PaddlePaddle version of Neural Renderer, refer to its PyTorch version Install Run: pip install neural-rende

13 Jul 12, 2022

code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

SHOT++ Code for our TPAMI submission "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer" that is ext

75 Dec 16, 2022

Transfer-Learn is an open-source and well-documented library for Transfer Learning.

Transfer-Learn is an open-source and well-documented library for Transfer Learning. It is based on pure PyTorch with high performance and friendly API. Our code is pythonic, and the design is consistent with torchvision. You can easily develop new algorithms, or readily apply existing algorithms.

2.2k Jan 3, 2023

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Transfer Style API It's an API to use with Tranfer Style App, where you can use

1 Feb 13, 2022

Code for CVPR2021 "Visualizing Adapted Knowledge in Domain Transfer". Visualization for domain adaptation. #explainable-ai

Visualizing Adapted Knowledge in Domain Transfer @inproceedings{hou2021visualizing, title={Visualizing Adapted Knowledge in Domain Transfer}, auth

80 Dec 25, 2022

Does Pretraining for Summarization Reuqire Knowledge Transfer?

Pretraining summarization models using a corpus of nonsense

Approximately Correct Machine Intelligence (ACMI) Lab

12 Dec 19, 2022

Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

TSOD Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer" Usage For training, open train_test, run p

2 Dec 23, 2021

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

DataFree A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation" Authors: Gongfa

47 Jan 9, 2023

TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

FunMatch-Distillation TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A g

67 Dec 20, 2022

Source Code for our paper: Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated Recurrent Memory Network

KaGRMN-DSG_ABSA This repository contains the PyTorch source Code for our paper: Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated

4 May 20, 2022

Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

Related tags

Overview

VidLanKD

Setup

Dataset Preparation

Text Dataset

Video Dataset

Video Features Extraction Code

Downstream tasks

GLUE dataset

Training

Acknowledgements

Comments

Question about kd1 student head

About GLUE data

Unable to reproduce the results in the paper

CLIP video features

the release of video features and small preatrained models

Owner

Zineng Tang

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

PyTorch version repo for CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

🌈 PyTorch Implementation for EMNLP'21 Findings "Reasoning Visual Dialog with Sparse Graph Learning and Knowledge Transfer"

This repo will contain code to reproduce and build upon understanding transfer learning

Improving Convolutional Networks via Attention Transfer (ICLR 2017)

The official implementation of CVPR 2021 Paper: Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation.

A PaddlePaddle version of Neural Renderer, refer to its PyTorch version

code for our paper "Source Data-absent Unsupervised Domain Adaptation through Hypothesis Transfer and Labeling Transfer"

Transfer-Learn is an open-source and well-documented library for Transfer Learning.

Transfer style api - An API to use with Tranfer Style App, where you can use two image and transfer the style

Code for CVPR2021 "Visualizing Adapted Knowledge in Domain Transfer". Visualization for domain adaptation. #explainable-ai

Does Pretraining for Summarization Reuqire Knowledge Transfer?

Code for the ICME 2021 paper "Exploring Driving-Aware Salient Object Detection via Knowledge Transfer"

[IJCAI-2021] A benchmark of data-free knowledge distillation from paper "Contrastive Model Inversion for Data-Free Knowledge Distillation"

TF2 implementation of knowledge distillation using the "function matching" hypothesis from the paper Knowledge distillation: A good teacher is patient and consistent by Beyer et al.

Source Code for our paper: Understand me, if you refer to Aspect Knowledge: Knowledge-aware Gated Recurrent Memory Network