Region_Learner
The Pytorch implementation for "Video-Text Pre-training with Learned Regions" (arxiv)
We are still cleaning up the code further and preparing for pre-training weights.
Preparation
Overall, this code is built on PyTorch with DistributedDataParallel (DDP).
- Create conda env and install required packages via
sh install_env.sh
- Create some important folders
mkdir data
(you can symlink huge datasets to this folder)mkdir results
Finetuning (on MSR-VTT)
- Download data (see https://github.com/m-bain/frozen-in-time#-finetuning-benchmarks-msr-vtt)
- Run
sh finetune.sh
Pre-training
- Download WebVid-2M (see https://github.com/m-bain/webvid)
- Download CC-3M (see https://ai.google.com/research/ConceptualCaptions/download)
- Run
sh pre-training.sh
Pre-trained Weights
Coming soon.
Acknowledgements
This code is based off Frozen in Time