Source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper LightningDOT

Overview

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

This repository contains source code and pre-trained/fine-tuned checkpoints for the NAACL 2021 paper "LightningDOT". It currently supports fine-tuning on MSCOCO and Flickr30k. Pre-training code and a demo for FULL MSCOCO retrieval are also available.

Overview of LightningDOT (figure)

Some code in this repo is copied/modified from UNITER and DPR.

If you find the code useful for your research, please consider citing:

    @inproceedings{sun2021lightningdot,
    title={LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval},
    author={Sun, Siqi and Chen, Yen-Chun and Li, Linjie and Wang, Shuohang and Fang, Yuwei and Liu, Jingjing},
    booktitle={NAACL-HLT},
    year={2021}
    } 

UNITER Environment

To run UNITER as the re-ranker, please set up a separate environment based on the UNITER repo.

All pre-training and fine-tuning use a conda environment that can be created as follows.

Environment

Under the project home folder, first run the following (adjust cudatoolkit to match your CUDA version):

conda env create -f DVL.yml
conda activate DVL
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch
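
After creating the environment, you can optionally sanity-check that PyTorch sees your GPU and the expected CUDA version (this check is not part of the original setup):

python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"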

Then install apex:

cd ../
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
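
To confirm that apex was built with its CUDA extensions, a quick optional check is:

python -c "from apex import amp; print('apex OK')"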

To use distributed training, install Open MPI as the super user:

rm -r /usr/local/mpi

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.4.tar.gz 
tar -xvf openmpi-4.0.4.tar.gz 
cd openmpi-4.0.4
./configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --disable-getpwuid --with-verbs
sudo apt-get install libnuma-dev
sudo make -j$(nproc) all && sudo make install
ldconfig

cd -
rm -r openmpi-4.0.4
rm openmpi-4.0.4.tar.gz

export OPENMPI_VERSION=4.0.4
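
You can optionally verify the Open MPI installation before building Horovod (the path assumes the --prefix used above):

/usr/local/mpi/bin/mpirun --version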

Finally, install Horovod:

echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" \
    > /etc/apt/sources.list.d/nvidia-ml.list
apt update
apt install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1

export PATH=/usr/local/mpi/bin:$PATH
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod
ldconfig
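
Optionally, check that Horovod was built with NCCL and PyTorch support:

horovodrun --check-build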

If you see the error /usr/bin/ld: cannot find -lnuma, then try

sudo apt-get install libnuma-dev

Download Checkpoints and Meta file

Under the project home folder, run

bash bash/download_data.sh

The raw image files and extracted features are currently not available for download.

Pre-training

Modify the config file at ./config/pretrain-alldata-base.json accordingly, and run

horovodrun -np $NUM_GPU python pretrain.py --config ./config/pretrain-alldata-base.json

Typically you need to change img_checkpoint, output_dir, and the train/val datasets.
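
As a rough sketch (not the exact shipped config), the edited portion of the config typically looks like the following; all paths are placeholders, and the train_datasets/val_datasets entries follow the dataset format shown in the issues further below:

{
    "img_checkpoint": "/path/to/uniter-base.pt",
    "output_dir": "/path/to/pretrain_output",
    "train_datasets": [
        {"name": "coco_cap", "db": ["/path/to/txt_db"], "img": ["/path/to/img_db"]}
    ],
    "val_datasets": [
        {"name": "coco_cap", "db": ["/path/to/txt_db"], "img": ["/path/to/img_db"]}
    ]
}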

A pre-trained checkpoint is available at LightningDot.

The checkpoints for UNITER-base and BERT-base can be obtained from UNITER-base and BERT-base.

Fine-tuning on MSCOCO and Flickr30k

We provide a sample bash script at ./bash/train_flickr.sh, which we used to search over learning rates.
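
Assuming the checkpoints and meta files from the download script are in place, a fine-tuning run on Flickr30k can be launched directly (the script contains the learning-rate sweep; edit it to run a single setting):

bash ./bash/train_flickr.sh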

Two checkpoints already fine-tuned on MSCOCO and Flickr30k are also provided at COCO-FT and Flickr-FT.

Evaluation

Run

python eval_itm.py  your_eval_config.json  your_checkpoint.pt 

to run the evaluation. We provide three examples whose results can be reproduced solely from the checkpoints and configurations provided in this repo.

Note that your results may NOT be exactly the same as the results below due to differences in machine/environment configuration (but they should be close).

  • Zero-shot evaluation on Flickr30k:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/LightningDot.pt
image retrieval recall = {1: 0.5332, 5: 0.8058, 10: 0.8804}
txt retrieval recall = {1: 0.682, 5: 0.891, 10: 0.94}.
  • Fine-tune on flickr, evaluate on flickr:
python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/flickr-ft.pt
image retrieval recall = {1: 0.699, 5: 0.911, 10: 0.9518}
txt retrieval recall = {1: 0.839, 5: 0.972, 10: 0.986}
  • Fine-tune on MSCOCO, evaluate on MSCOCO:
python eval_itm.py ./config/coco_eval_config.json ./data/model/coco-ft.pt
image retrieval recall = {1: 0.4577, 5: 0.7453, 10: 0.8379}
txt retrieval recall = {1: 0.6004, 5: 0.8516, 10: 0.9172}

Meta File

You may need the meta file used in some scripts, which can be obtained from MSCOCO-Meta and Flickr-Meta.

Demo

TODO

Re-Ranking

Note that the re-ranker uses prediction files generated by UNITER or OSCAR, due to the use of a different PyTorch version.

The re-ranking script is currently provided as-is and has not been cleaned up yet.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

MIT

Comments
  • datasets missing

    Hi! I see in the config file that it's expecting multiple train and val datasets (ending in ".db"). Could you tell me where can I download them or what does the data look like (is it one per line, tab separated?) Thanks in advance, David

    opened by dahrs 5
  • Cross-modal Retrieval Objective (CMR)

    Can you point me to the place in your code where CMR is implemented? You used CMR + VMLM + SMRM for the pre-training, according to the paper. However, CMR is not part of your supported tasks. Am I missing something?

    opened by mojivalipour 6
  • Paper pretraining config

    It appears that config/pretrain-alldata-base.json is not your paper pretraining configuration. There is no cls_concat setting in this configuration file, so it uses the default value. As a result, unlike your paper, this configuration uses MLM instead of VMLM. Could you please provide a correct configuration that reproduces your results?

    opened by mojivalipour 1
  • coco_cap dataset

    Hello,

    Just a quick question. As far as I can see, the coco_cap dataset is not shared in the LightningDOT repo. I have been able to download the coco_cap dataset from the UNITER repo (with ./scripts/download_indomain.sh). May I know if I can use the coco_cap dataset from UNITER for LightningDOT training? Kindly let me know. Thank you.

    opened by mattgithub1919 1
  • Demo Notebook

    Hi @intersun @ChenRocks

    Can you share the data checkpoints used in Image_retriever.ipynb? And if you can share any updated notebook that can be used as a demo to check which actual images are retrieved for a given query, that would be really helpful.

    Thanks

    opened by shivangibithel 1
  • Gradient overflow in finetuning

    Hi,

    Thank you very much for the great work, and for sharing the fine-tuning data last week. I ran into an issue when I tried to fine-tune and evaluate the model on Flickr30k, using:

    # I just ran the second command (GPU: 1, lr: 2e-5)
    ./bash/train_flickr.sh
    

    Training starts normally at the beginning, but the loss suddenly starts increasing at epoch 6:

    
    Epoch: 6: Step: 555/1511, loss=0.527620, loss_nce=0.527620, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 559/1511, loss=0.727350, loss_nce=0.727350, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 563/1511, loss=0.570808, loss_nce=0.570808, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 567/1511, loss=0.393095, loss_nce=0.393095, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 571/1511, loss=0.674848, loss_nce=0.674848, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 575/1511, loss=0.499143, loss_nce=0.499143, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 579/1511, loss=0.594417, loss_nce=0.594417, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 583/1511, loss=0.637567, loss_nce=0.637567, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 587/1511, loss=0.848309, loss_nce=0.848309, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 591/1511, loss=0.859852, loss_nce=0.859852, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 595/1511, loss=0.551946, loss_nce=0.551946, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 599/1511, loss=0.569656, loss_nce=0.569656, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 603/1511, loss=0.811136, loss_nce=0.811136, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 607/1511, loss=0.926843, loss_nce=0.926843, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 611/1511, loss=0.878590, loss_nce=0.878590, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 615/1511, loss=0.930382, loss_nce=0.930382, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 619/1511, loss=1.138345, loss_nce=1.138345, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 623/1511, loss=1.101084, loss_nce=1.101084, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 627/1511, loss=0.899013, loss_nce=0.899013, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 631/1511, loss=1.180095, loss_nce=1.180095, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 635/1511, loss=1.371186, loss_nce=1.371186, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 639/1511, loss=1.614157, loss_nce=1.614157, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 643/1511, loss=1.712646, loss_nce=1.712646, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 647/1511, loss=2.504568, loss_nce=2.504568, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 651/1511, loss=2.761936, loss_nce=2.761936, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 655/1511, loss=4.210203, loss_nce=4.210203, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 659/1511, loss=6.195764, loss_nce=6.195764, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 663/1511, loss=8.189028, loss_nce=8.189028, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 667/1511, loss=12.597887, loss_nce=12.597887, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 671/1511, loss=11.704583, loss_nce=11.704583, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 675/1511, loss=13.765331, loss_nce=13.765331, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 679/1511, loss=18.207155, loss_nce=18.207155, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 683/1511, loss=16.359169, loss_nce=16.359169, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 687/1511, loss=20.523600, loss_nce=20.523600, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 691/1511, loss=27.668240, loss_nce=27.668240, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 695/1511, loss=30.855385, loss_nce=30.855385, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 699/1511, loss=35.086441, loss_nce=35.086441, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 703/1511, loss=30.574892, loss_nce=30.574892, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 707/1511, loss=52.953876, loss_nce=52.953876, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 711/1511, loss=40.207417, loss_nce=40.207417, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 715/1511, loss=53.108303, loss_nce=53.108303, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 719/1511, loss=47.695160, loss_nce=47.695160, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 723/1511, loss=45.211182, loss_nce=45.211182, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 727/1511, loss=49.979271, loss_nce=49.979271, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 731/1511, loss=45.502415, loss_nce=45.502415, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 735/1511, loss=42.128304, loss_nce=42.128304, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 739/1511, loss=57.433262, loss_nce=57.433262, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 743/1511, loss=70.618607, loss_nce=70.618607, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 747/1511, loss=52.835541, loss_nce=52.835541, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 751/1511, loss=57.775532, loss_nce=57.775532, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 755/1511, loss=75.909271, loss_nce=75.909271, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 759/1511, loss=47.627548, loss_nce=47.627548, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 763/1511, loss=55.984451, loss_nce=55.984451, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 767/1511, loss=39.634636, loss_nce=39.634636, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 771/1511, loss=43.213181, loss_nce=43.213181, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 775/1511, loss=37.875175, loss_nce=37.875175, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 779/1511, loss=45.833000, loss_nce=45.833000, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 783/1511, loss=42.249699, loss_nce=42.249699, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 787/1511, loss=49.242207, loss_nce=49.242207, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 791/1511, loss=59.082058, loss_nce=59.082058, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 795/1511, loss=44.366467, loss_nce=44.366467, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 799/1511, loss=61.286034, loss_nce=61.286034, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 803/1511, loss=65.236374, loss_nce=65.236374, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 807/1511, loss=55.568848, loss_nce=55.568848, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 811/1511, loss=81.588463, loss_nce=81.588463, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 815/1511, loss=138.267487, loss_nce=138.267487, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 819/1511, loss=205.398163, loss_nce=205.398163, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 823/1511, loss=106.781647, loss_nce=106.781647, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 827/1511, loss=114.370003, loss_nce=114.370003, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 831/1511, loss=85.564255, loss_nce=85.564255, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 835/1511, loss=58.856918, loss_nce=58.856918, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 839/1511, loss=48.463295, loss_nce=48.463295, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 843/1511, loss=49.180916, loss_nce=49.180916, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 847/1511, loss=42.912064, loss_nce=42.912064, loss_kd=0.0, lr=0.000013
    Epoch: 6: Step: 851/1511, loss=33.153042, loss_nce=33.153042, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 855/1511, loss=49.714306, loss_nce=49.714306, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 859/1511, loss=30.225197, loss_nce=30.225197, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 863/1511, loss=40.542446, loss_nce=40.542446, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 867/1511, loss=42.657013, loss_nce=42.657013, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 871/1511, loss=29.824253, loss_nce=29.824253, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 875/1511, loss=38.451778, loss_nce=38.451778, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 879/1511, loss=30.017517, loss_nce=30.017517, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 883/1511, loss=30.451855, loss_nce=30.451855, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 887/1511, loss=24.856079, loss_nce=24.856079, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 891/1511, loss=26.671665, loss_nce=26.671665, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 895/1511, loss=24.949318, loss_nce=24.949318, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 899/1511, loss=24.966484, loss_nce=24.966484, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 903/1511, loss=31.370058, loss_nce=31.370058, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 907/1511, loss=54.106686, loss_nce=54.106686, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 911/1511, loss=27.364002, loss_nce=27.364002, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 915/1511, loss=31.717720, loss_nce=31.717720, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 919/1511, loss=32.850029, loss_nce=32.850029, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 923/1511, loss=36.481514, loss_nce=36.481514, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 927/1511, loss=36.080856, loss_nce=36.080856, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 931/1511, loss=43.164818, loss_nce=43.164818, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 935/1511, loss=82.020950, loss_nce=82.020950, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 939/1511, loss=36.782185, loss_nce=36.782185, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 943/1511, loss=32.322525, loss_nce=32.322525, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 947/1511, loss=37.928696, loss_nce=37.928696, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 951/1511, loss=37.906788, loss_nce=37.906788, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 955/1511, loss=40.255390, loss_nce=40.255390, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 959/1511, loss=36.430790, loss_nce=36.430790, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 963/1511, loss=34.600498, loss_nce=34.600498, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 967/1511, loss=39.713654, loss_nce=39.713654, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 971/1511, loss=46.052864, loss_nce=46.052864, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 975/1511, loss=37.347187, loss_nce=37.347187, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 979/1511, loss=41.355392, loss_nce=41.355392, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 983/1511, loss=45.157066, loss_nce=45.157066, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 987/1511, loss=32.828815, loss_nce=32.828815, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 991/1511, loss=55.191578, loss_nce=55.191578, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 995/1511, loss=49.200516, loss_nce=49.200516, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 999/1511, loss=34.357136, loss_nce=34.357136, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1003/1511, loss=37.069489, loss_nce=37.069489, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1007/1511, loss=45.910133, loss_nce=45.910133, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1011/1511, loss=41.456188, loss_nce=41.456188, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1015/1511, loss=60.424339, loss_nce=60.424339, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1019/1511, loss=35.902451, loss_nce=35.902451, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1023/1511, loss=43.260071, loss_nce=43.260071, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1027/1511, loss=39.661362, loss_nce=39.661362, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1031/1511, loss=64.590012, loss_nce=64.590012, loss_kd=0.0, lr=0.000012
    Epoch: 6: Step: 1035/1511, loss=34.630993, loss_nce=34.630993, loss_kd=0.0, lr=0.000012
    

    and it continues like this until the end of training; then the code crashes at evaluation:

    Epoch: 14: Step: 1459/1511, loss=1448.427734, loss_nce=1448.427734, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1463/1511, loss=1645.300171, loss_nce=1645.300171, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1467/1511, loss=1398.610107, loss_nce=1398.610107, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1471/1511, loss=1394.673096, loss_nce=1394.673096, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1475/1511, loss=2031.539795, loss_nce=2031.539795, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1479/1511, loss=1238.061768, loss_nce=1238.061768, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1483/1511, loss=1475.774780, loss_nce=1475.774780, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1487/1511, loss=1240.767578, loss_nce=1240.767578, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1491/1511, loss=1186.123657, loss_nce=1186.123657, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1495/1511, loss=1728.326904, loss_nce=1728.326904, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1499/1511, loss=1731.635498, loss_nce=1731.635498, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1503/1511, loss=1679.102173, loss_nce=1679.102173, loss_kd=0.0, lr=0.000000
    Epoch: 14: Step: 1507/1511, loss=1465.885498, loss_nce=1465.885498, loss_kd=0.0, lr=0.000000
    Total data indexed 1014
    Total data indexed 5070
    Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
    Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.last.pt
    test dataset len = 5000, dataloader len = 63
    Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.
    
    Defaults for this optimization level are:
    enabled                : True
    opt_level              : O2
    cast_model_type        : torch.float16
    patch_torch_functions  : False
    keep_batchnorm_fp32    : True
    master_weights         : True
    loss_scale             : dynamic
    Processing user overrides (additional kwargs that are not None)...
    After processing overrides, optimization options are:
    enabled                : True
    opt_level              : O2
    cast_model_type        : torch.float16
    patch_torch_functions  : False
    keep_batchnorm_fp32    : True
    master_weights         : True
    loss_scale             : dynamic
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.25
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0625
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.03125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.015625
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0078125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00390625
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.001953125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0009765625
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00048828125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.000244140625
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0001220703125
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.103515625e-05
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.0517578125e-05
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.52587890625e-05
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.62939453125e-06
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.814697265625e-06
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.9073486328125e-06
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.5367431640625e-07
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.76837158203125e-07
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.384185791015625e-07
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1920928955078125e-07
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.960464477539063e-08
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9802322387695312e-08
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4901161193847656e-08
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.450580596923828e-09
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.725290298461914e-09
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.862645149230957e-09
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.313225746154785e-10
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.656612873077393e-10
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.3283064365386963e-10
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1641532182693481e-10
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.820766091346741e-11
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9103830456733704e-11
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4551915228366852e-11
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.275957614183426e-12
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.637978807091713e-12
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.8189894035458565e-12
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.094947017729282e-13
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.547473508864641e-13
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.2737367544323206e-13
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1368683772161603e-13
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.684341886080802e-14
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.842170943040401e-14
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4210854715202004e-14
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.105427357601002e-15
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.552713678800501e-15
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.7763568394002505e-15
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.881784197001252e-16
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.440892098500626e-16
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.220446049250313e-16
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1102230246251565e-16
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.551115123125783e-17
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.7755575615628914e-17
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.3877787807814457e-17
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.938893903907228e-18
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.469446951953614e-18
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.734723475976807e-18
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.673617379884035e-19
    Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.336808689942018e-19
    Traceback (most recent call last):
      File "train_itm.py", line 369, in <module>
        args.txt_retrieval, img2txt)
    AttributeError: 'Namespace' object has no attribute 'txt_retrieval'
    

    However, I tried to evaluate the best model biencoder.best.pt using the following command:

    python eval_itm.py ./config/flickr30k_eval_config.json /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
    

    and got the following results:

    Total data indexed 1000
    Total data indexed 5000
    time cost = 10.698805809020996s
    average loss = nan, accuracy = 0.0126
    indexed  1000 data
    image retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
    txt retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
    
    opened by ghaddarAbs 2
  • Pretraining dataset

    Hi,

    Thank you very much for the great work, and for making your code publicly available. I am trying to run the code to reproduce the results, however, the pre-training datasets are missing from the download script. Is it possible to upload the pretraining data, similar to what you did for the fine-tuning ones last week?

    In fact, I tried to use the coco and vg datasets distributed by the UNITER code, while adjusting the train/val datasets in ./config/pretrain-alldata-base.json as follows:

    {
           "name": "coco_cap",
           "db": [
               "/path/to//uniter/txt_db/pretrain_coco_train.db/",
               "/path/to//uniter/txt_db/pretrain_coco_val.db/"
           ],
           "img": [
               "/path/to//uniter/img_db/coco_train2014/",
               "/path/to//uniter/img_db/coco_val2014/"
           ],
           "tasks": [
               "itm",
               "mlm",
               "mrfr",
               "mrckl"
           ],
           "mix_ratio": [
               16,
               8,
               4,
               4
           ]
       },
       {
           "name": "vg_cap",
           "db": [
               "/path/to//uniter/txt_db/pretrain_vg_train.db/"
           ],
           "img": [
               "/path/to//uniter/img_db/vg/"
           ],
           "tasks": [
               "itm",
               "mlm",
               "mrfr",
               "mrckl"
           ],
           "mix_ratio": [
               16,
               12,
               6,
               6
           ]
       }
    ],
    "val_datasets": [
       {
           "name": "coco_cap",
           "db": [
               "/path/to//uniter/txt_db/pretrain_coco_val.db/"
           ],
           "img": [
               "/path/to//uniter/img_db/coco_val2014/"
           ],
           "tasks": [
               "itm",
               "mlm",
               "mrfr",
               "mrckl"
           ]
       },
       {
           "name": "vg_cap",
           "db": [
               "/path/to//uniter/txt_db/pretrain_vg_val.db/"
           ],
           "img": [
               "/path/to//uniter/img_db/vg/"
           ],
           "tasks": [
               "itm",
               "mlm",
               "mrfr",
               "mrckl"
           ]
       }
    
    
    

    Surprisingly, the pre-training code worked, but I ran into another issue. I got gradient overflow at the beginning of training, and then this error at 3%: ZeroDivisionError: float division by zero

    Here are some logs for gradient overflow

    [1,2]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
    [1,1]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
    [1,3]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
    [1,0]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
      3%|▎         | 8792/300000 [2:51:23<79:18:44,  1.02it/s][1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    [1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
    

    and here is the log of the error:

    [1,0]<stderr>:ZeroDivisionError: float division by zero
      3%|▎         | 8856/300000 [2:52:34<94:33:17,  1.17s/it]--------------------------------------------------------
    ------------------
    Primary job  terminated normally, but 1 process returned
    a non-zero exit code. Per user-direction, the job has been aborted.
    

    I understand why this error is happening: the loss scale gradually gets smaller until it becomes 0. However, I can't figure out how to solve this error. I looked at the issues in apex, and it seems that bad input is causing the problem, so my conclusion was that I am not using the correct pre-training dataset.

    Can you please share the pretraining data?

    Thanks

    opened by ghaddarAbs 5