LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

This repository contains source code and pre-trained/fine-tuned checkpoints for NAACL 2021 paper "LightningDOT". It currently supports fine-tuning on MSCOCO and Flickr30k. Pre-training code and a demo for FULL MSCOCO retrieval are also available.

Some code in this repo is copied/modifed from UNITER and DPR.

If you find the code useful for your research, please consider citing:

    @inproceedings{sun2021lightningdot,
    title={LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval},
    author={Sun, Siqi and Chen, Yen-Chun and Li, Linjie and Wang, Shuohang and Fang, Yuwei and Liu, Jingjing},
    booktitle={NAACL-HLT},
    year={2021}
    }

UNITER Environment

To run UNITER for re-ranker, please set a seperate environment based on this repo.

All pre-training and fine-tuning are using a conda environment that can be created as follows.

Environment

Under the project home folder, first run (depends on your CUDA version)

conda env create -f DVL.yml
conda activate DVL
conda install pytorch torchvision cudatoolkit=10.1 -c pytorch

, then install apex by

cd ../
git clone https://github.com/NVIDIA/apex
cd apex
pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./

In order to use distributed training, under super user, install mpi by

rm -r /usr/local/mpi

wget https://download.open-mpi.org/release/open-mpi/v4.0/openmpi-4.0.4.tar.gz 
tar -xvf openmpi-4.0.4.tar.gz 
cd openmpi-4.0.4
./configure --prefix=/usr/local/mpi --enable-orterun-prefix-by-default --disable-getpwuid --with-verbs
sudo apt-get install libnuma-dev
sudo make -j$(nproc) all && sudo make install
ldconfig

cd -
rm -r openmpi-4.0.4
rm openmpi-4.0.4.tar.gz

export OPENMPI_VERSION=4.0.4

. Finally install horovod by

echo "deb http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1604/x86_64 /" \
    > /etc/apt/sources.list.d/nvidia-ml.list
apt update
apt install libnccl2=2.4.7-1+cuda10.1 libnccl-dev=2.4.7-1+cuda10.1

export PATH=/usr/local/mpi/bin:$PATH
HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_WITH_PYTORCH=1 pip install --no-cache-dir horovod
ldconfig

If you see Error Msg: /usr/bin/ld: cannot find -lnuma, then try

sudo apt-get install libnuma-dev

Download Checkpoints and Meta file

Under project home folder, run

bash bash/download_data.sh

Currently the raw image files and extracted features are not available to download.

Pre-training

Modify the config file at ./config/pretrain-alldata-base.json accordingly, and run

horovodrun -np $NUM_GPU python pretrain.py --config ./config/pretrain-alldata-base.json

. Typically you need to change img_checkpoint, output_dir, and train/val datasets.

A pre-trained checkpoint is availabe at LightningDot.

The checkpoints for UNITER-base and BERT-base can be obtaind from UNITER-base and BERT-base.

Fine-tuning on MSCOCO and Flickr30k

We provide an sample bash script at ./bash/train_flickr.sh, which we used to search for learning rate.

Two checkpoints that have been already fine-tuned on MSCOCO and Flickr30k are also provided at COCO-FT and Flickr-FT.

Evaluation

Run

python eval_itm.py  your_eval_config.json  your_checkpoint.pt

to run the evaluation. We provide three examples that could be obtained solely based on checkpoints and configurations provided in this repo.

Note that your results may NOT be exactly the same with results below due to different machine/environment configurations (but they should be close enough).

Zero-shot evaluation on Flickr30k:

python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/LightningDot.pt

image retrieval recall = {1: 0.5332, 5: 0.8058, 10: 0.8804}
txt retrieval recall = {1: 0.682, 5: 0.891, 10: 0.94}.

Fine-tune on flickr, evaluate on flickr:

python eval_itm.py ./config/flickr30k_eval_config.json ./data/model/flickr-ft.pt

image retrieval recall = {1: 0.699, 5: 0.911, 10: 0.9518}
txt retrieval recall = {1: 0.839, 5: 0.972, 10: 0.986}

Fine-tune on MSCOCO, evaluate on MSCOCO:

python eval_itm.py ./config/coco_eval_config.json ./data/model/coco-ft.pt

image retrieval recall = {1: 0.4577, 5: 0.7453, 10: 0.8379}
txt retrieval recall = {1: 0.6004, 5: 0.8516, 10: 0.9172}

Meta File

You may need the meta file used in some scripts, which can be obtained from MSCOCO-Meta and Flickr-Meta.

Demo

TODO

Re-Ranking

Note that Re-ranker is using prediction file generated from UNITER or OSCAR due to use of different pytorch version.

Re-ranking script is currently provided as is, and has not been cleaned yet.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

License

MIT

Hi,

Thank you very much for the great work, and for sharing the fine-tuning data last week. I got an issue when I tried to fine-tune and evaluate the model on the flickr30k, using:

# I just run the second command (GPU:1 lr: 2e-5 )
./bash/train_flickr.sh

The epoch start normally at the beginning, but suddenly the loss strat increasing at epoch 6:


Epoch: 6: Step: 555/1511, loss=0.527620, loss_nce=0.527620, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 559/1511, loss=0.727350, loss_nce=0.727350, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 563/1511, loss=0.570808, loss_nce=0.570808, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 567/1511, loss=0.393095, loss_nce=0.393095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 571/1511, loss=0.674848, loss_nce=0.674848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 575/1511, loss=0.499143, loss_nce=0.499143, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 579/1511, loss=0.594417, loss_nce=0.594417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 583/1511, loss=0.637567, loss_nce=0.637567, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 587/1511, loss=0.848309, loss_nce=0.848309, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 591/1511, loss=0.859852, loss_nce=0.859852, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 595/1511, loss=0.551946, loss_nce=0.551946, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 599/1511, loss=0.569656, loss_nce=0.569656, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 603/1511, loss=0.811136, loss_nce=0.811136, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 607/1511, loss=0.926843, loss_nce=0.926843, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 611/1511, loss=0.878590, loss_nce=0.878590, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 615/1511, loss=0.930382, loss_nce=0.930382, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 619/1511, loss=1.138345, loss_nce=1.138345, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 623/1511, loss=1.101084, loss_nce=1.101084, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 627/1511, loss=0.899013, loss_nce=0.899013, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 631/1511, loss=1.180095, loss_nce=1.180095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 635/1511, loss=1.371186, loss_nce=1.371186, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 639/1511, loss=1.614157, loss_nce=1.614157, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 643/1511, loss=1.712646, loss_nce=1.712646, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 647/1511, loss=2.504568, loss_nce=2.504568, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 651/1511, loss=2.761936, loss_nce=2.761936, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 655/1511, loss=4.210203, loss_nce=4.210203, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 659/1511, loss=6.195764, loss_nce=6.195764, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 663/1511, loss=8.189028, loss_nce=8.189028, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 667/1511, loss=12.597887, loss_nce=12.597887, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 671/1511, loss=11.704583, loss_nce=11.704583, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 675/1511, loss=13.765331, loss_nce=13.765331, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 679/1511, loss=18.207155, loss_nce=18.207155, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 683/1511, loss=16.359169, loss_nce=16.359169, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 687/1511, loss=20.523600, loss_nce=20.523600, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 691/1511, loss=27.668240, loss_nce=27.668240, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 695/1511, loss=30.855385, loss_nce=30.855385, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 699/1511, loss=35.086441, loss_nce=35.086441, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 703/1511, loss=30.574892, loss_nce=30.574892, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 707/1511, loss=52.953876, loss_nce=52.953876, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 711/1511, loss=40.207417, loss_nce=40.207417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 715/1511, loss=53.108303, loss_nce=53.108303, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 719/1511, loss=47.695160, loss_nce=47.695160, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 723/1511, loss=45.211182, loss_nce=45.211182, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 727/1511, loss=49.979271, loss_nce=49.979271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 731/1511, loss=45.502415, loss_nce=45.502415, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 735/1511, loss=42.128304, loss_nce=42.128304, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 739/1511, loss=57.433262, loss_nce=57.433262, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 743/1511, loss=70.618607, loss_nce=70.618607, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 747/1511, loss=52.835541, loss_nce=52.835541, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 751/1511, loss=57.775532, loss_nce=57.775532, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 755/1511, loss=75.909271, loss_nce=75.909271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 759/1511, loss=47.627548, loss_nce=47.627548, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 763/1511, loss=55.984451, loss_nce=55.984451, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 767/1511, loss=39.634636, loss_nce=39.634636, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 771/1511, loss=43.213181, loss_nce=43.213181, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 775/1511, loss=37.875175, loss_nce=37.875175, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 779/1511, loss=45.833000, loss_nce=45.833000, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 783/1511, loss=42.249699, loss_nce=42.249699, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 787/1511, loss=49.242207, loss_nce=49.242207, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 791/1511, loss=59.082058, loss_nce=59.082058, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 795/1511, loss=44.366467, loss_nce=44.366467, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 799/1511, loss=61.286034, loss_nce=61.286034, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 803/1511, loss=65.236374, loss_nce=65.236374, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 807/1511, loss=55.568848, loss_nce=55.568848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 811/1511, loss=81.588463, loss_nce=81.588463, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 815/1511, loss=138.267487, loss_nce=138.267487, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 819/1511, loss=205.398163, loss_nce=205.398163, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 823/1511, loss=106.781647, loss_nce=106.781647, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 827/1511, loss=114.370003, loss_nce=114.370003, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 831/1511, loss=85.564255, loss_nce=85.564255, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 835/1511, loss=58.856918, loss_nce=58.856918, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 839/1511, loss=48.463295, loss_nce=48.463295, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 843/1511, loss=49.180916, loss_nce=49.180916, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 847/1511, loss=42.912064, loss_nce=42.912064, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 851/1511, loss=33.153042, loss_nce=33.153042, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 855/1511, loss=49.714306, loss_nce=49.714306, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 859/1511, loss=30.225197, loss_nce=30.225197, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 863/1511, loss=40.542446, loss_nce=40.542446, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 867/1511, loss=42.657013, loss_nce=42.657013, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 871/1511, loss=29.824253, loss_nce=29.824253, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 875/1511, loss=38.451778, loss_nce=38.451778, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 879/1511, loss=30.017517, loss_nce=30.017517, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 883/1511, loss=30.451855, loss_nce=30.451855, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 887/1511, loss=24.856079, loss_nce=24.856079, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 891/1511, loss=26.671665, loss_nce=26.671665, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 895/1511, loss=24.949318, loss_nce=24.949318, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 899/1511, loss=24.966484, loss_nce=24.966484, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 903/1511, loss=31.370058, loss_nce=31.370058, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 907/1511, loss=54.106686, loss_nce=54.106686, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 911/1511, loss=27.364002, loss_nce=27.364002, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 915/1511, loss=31.717720, loss_nce=31.717720, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 919/1511, loss=32.850029, loss_nce=32.850029, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 923/1511, loss=36.481514, loss_nce=36.481514, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 927/1511, loss=36.080856, loss_nce=36.080856, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 931/1511, loss=43.164818, loss_nce=43.164818, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 935/1511, loss=82.020950, loss_nce=82.020950, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 939/1511, loss=36.782185, loss_nce=36.782185, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 943/1511, loss=32.322525, loss_nce=32.322525, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 947/1511, loss=37.928696, loss_nce=37.928696, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 951/1511, loss=37.906788, loss_nce=37.906788, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 955/1511, loss=40.255390, loss_nce=40.255390, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 959/1511, loss=36.430790, loss_nce=36.430790, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 963/1511, loss=34.600498, loss_nce=34.600498, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 967/1511, loss=39.713654, loss_nce=39.713654, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 971/1511, loss=46.052864, loss_nce=46.052864, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 975/1511, loss=37.347187, loss_nce=37.347187, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 979/1511, loss=41.355392, loss_nce=41.355392, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 983/1511, loss=45.157066, loss_nce=45.157066, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 987/1511, loss=32.828815, loss_nce=32.828815, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 991/1511, loss=55.191578, loss_nce=55.191578, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 995/1511, loss=49.200516, loss_nce=49.200516, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 999/1511, loss=34.357136, loss_nce=34.357136, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1003/1511, loss=37.069489, loss_nce=37.069489, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1007/1511, loss=45.910133, loss_nce=45.910133, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1011/1511, loss=41.456188, loss_nce=41.456188, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1015/1511, loss=60.424339, loss_nce=60.424339, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1019/1511, loss=35.902451, loss_nce=35.902451, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1023/1511, loss=43.260071, loss_nce=43.260071, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1027/1511, loss=39.661362, loss_nce=39.661362, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1031/1511, loss=64.590012, loss_nce=64.590012, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1035/1511, loss=34.630993, loss_nce=34.630993, loss_kd=0.0, lr=0.000012

and continue like this for the end of the training, then the code crash at the evaluation

Epoch: 14: Step: 1459/1511, loss=1448.427734, loss_nce=1448.427734, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1463/1511, loss=1645.300171, loss_nce=1645.300171, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1467/1511, loss=1398.610107, loss_nce=1398.610107, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1471/1511, loss=1394.673096, loss_nce=1394.673096, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1475/1511, loss=2031.539795, loss_nce=2031.539795, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1479/1511, loss=1238.061768, loss_nce=1238.061768, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1483/1511, loss=1475.774780, loss_nce=1475.774780, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1487/1511, loss=1240.767578, loss_nce=1240.767578, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1491/1511, loss=1186.123657, loss_nce=1186.123657, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1495/1511, loss=1728.326904, loss_nce=1728.326904, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1499/1511, loss=1731.635498, loss_nce=1731.635498, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1503/1511, loss=1679.102173, loss_nce=1679.102173, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1507/1511, loss=1465.885498, loss_nce=1465.885498, loss_kd=0.0, lr=0.000000
Total data indexed 1014
Total data indexed 5070
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.last.pt
test dataset len = 5000, dataloader len = 63
Selected optimization level O2:  FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled                : True
opt_level              : O2
cast_model_type        : torch.float16
patch_torch_functions  : False
keep_batchnorm_fp32    : True
master_weights         : True
loss_scale             : dynamic
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 32.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 16.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.0
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.25
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.03125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.015625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0078125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00390625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.001953125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0009765625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.00048828125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.000244140625
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 0.0001220703125
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.103515625e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.0517578125e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.52587890625e-05
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.62939453125e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.814697265625e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.9073486328125e-06
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.5367431640625e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.76837158203125e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.384185791015625e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1920928955078125e-07
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.960464477539063e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9802322387695312e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4901161193847656e-08
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.450580596923828e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.725290298461914e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.862645149230957e-09
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.313225746154785e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.656612873077393e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.3283064365386963e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1641532182693481e-10
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.820766091346741e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.9103830456733704e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4551915228366852e-11
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.275957614183426e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.637978807091713e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.8189894035458565e-12
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 9.094947017729282e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.547473508864641e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.2737367544323206e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1368683772161603e-13
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.684341886080802e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.842170943040401e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.4210854715202004e-14
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 7.105427357601002e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.552713678800501e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.7763568394002505e-15
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.881784197001252e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.440892098500626e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.220446049250313e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.1102230246251565e-16
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 5.551115123125783e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 2.7755575615628914e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.3877787807814457e-17
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 6.938893903907228e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 3.469446951953614e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 1.734723475976807e-18
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 8.673617379884035e-19
Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to 4.336808689942018e-19
Traceback (most recent call last):
  File "train_itm.py", line 369, in <module>
    args.txt_retrieval, img2txt)
AttributeError: 'Namespace' object has no attribute 'txt_retrieval'

However, I tried to evaluate the best model biencoder.best.pt using the following command:

python eval_itm.py ./config/flickr30k_eval_config.json /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt

and get the following results:

Total data indexed 1000
Total data indexed 5000
time cost = 10.698805809020996s
average loss = nan, accuracy = 0.0126
indexed  1000 data
image retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
txt retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}

datasets missing

Hi! I see in the config file that it's expecting multiple train and val datasets (ending in ".db"). Could you tell me where can I download them or what does the data look like (is it one per line, tab separated?) Thanks in advance, David

opened by dahrs 5
Cross-modal Retrieval Objective (CMR)

Can you point me to the place in your code where CMR is implemented? You used CMR + VMLM + SMRM for the pre-training, according to the paper. However, CMR is not part of your supported tasks. Am I missing something?

opened by mojivalipour 6
Paper pretraining config

It appears that config/pretrain-alldata-base.json is not your paper pretraining configuration. There is no cls_concat setting in this configuration file, so it uses the default value. As a result, unlike your paper, this configuration uses MLM instead of VMLM. Could you please provide a correct configuration that reproduces your results?

opened by mojivalipour 1
coco_cap dataset

Hello,

Just a quick question. As far as I can see, coco_cap dataset is not shared in lightningDOT repo. I have been able to download coco_cap dataset from Uniter repo (with ./scripts/download_indomain.sh). May I know if I can use the coco_cap dataset from Uniter for lightningDOT training? Kindly let me know. Thank you.

opened by mattgithub1919 1
Demo Notebook

Hi @intersun @ChenRocks

Can you share the data checkpoints used in Image_retriever.ipynb? And if you can share any updated notebook that can be used as a demo to check what actual images are retrieved for a given single query then that will be really helpful.

Thanks

opened by shivangibithel 1

Gradiant overflow in finetuning

opened by ghaddarAbs 2

Pretraining dataset

Thank you very much for the great work, and for making your code publicly available. I am trying to run the code to reproduce the results, however, the pre-training datasets are missing from the download script. Is it possible to upload the pretraining data, similar to what you did for the fine-tuning ones last week?

In fact, I tried to use coco and vg datasets distributed by the UNITER code, while adjusting the train/val dataset in ./config/pretrain-alldata-base.json as follow:

{
       "name": "coco_cap",
       "db": [
           "/path/to//uniter/txt_db/pretrain_coco_train.db/",
           "/path/to//uniter/txt_db/pretrain_coco_val.db/"
       ],
       "img": [
           "/path/to//uniter/img_db/coco_train2014/",
           "/path/to//uniter/img_db/coco_val2014/"
       ],
       "tasks": [
           "itm",
           "mlm",
           "mrfr",
           "mrckl"
       ],
       "mix_ratio": [
           16,
           8,
           4,
           4
       ]
   },
   {
       "name": "vg_cap",
       "db": [
           "/path/to//uniter/txt_db/pretrain_vg_train.db/"
       ],
       "img": [
           "/path/to//uniter/img_db/vg/"
       ],
       "tasks": [
           "itm",
           "mlm",
           "mrfr",
           "mrckl"
       ],
       "mix_ratio": [
           16,
           12,
           6,
           6
       ]
   }
],
"val_datasets": [
   {
       "name": "coco_cap",
       "db": [
           "/path/to//uniter/txt_db/pretrain_coco_val.db/"
       ],
       "img": [
           "/path/to//uniter/img_db/coco_val2014/"
       ],
       "tasks": [
           "itm",
           "mlm",
           "mrfr",
           "mrckl"
       ]
   },
   {
       "name": "vg_cap",
       "db": [
           "/path/to//uniter/txt_db/pretrain_vg_val.db/"
       ],
       "img": [
           "/path/to//uniter/img_db/vg/"
       ],
       "tasks": [
           "itm",
           "mlm",
           "mrfr",
           "mrckl"
       ]
   }

Surprisingly, the pretraining code worked, but I get another issue. I got gradient overflow at the beginning of the training and then this error at 3%: ZeroDivisionError: float division by zero

Here are some logs for gradient overflow

[1,2]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
[1,1]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
[1,3]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
[1,0]<stdout>:Gradient overflow.  Skipping step, loss scaler 5 reducing loss scale to 4.3601508761683463e-106
  3%|▎         | 8792/300000 [2:51:23<79:18:44,  1.02it/s][1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,1]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,0]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,3]<stdout>:Inf/Nan in loss/mrfr_coco_cap
[1,2]<stdout>:Inf/Nan in loss/mrfr_coco_cap

and here is the log of the error:

[1,0]<stderr>:ZeroDivisionError: float division by zero
  3%|▎         | 8856/300000 [2:52:34<94:33:17,  1.17s/it]--------------------------------------------------------
------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

I understand why this error is happening, the loss gradually gets smaller until it became 0. However, I can't understand what to do to solve this error? I looked at the issues in apex and it seems that I have bad input that is causing the issue. So my conclusion was that I am not using the correct pretraining dataset.

Can you please share the pretraining data?

Thanks

opened by ghaddarAbs 5

source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT

Related tags

Overview

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

UNITER Environment

Environment

Download Checkpoints and Meta file

Pre-training

Fine-tuning on MSCOCO and Flickr30k

Evaluation

Meta File

Demo

Re-Ranking

Contributing

License

Comments

datasets missing

Cross-modal Retrieval Objective (CMR)

Paper pretraining config

coco_cap dataset

Demo Notebook

Gradiant overflow in finetuning

Pretraining dataset

Owner

Siqi

covid question answering datasets and fine tuned models

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Tool cek opsi checkpoint facebook!

Offcial repository for the IEEE ICRA 2021 paper Auto-Tuned Sim-to-Real Transfer.

Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

[NeurIPS 2021] Well-tuned Simple Nets Excel on Tabular Datasets

Pre-trained BERT Models for Ancient and Medieval Greek, and associated code for LaTeCH 2021 paper titled - "A Pilot Study for BERT Language Modelling and Morphological Analysis for Ancient and Medieval Greek"

A collection of pre-trained StyleGAN2 models trained on different datasets at different resolution.

Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

The code repository for EMNLP 2021 paper "Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization".

Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

Codes to pre-train T5 (Text-to-Text Transfer Transformer) models pre-trained on Japanese web texts

Codes for NAACL 2021 Paper "Unsupervised Multi-hop Question Answering by Question Generation"

Source code, datasets and trained models for the paper Learning Advanced Mathematical Computations from Examples (ICLR 2021), by François Charton, Amaury Hayat (ENPC-Rutgers) and Guillaume Lample

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Pre-trained model, code, and materials from the paper "Impact of Adversarial Examples on Deep Learning Models for Biomedical Image Segmentation" (MICCAI 2019).