Hi,
Thank you very much for the great work, and for sharing the fine-tuning data last week.
I ran into an issue when I tried to fine-tune and evaluate the model on Flickr30k, using:
# I only ran the second command (GPU: 1, lr: 2e-5)
./bash/train_flickr.sh
Training starts normally, but the loss suddenly begins to increase at epoch 6:
Epoch: 6: Step: 555/1511, loss=0.527620, loss_nce=0.527620, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 559/1511, loss=0.727350, loss_nce=0.727350, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 563/1511, loss=0.570808, loss_nce=0.570808, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 567/1511, loss=0.393095, loss_nce=0.393095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 571/1511, loss=0.674848, loss_nce=0.674848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 575/1511, loss=0.499143, loss_nce=0.499143, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 579/1511, loss=0.594417, loss_nce=0.594417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 583/1511, loss=0.637567, loss_nce=0.637567, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 587/1511, loss=0.848309, loss_nce=0.848309, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 591/1511, loss=0.859852, loss_nce=0.859852, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 595/1511, loss=0.551946, loss_nce=0.551946, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 599/1511, loss=0.569656, loss_nce=0.569656, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 603/1511, loss=0.811136, loss_nce=0.811136, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 607/1511, loss=0.926843, loss_nce=0.926843, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 611/1511, loss=0.878590, loss_nce=0.878590, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 615/1511, loss=0.930382, loss_nce=0.930382, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 619/1511, loss=1.138345, loss_nce=1.138345, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 623/1511, loss=1.101084, loss_nce=1.101084, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 627/1511, loss=0.899013, loss_nce=0.899013, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 631/1511, loss=1.180095, loss_nce=1.180095, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 635/1511, loss=1.371186, loss_nce=1.371186, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 639/1511, loss=1.614157, loss_nce=1.614157, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 643/1511, loss=1.712646, loss_nce=1.712646, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 647/1511, loss=2.504568, loss_nce=2.504568, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 651/1511, loss=2.761936, loss_nce=2.761936, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 655/1511, loss=4.210203, loss_nce=4.210203, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 659/1511, loss=6.195764, loss_nce=6.195764, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 663/1511, loss=8.189028, loss_nce=8.189028, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 667/1511, loss=12.597887, loss_nce=12.597887, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 671/1511, loss=11.704583, loss_nce=11.704583, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 675/1511, loss=13.765331, loss_nce=13.765331, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 679/1511, loss=18.207155, loss_nce=18.207155, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 683/1511, loss=16.359169, loss_nce=16.359169, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 687/1511, loss=20.523600, loss_nce=20.523600, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 691/1511, loss=27.668240, loss_nce=27.668240, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 695/1511, loss=30.855385, loss_nce=30.855385, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 699/1511, loss=35.086441, loss_nce=35.086441, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 703/1511, loss=30.574892, loss_nce=30.574892, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 707/1511, loss=52.953876, loss_nce=52.953876, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 711/1511, loss=40.207417, loss_nce=40.207417, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 715/1511, loss=53.108303, loss_nce=53.108303, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 719/1511, loss=47.695160, loss_nce=47.695160, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 723/1511, loss=45.211182, loss_nce=45.211182, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 727/1511, loss=49.979271, loss_nce=49.979271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 731/1511, loss=45.502415, loss_nce=45.502415, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 735/1511, loss=42.128304, loss_nce=42.128304, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 739/1511, loss=57.433262, loss_nce=57.433262, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 743/1511, loss=70.618607, loss_nce=70.618607, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 747/1511, loss=52.835541, loss_nce=52.835541, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 751/1511, loss=57.775532, loss_nce=57.775532, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 755/1511, loss=75.909271, loss_nce=75.909271, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 759/1511, loss=47.627548, loss_nce=47.627548, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 763/1511, loss=55.984451, loss_nce=55.984451, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 767/1511, loss=39.634636, loss_nce=39.634636, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 771/1511, loss=43.213181, loss_nce=43.213181, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 775/1511, loss=37.875175, loss_nce=37.875175, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 779/1511, loss=45.833000, loss_nce=45.833000, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 783/1511, loss=42.249699, loss_nce=42.249699, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 787/1511, loss=49.242207, loss_nce=49.242207, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 791/1511, loss=59.082058, loss_nce=59.082058, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 795/1511, loss=44.366467, loss_nce=44.366467, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 799/1511, loss=61.286034, loss_nce=61.286034, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 803/1511, loss=65.236374, loss_nce=65.236374, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 807/1511, loss=55.568848, loss_nce=55.568848, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 811/1511, loss=81.588463, loss_nce=81.588463, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 815/1511, loss=138.267487, loss_nce=138.267487, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 819/1511, loss=205.398163, loss_nce=205.398163, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 823/1511, loss=106.781647, loss_nce=106.781647, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 827/1511, loss=114.370003, loss_nce=114.370003, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 831/1511, loss=85.564255, loss_nce=85.564255, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 835/1511, loss=58.856918, loss_nce=58.856918, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 839/1511, loss=48.463295, loss_nce=48.463295, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 843/1511, loss=49.180916, loss_nce=49.180916, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 847/1511, loss=42.912064, loss_nce=42.912064, loss_kd=0.0, lr=0.000013
Epoch: 6: Step: 851/1511, loss=33.153042, loss_nce=33.153042, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 855/1511, loss=49.714306, loss_nce=49.714306, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 859/1511, loss=30.225197, loss_nce=30.225197, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 863/1511, loss=40.542446, loss_nce=40.542446, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 867/1511, loss=42.657013, loss_nce=42.657013, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 871/1511, loss=29.824253, loss_nce=29.824253, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 875/1511, loss=38.451778, loss_nce=38.451778, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 879/1511, loss=30.017517, loss_nce=30.017517, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 883/1511, loss=30.451855, loss_nce=30.451855, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 887/1511, loss=24.856079, loss_nce=24.856079, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 891/1511, loss=26.671665, loss_nce=26.671665, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 895/1511, loss=24.949318, loss_nce=24.949318, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 899/1511, loss=24.966484, loss_nce=24.966484, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 903/1511, loss=31.370058, loss_nce=31.370058, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 907/1511, loss=54.106686, loss_nce=54.106686, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 911/1511, loss=27.364002, loss_nce=27.364002, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 915/1511, loss=31.717720, loss_nce=31.717720, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 919/1511, loss=32.850029, loss_nce=32.850029, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 923/1511, loss=36.481514, loss_nce=36.481514, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 927/1511, loss=36.080856, loss_nce=36.080856, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 931/1511, loss=43.164818, loss_nce=43.164818, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 935/1511, loss=82.020950, loss_nce=82.020950, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 939/1511, loss=36.782185, loss_nce=36.782185, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 943/1511, loss=32.322525, loss_nce=32.322525, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 947/1511, loss=37.928696, loss_nce=37.928696, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 951/1511, loss=37.906788, loss_nce=37.906788, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 955/1511, loss=40.255390, loss_nce=40.255390, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 959/1511, loss=36.430790, loss_nce=36.430790, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 963/1511, loss=34.600498, loss_nce=34.600498, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 967/1511, loss=39.713654, loss_nce=39.713654, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 971/1511, loss=46.052864, loss_nce=46.052864, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 975/1511, loss=37.347187, loss_nce=37.347187, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 979/1511, loss=41.355392, loss_nce=41.355392, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 983/1511, loss=45.157066, loss_nce=45.157066, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 987/1511, loss=32.828815, loss_nce=32.828815, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 991/1511, loss=55.191578, loss_nce=55.191578, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 995/1511, loss=49.200516, loss_nce=49.200516, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 999/1511, loss=34.357136, loss_nce=34.357136, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1003/1511, loss=37.069489, loss_nce=37.069489, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1007/1511, loss=45.910133, loss_nce=45.910133, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1011/1511, loss=41.456188, loss_nce=41.456188, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1015/1511, loss=60.424339, loss_nce=60.424339, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1019/1511, loss=35.902451, loss_nce=35.902451, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1023/1511, loss=43.260071, loss_nce=43.260071, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1027/1511, loss=39.661362, loss_nce=39.661362, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1031/1511, loss=64.590012, loss_nce=64.590012, loss_kd=0.0, lr=0.000012
Epoch: 6: Step: 1035/1511, loss=34.630993, loss_nce=34.630993, loss_kd=0.0, lr=0.000012
It continues like this until the end of training, and then the code crashes at evaluation:
Epoch: 14: Step: 1459/1511, loss=1448.427734, loss_nce=1448.427734, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1463/1511, loss=1645.300171, loss_nce=1645.300171, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1467/1511, loss=1398.610107, loss_nce=1398.610107, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1471/1511, loss=1394.673096, loss_nce=1394.673096, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1475/1511, loss=2031.539795, loss_nce=2031.539795, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1479/1511, loss=1238.061768, loss_nce=1238.061768, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1483/1511, loss=1475.774780, loss_nce=1475.774780, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1487/1511, loss=1240.767578, loss_nce=1240.767578, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1491/1511, loss=1186.123657, loss_nce=1186.123657, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1495/1511, loss=1728.326904, loss_nce=1728.326904, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1499/1511, loss=1731.635498, loss_nce=1731.635498, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1503/1511, loss=1679.102173, loss_nce=1679.102173, loss_kd=0.0, lr=0.000000
Epoch: 14: Step: 1507/1511, loss=1465.885498, loss_nce=1465.885498, loss_kd=0.0, lr=0.000000
Total data indexed 1014
Total data indexed 5070
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
Saved checkpoint at /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.last.pt
test dataset len = 5000, dataloader len = 63
Selected optimization level O2: FP16 training with FP32 batchnorm and FP32 master weights.
Defaults for this optimization level are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : dynamic
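(For context, I believe this dump corresponds to the standard Apex initialization pattern for the options printed above; this is not copied from the repo, just the usual call:)

from apex import amp

# standard Apex AMP setup matching the printed options (opt_level O2, dynamic scaling)
model, optimizer = amp.initialize(model, optimizer, opt_level="O2", loss_scale="dynamic")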
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32768.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16384.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8192.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4096.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2048.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1024.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 512.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 256.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 128.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 64.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 32.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 16.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0625
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.03125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.015625
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0078125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00390625
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.001953125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0009765625
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.00048828125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.000244140625
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.0001220703125
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.103515625e-05
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.0517578125e-05
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.52587890625e-05
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.62939453125e-06
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.814697265625e-06
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.9073486328125e-06
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.5367431640625e-07
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.76837158203125e-07
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.384185791015625e-07
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1920928955078125e-07
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.960464477539063e-08
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.9802322387695312e-08
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4901161193847656e-08
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.450580596923828e-09
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.725290298461914e-09
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.862645149230957e-09
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.313225746154785e-10
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.656612873077393e-10
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.3283064365386963e-10
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1641532182693481e-10
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.820766091346741e-11
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.9103830456733704e-11
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4551915228366852e-11
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.275957614183426e-12
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.637978807091713e-12
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.8189894035458565e-12
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 9.094947017729282e-13
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.547473508864641e-13
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.2737367544323206e-13
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1368683772161603e-13
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.684341886080802e-14
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.842170943040401e-14
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.4210854715202004e-14
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 7.105427357601002e-15
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.552713678800501e-15
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.7763568394002505e-15
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.881784197001252e-16
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.440892098500626e-16
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.220446049250313e-16
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.1102230246251565e-16
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 5.551115123125783e-17
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.7755575615628914e-17
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.3877787807814457e-17
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 6.938893903907228e-18
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 3.469446951953614e-18
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.734723475976807e-18
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.673617379884035e-19
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.336808689942018e-19
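By this point the loss scale has collapsed to about 4e-19, i.e. every single step overflows, which suggests the gradients are non-finite rather than merely large. To locate where they first blow up, I was planning to add a check like this right after the backward pass (a minimal sketch; model is the bi-encoder being fine-tuned):

import torch

# after scaled_loss.backward(), before optimizer.step()
for name, p in model.named_parameters():
    if p.grad is not None and not torch.isfinite(p.grad).all():
        print(f"non-finite gradient in {name}")

The run then dies inside the evaluation code: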
Traceback (most recent call last):
File "train_itm.py", line 369, in <module>
args.txt_retrieval, img2txt)
AttributeError: 'Namespace' object has no attribute 'txt_retrieval'
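It looks like train_itm.py reads args.txt_retrieval at evaluation time, but that flag apparently is not defined in the training config / argument parser. As a temporary workaround I replaced the attribute access with a guarded lookup (the default of True is my assumption, not something from the repo):

# in train_itm.py, near the failing call; True is my guess at the intended default
txt_retrieval = getattr(args, "txt_retrieval", True)

Is True the intended default here, or should the flag be added to the training config?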
However, I also tried to evaluate the best checkpoint, biencoder.best.pt, using the following command:
python eval_itm.py ./config/flickr30k_eval_config.json /path/to/flickr-bert-two_stream/2e-5_96_0_none_0.0_768_both_run1/biencoder.best.pt
and got the following results:
Total data indexed 1000
Total data indexed 5000
time cost = 10.698805809020996s
average loss = nan, accuracy = 0.0126
indexed 1000 data
image retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
txt retrieval recall = {1: 0.001, 5: 0.005, 10: 0.01}
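The recall values match random chance, so the checkpoint itself seems to have diverged. Since training runs in FP16 (Apex O2), I wonder whether clipping gradients on the FP32 master weights would keep it stable. This is what I had in mind for the training step (a sketch, assuming the loop uses apex.amp as shown in the log above; max_norm=1.0 is an arbitrary choice, and loss/optimizer come from the existing loop):

from apex import amp
import torch

with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
# clip the FP32 master params that Apex maintains for opt_level O2
torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
optimizer.step()
optimizer.zero_grad()

Have you seen this divergence with the released config, or do you have any idea what might be causing it? Any help would be appreciated.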