We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.


Multi-Modal Self-Supervision using GDT and STiCa

This is the official PyTorch implementation of the papers Multi-modal Self-Supervision from Generalized Data Transformations and Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning. In this repository, we provide PyTorch code for pretraining and testing our proposed GDT and STiCa models.

If you find GDT and STiCa useful in your research, please use the following BibTeX entries for citation.

@inproceedings{patrick2020multimodal,
      title={Multi-modal Self-Supervision from Generalized Data Transformations},
      author={Mandela Patrick and Yuki M. Asano and Polina Kuznetsova and Ruth Fong and João F. Henriques and Geoffrey Zweig and Andrea Vedaldi},
      year={2021},
      booktitle={International Conference on Computer Vision (ICCV)},
}

@inproceedings{m2021spacetime,
    title={Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning},
    author={Mandela Patrick and Yuki M. Asano and Bernie Huang and Ishan Misra and Florian Metze and Joao Henriques and Andrea Vedaldi},
    year={2021},
    booktitle={International Conference on Computer Vision (ICCV)},
}

Highlights

(1) GDT: Formulate and generalize most pretext tasks under a single NCE objective.

Using this formulation, we test various previously unexplored pretext tasks and achieve SOTA downstream performance.

(2) STiCa: Importance of incorporating within-modal invariance in cross-modal learning

We show how to efficiently incorporate within-modal invariance learning using feature crops and achieve SOTA downstream performance.

Model Zoo

We provide GDT models pretrained on the Kinetics-400 (K400), HowTo100M (HT100M), and Instagram-65M (IG65M) datasets, and STiCa models pretrained on Kinetics-400 (K400).

| name | dataset | # of frames | spatial crop | HMDB51 Top1 | UCF101 Top1 | url |
|------|---------|-------------|--------------|-------------|-------------|-----|
| GDT | K400 | 30 | 112 | 62.3 | 90.9 | model |
| GDT | HT100M | 30 | 112 | 67.4 | 94.1 | model |
| GDT | IG65M | 30 | 112 | 72.8 | 95.2 | model |

| name | dataset | # of frames | spatial crop | HMDB51 Top1 | UCF101 Top1 | url |
|------|---------|-------------|--------------|-------------|-------------|-----|
| STiCa | K400 | 60 | 112 | 67.0 | 93.1 | Coming Soon |
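
To take a quick look at one of these checkpoints before fine-tuning, plain PyTorch is enough. The sketch below is illustrative only: the filename and the nesting of the weights under a "model" key are assumptions, so adjust them to the file you actually downloaded.

```python
import torch

# Illustrative sketch: inspect a downloaded GDT checkpoint before fine-tuning.
# The filename and the "model" key are assumptions; adjust to your download.
ckpt = torch.load("pretrained/gdt_K400.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))  # e.g. stem/conv weights and their shapes
```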

Installation

This repo was tested with Ubuntu 16.04.5 LTS, Python 3.7.5, PyTorch 1.3.1, Torchvision 0.4.1, and CUDA 10.0.

Step 1

  • Clone this repo to your local machine

Step 2

  • Install required packages using conda env create -f environment.yml

Step 3

  • Activate conda environment using conda activate GDT

Step 4

  • Install kornia library pip install kornia==0.1.4

Step 5

  • See below for how to pretrain GDT / STiCa or benchmark pretrained models

Data Preparation

For the Kinetics-400/600, HMDB-51, and UCF-101 datasets:

  1. Ensure all datasets are in the format:

     $ROOT_DIR/$SPLIT/$CLASS/*
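As a quick sanity check, a few lines of Python can confirm that a dataset tree follows this layout (illustrative only; the root path below is a placeholder):

```python
from pathlib import Path

# Illustrative check that a dataset follows $ROOT_DIR/$SPLIT/$CLASS/*.
root = Path("/path/to/kinetics400")  # placeholder $ROOT_DIR
for split in ("train", "val"):       # each $SPLIT
    classes = sorted(d for d in (root / split).iterdir() if d.is_dir())
    n_videos = sum(1 for c in classes for _ in c.iterdir())
    print(f"{split}: {len(classes)} classes, {n_videos} videos")
```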

To prepare the HowTo100M dataset, do the following:

  1. Download the word2vec matrix and dictionary, unzip the file, and place it in the datasets/data folder:

     wget https://www.rocq.inria.fr/cluster-willow/amiech/word2vec.zip
     unzip word2vec.zip
     mv word2vec.pth datasets/data/word2vec.pth

  2. Download the csv files of captions:

     wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/howto100m_captions.zip
     unzip howto100m_captions.zip

  3. Download the preprocessed HowTo100M videos (12TB in total) by filling in this Google form: https://forms.gle/hztrfnFQUJWBtiki8.

Usage

GDT pretraining

To pretrain audio-visual GDT on K-400

Multi-node distributed training with SLURM cluster:

sbatch pretraining_scripts/pretrain_gdt_k400.sh ${HYPOTHESIS_DESC} ${HYPOTHESIS} 

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_gdt.py --batch_size $BS --lr $LR --hypothesis {1,2,3,4,5,6,7,8,9}

To pretrain video-text GDT on HT100M

Multi-node training with SLURM cluster:

sbatch pretraining_scripts/pretrain_gdt_ht100m.sh ${HYPOTHESIS_DESC} ${HYPOTHESIS} 

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_gdt.py --batch_size $BS --lr $LR --hypothesis {1,2,3,4,5,6,7,8,9} --dataset ht100m --decode_audio False --model vid_text_gdt --sample_rate 2

$HYPOTHESIS refers to the hypotheses explored in GDT. We experiment with the following (a sketch of how a hypothesis enters the contrastive objective follows the list):

1 - cross-modal baseline (cross_modal_baseline)
2 - variant to time reversal (v_reversal)
3 - invariant to time reversal (i_reversal)
4 - variant to time shift (v_shift)
5 - invariant to time shift (i_shift)
6 - variant to time reversal and variant to time shift (v_reversal_v_shift)
7 - invariant to time reversal and variant to time shift (i_reversal_v_shift)
8 - variant to time reversal and invariant to time shift (v_reversal_i_shift)
9 - invariant to time reversal and invariant to time shift (i_reversal_i_shift)
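
The sketch below is our illustration (not the repository's code) of how a hypothesis shapes the objective: embeddings of clips transformed in a way the hypothesis declares "invariant" are scored as positives in an InfoNCE-style loss, while "variant" transformations push those embeddings into the negative set instead.

```python
import torch
import torch.nn.functional as F

def nce_loss(anchor, positives, negatives, temperature=0.07):
    """InfoNCE over L2-normalized embeddings.
    anchor: (D,), positives: (P, D), negatives: (N, D)."""
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ anchor / temperature  # (P,) similarities
    neg = F.normalize(negatives, dim=-1) @ anchor / temperature  # (N,) similarities
    logits = torch.cat([pos, neg])
    # One InfoNCE term per positive, each scored against the shared negatives.
    return -(pos - torch.logsumexp(logits, dim=0)).mean()

# Hypothesis 3 (i_reversal): the time-reversed clip's embedding is a positive.
# Hypothesis 2 (v_reversal): the same embedding joins the negatives instead.
video_emb = torch.randn(128)        # stand-ins for encoder outputs
reversed_emb = torch.randn(1, 128)
other_clips = torch.randn(8, 128)
loss = nce_loss(video_emb, reversed_emb, other_clips)
```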

Please modify the following in SLURM script:

  • SBATCH directives (e.g. partition, nodes, constraint)
  • SAV_FOLDER
  • --root_dir (path of K-400 / HT100M train directory)

All experiments were run with 8 nodes (64 GPUs, volta32). Please scale the batch size and learning rate appropriately.

STiCa pretraining

To pretrain audio-visual STiCa on K-400

Multi-node training with SLURM cluster:

sbatch scripts/pretrain_stica.sh $NUM_FRAMES $AUD_NUM_SEC $NUM_LARGE_CROPS $NUM_SMALL_CROPS $NUM_SMALL_TCROPS $NUM_LARGE_TCROPS $NUM_LAYER

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_stica.py --batch_size $BS --base_lr $LR

Hyper-parameters:

NUM_FRAMES - number of frames (e.g. 30)
AUD_NUM_SEC - number of seconds of audio (30 frames: 1s, 60 frames: 2s)
NUM_LARGE_CROPS - number of large feature spatial crops (e.g. 2)
NUM_SMALL_CROPS - number of small feature spatial crops (e.g. 4)
NUM_SMALL_TCROPS - number of small feature temporal crops (e.g. 1)
NUM_LARGE_TCROPS - number of large feature temporal crops (e.g. 2)
NUM_LAYER - number of transformer pooling layers (0 = global average pooling, >=1 = number of transformer layers)
e.g. sbatch scripts/pretrain_stica.sh 30 1 2 4 1 2 0
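
For intuition, here is an illustrative sketch (not the repository's implementation) of a space-time feature crop: instead of encoding extra augmented clips from scratch, crops are sampled directly from an intermediate feature map, which is what keeps within-modal invariance learning cheap. The feature-map shape and crop sizes below are assumptions.

```python
import torch

def random_feature_crop(feats, t_size, s_size):
    """Random space-time crop of a (B, C, T, H, W) feature map."""
    _, _, T, H, W = feats.shape
    t0 = torch.randint(0, T - t_size + 1, ()).item()
    y0 = torch.randint(0, H - s_size + 1, ()).item()
    x0 = torch.randint(0, W - s_size + 1, ()).item()
    return feats[:, :, t0:t0 + t_size, y0:y0 + s_size, x0:x0 + s_size]

feats = torch.randn(2, 512, 8, 14, 14)  # assumed late conv feature map
large = [random_feature_crop(feats, 8, 10) for _ in range(2)]  # NUM_LARGE_CROPS = 2
small = [random_feature_crop(feats, 4, 6) for _ in range(4)]   # NUM_SMALL_CROPS = 4
```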

Please modify the following in SLURM script:

  • SBATCH directives (e.g. partition, nodes, constraint)
  • SAV_FOLDER
  • --root_dir (path of K-400 / HT100M train directory)

All experiments were run with 8 nodes (64 GPUs, volta32). Please scale the batch size and learning rate appropriately.

Benchmarking

To evaluate a pretrained model on video action recognition on the UCF-101 and HMDB-51 datasets:

Locally:

python3 eval_video.py --dataset {ucf101, hmdb51} --fold {1,2,3} --weights-path {WEIGHTS_PATH} --model ${vid_text_gdt, stica, av_gdt}

On SLURM:

bash scripts/eval.sh ${WEIGHTS_PATH} ${OUTPUT_DIR} ${CKPT_NUM} ${CLIP_LEN} ${vid_text_gdt, stica, av_gdt} ${1, 2, 3}

Modify --root_dir, --ucf101-annotation-path, and --hmdb51-annotation-path in eval_video.py.
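
The VidAcc@1 numbers reported alongside ClipAcc@1 in the evaluation logs are consistent with sampling several clips per video (val_clips_per_video) and averaging their logits. A hedged sketch of that aggregation, with hypothetical tensor names:

```python
import torch

def video_accuracy(clip_logits, clip_video_ids, video_labels):
    """Aggregate clip logits per video, then classify each video.
    clip_logits: (N, C); clip_video_ids: (N,) long tensor mapping each clip
    to its video; video_labels: (num_videos,) ground-truth labels."""
    num_videos = int(clip_video_ids.max()) + 1
    sums = torch.zeros(num_videos, clip_logits.size(1))
    sums.index_add_(0, clip_video_ids, clip_logits)  # summing == averaging for argmax
    preds = sums.argmax(dim=1)
    return (preds == video_labels).float().mean().item()
```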

License

The majority of this work is licensed under CC-NC 4.0 International license.

Contributing

We actively welcome your pull requests. Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Comments
  • Can't reproduce UCF-101 finetune results

    Hi, thanks for releasing the code. I tried to reproduce the UCF-101 fine-tuning results with your GDT Kinetics-pretrained model, but the results are quite far off. I didn't change any hyperparameters while fine-tuning. Here is my command:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python3 eval_video.py --dataset ucf101 --fold 1 --weights-path ./pretrained/gdt_K400.pth --model av_gdt --root_dir /local-ssd/fmthoker/ucf101/video/ --ucf101-annotation-path /localssd/fmthoker/ucf101/ucfTrainTestlist/

    Logs

    Evaluating on folds: [1]
    ============ Initialized logger ============
    agg_model: False aud_base_arch: resnet9 aud_sample_rate: 24000 aud_spec_type: 2 audio_augtype: none base_lr: 0.00025 batch_size: 32 ckpt_epoch: 0 clip_len: 32 colorjitter: True cross_modal_alpha: 0.5 cross_modal_nce: True dataset: ucf101 dp: 0.0 dump_checkpoints: ./checkpoints dump_path: . epochs: 12 feature_extract: False fm_crop: False fold: 1 head_lr: 0.0025 headcount: 1 hmdb51_annotation_path: /datasets01/hmdb51/112018/splits/ lr_gamma: 0.05 lr_milestones: 6,10 lr_warmup_epochs: 2 mlptype: 0 model: av_gdt momentum: 0.9 multi_crop: False num_data_samples: None num_frames: 32 num_head: 4 num_large_crops: 1 num_layer: 2 num_sec: 2 num_sec_aud: 1 num_small_crops: 0 num_spatial_crops: 3 optim_name: sgd output_dir: . positional_emb: False pretrained: False print_freq: 10 qkv_mha: False rank: 0 resume: root_dir: /local-ssd/fmthoker/ucf101/video/ sample_rate: 1 start_epoch: 0 steps_bet_clips: 1 supervised: False target_fps: 30 test_crop_size: 128 test_only: False test_time_cj: False train_clips_per_video: 10 train_crop_size: 128 transformer_time_dim: 8 tsf_lr: 0.00025 ucf101_annotation_path: /local-ssd/fmthoker/ucf101/ucfTrainTestlist/ use_audio_temp_jittering: False use_bn: False use_dropout: False use_gaussian: False use_grayscale: False use_l2_norm: False use_larger_last: False use_mlp: False use_random_resize_crop: True use_scheduler: True use_volume_jittering: True val_clips_per_video: 10 vid_base_arch: r2plus1d_18 wd_base: 0.005 wd_tsf: 0.005 weight_decay: 0.005 weights_path: ./pretrained/gdt_K400.pth workers: 16 z_normalize: False

    Loading model: Using Audio-Visual GDT; randomly initializing resnet9 (duration: 1); using Linear Layer
    Loading model weights (epoch checkpoint: 101)
    didnt load mlp_v.block_forward.2.weight / .4.weight / .4.bias / .4.running_mean / .4.running_var / .4.num_batches_tracked / .8.weight / .8.bias
    didnt load mlp_a.block_forward.2.weight / .4.weight / .4.bias / .4.running_mean / .4.running_var / .4.num_batches_tracked / .8.weight / .8.bias
    Loading model done; using non-agg GDT model; classifier to 101 classes
    [per-parameter shape listing and requires_grad checks omitted]

    Creating AV Datasets
    Constructing ucf101 train... Total number of videos: 13320, Valid videos: 9537
    Constructing ucf101 test... Total number of videos: 399600, Valid videos: 113490
    Using SGD with lr: 0.0025, wd: 0.005; Num. of Epochs: 12, Milestones: [4, 8]; scheduler with 2 warmup epochs

    [per-iteration training logs for epochs 0-11 omitted]
    Epoch 7  - Test: Loss 1.6872  ClipAcc@1 58.206  VidAcc@1 63.283
    Epoch 8  - Test: Loss 1.6609  ClipAcc@1 58.079  VidAcc@1 63.574
    Epoch 9  - Test: Loss 1.6590  ClipAcc@1 58.412  VidAcc@1 62.781
    Epoch 10 - Test: Loss 1.6248  ClipAcc@1 59.569  VidAcc@1 65.054
    Epoch 11 - Test: Loss 1.6048  ClipAcc@1 59.339  VidAcc@1 64.261
    Training time 7:03:44
    3-Fold (ucf101): Vid Acc@1 65.054, Video Acc@5 89.902

    Can you provide some insights?

    opened by fmthoker 4
  • AttributeError: 'AVideoDataset' object has no attribute '_split_idx'

    Thank you for sharing such great work! I am trying to run the evaluation on UCF101, but I encountered the following error:

    Traceback (most recent call last):
      File "/home/huwang/huwang/experiments/multimodal/GDT/eval_video.py", line 837, in <module>
        best_acc1, best_acc5, best_epoch = main(args, writer)
      File "/home/huwang/huwang/experiments/multimodal/GDT/eval_video.py", line 321, in main
        args=args,
      File "/media/data/huwang/experiments/multimodal/GDT/datasets/AVideoDataset.py", line 251, in __init__
        self._construct_loader()
      File "/media/data/huwang/experiments/multimodal/GDT/datasets/AVideoDataset.py", line 288, in _construct_loader
        self.ds_name, self._split_idx, path_to_file
    AttributeError: 'AVideoDataset' object has no attribute '_split_idx'

    Could you shed some light on how to solve this issue? Thank you!

    opened by billhhh 2
  • No valid videos are found

    Whenever I try to pretrain using the code, the number of valid videos shown is 0. I have also tried the code from the supplementary material of the paper "Multi-modal Self-Supervision from Generalized Data Transformations"; although it had a few errors, the valid videos were not zero. Is there any difference in how that code checks for valid videos?

    opened by asharani97 0
  • Adding Code of Conduct file

    This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Adding Contributing file

    This pull request was created automatically because we noticed your project was missing a Contributing file.

    CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • VERY SLOW training on audio-video datasets like Kinetics-400 and UCF-101

    Hi authors! Thank you for making the paper and code open source; it is very helpful. I am trying to pretrain the GDT model on the Kinetics-400 dataset, but each epoch takes more than a day. I run on a server with 8 RTX 3090 GPUs and set the per-GPU batch size to 16, for a total batch size of 128, a quarter of the original setting in the paper. According to the paper, the authors spent 3 days on pretraining with a batch size of 512, so under normal circumstances an epoch should not take more than about 3 hours. Changing the video decoder from pyav to decord brings a small improvement in training speed. Was the speed of the provided code tested before release? What should I do to find clues for speeding up training?

    Some logs below:

    Epoch: [0]  [  360/14961]  eta: 13:42:52  lr: 0.01  clips/s: 16.263  loss: 2.7961 (2.8411)  batch_t/s: 1.0088 (1.4428)  time: 2.8681  data: 1.3705  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:51  lr: 0.01  clips/s: 13.694  loss: 2.7992 (2.8464)  batch_t/s: 1.0067 (1.0740)  time: 4.3781  data: 3.3474  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.769  loss: 2.7919 (2.8454)  batch_t/s: 1.0110 (1.7200)  time: 4.3779  data: 1.3611  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.532  loss: 2.7913 (2.8402)  batch_t/s: 1.0089 (1.4563)  time: 4.3786  data: 2.4327  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:23  lr: 0.01  clips/s: 14.072  loss: 2.7891 (2.8451)  batch_t/s: 1.0196 (1.0736)  time: 2.5644  data: 1.5199  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:20  lr: 0.01  clips/s: 14.029  loss: 2.7738 (2.8434)  batch_t/s: 1.0512 (1.7027)  time: 2.5646  data: 0.5402  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:19  lr: 0.01  clips/s: 14.026  loss: 2.7874 (2.8387)  batch_t/s: 1.0548 (1.4459)  time: 2.5643  data: 1.0631  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:54  lr: 0.01  clips/s: 15.097  loss: 2.7765 (2.8417)  batch_t/s: 1.0534 (1.7432)  time: 2.6929  data: 0.5196  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:56  lr: 0.01  clips/s: 14.988  loss: 2.7927 (2.8441)  batch_t/s: 1.0630 (1.0732)  time: 2.6932  data: 1.6344  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:53  lr: 0.01  clips/s: 16.121  loss: 2.7775 (2.8376)  batch_t/s: 1.0481 (1.4640)  time: 2.6923  data: 1.0834  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:48  lr: 0.01  clips/s: 16.551  loss: 2.7957 (2.8433)  batch_t/s: 1.0546 (1.0725)  time: 4.4575  data: 3.4058  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:45  lr: 0.01  clips/s: 1.458  loss: 2.7986 (2.8373)  batch_t/s: 1.0390 (1.4786)  time: 4.4577  data: 2.3538  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:46  lr: 0.01  clips/s: 0.679  loss: 2.7963 (2.8410)  batch_t/s: 1.0598 (1.7822)  time: 4.4580  data: 1.1610  max mem: 20040
    Epoch: [0]  [  410/14961]  eta: 13:29:18  lr: 0.01  clips/s: 15.575  loss: 2.7954 (2.8418)  batch_t/s: 1.0273 (1.0715)  time: 2.8114  data: 1.7718  max mem: 20040
    Epoch: [0]  [  410/14961]  eta: 13:29:15  lr: 0.01  clips/s: 15.525  loss: 2.7892 (2.8399)  batch_t/s: 1.0306 (1.7639)  time: 2.8114  data: 0.6421  max mem: 20040
    

    Sincerely yours.

    opened by XinyuSun 3
  • Pretrained STiCa model

    Hi, can you share the fully supervised Kinetics-trained R(2+1)D-18 and the Kinetics-pretrained STiCa models? I am doing a survey of self-supervised learning in which I compare different self-supervised methods, and I would like to include your STiCa method along with a comparison to fully supervised learning. Hoping for a positive response.

    opened by fmthoker 0