We present a framework for training multi-modal deep learning models on unlabelled video data by forcing the network to learn invariances to transformations applied to both the audio and video streams.


Multi-Modal Self-Supervision using GDT and STiCa

This is the official PyTorch implementation of the papers Multi-modal Self-Supervision from Generalized Data Transformations and Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning. In this repository, we provide PyTorch code for pretraining and testing our proposed GDT and STiCa models.

If you find GDT and STiCa useful in your research, please use the following BibTeX entries for citation.

@inproceedings{patrick2020multimodal,
      title={Multi-modal Self-Supervision from Generalized Data Transformations},
      author={Mandela Patrick and Yuki M. Asano and Polina Kuznetsova and Ruth Fong and João F. Henriques and Geoffrey Zweig and Andrea Vedaldi},
      year={2021},
      booktitle={International Conference on Computer Vision (ICCV)},
}

@inproceedings{m2021spacetime,
    title={Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning},
    author={Mandela Patrick and Yuki M. Asano and Bernie Huang and Ishan Misra and Florian Metze and Joao Henriques and Andrea Vedaldi},
    year={2021},
    booktitle={International Conference on Computer Vision (ICCV)},
}

Highlights

(1) GDT: Formulate and generalize most pretext tasks under a single NCE objective.

Using this formulation, we test various previously unexplored pretext tasks and achieve SOTA downstream performance.

(2) STiCa: Importance of incorporating within-modal invariance in cross-modal learning

We show how to efficiently incorporate within-modal invariance learning using feature crops and achieve SOTA downstream performance.

Model Zoo

We provide GDT models pretrained on the Kinetics-400 (K400), HowTo100M (HT100M), and Instagram-65M (IG65M) datasets, and STiCa models pretrained on Kinetics-400 (K400).

| name | dataset | # of frames | spatial crop | HMDB51 Top1 | UCF101 Top1 | url |
|------|---------|-------------|--------------|-------------|-------------|-----|
| GDT | K400 | 30 | 112 | 62.3 | 90.9 | model |
| GDT | HT100M | 30 | 112 | 67.4 | 94.1 | model |
| GDT | IG65M | 30 | 112 | 72.8 | 95.2 | model |

| name | dataset | # of frames | spatial crop | HMDB51 Top1 | UCF101 Top1 | url |
|------|---------|-------------|--------------|-------------|-------------|-----|
| STiCa | K400 | 60 | 112 | 67.0 | 93.1 | Coming Soon |
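
To take a quick look at one of these checkpoints before fine-tuning, plain PyTorch is enough. The sketch below is illustrative only: the filename and the nesting of the weights under a "model" key are assumptions, so adjust them to the file you actually downloaded.

```python
import torch

# Illustrative sketch: inspect a downloaded GDT checkpoint before fine-tuning.
# The filename and the "model" key are assumptions; adjust to your download.
ckpt = torch.load("pretrained/gdt_K400.pth", map_location="cpu")
state_dict = ckpt["model"] if isinstance(ckpt, dict) and "model" in ckpt else ckpt
for name, tensor in list(state_dict.items())[:5]:
    print(name, tuple(tensor.shape))  # e.g. stem/conv weights and their shapes
```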

Installation

This repo was tested with Ubuntu 16.04.5 LTS, Python 3.7.5, PyTorch 1.3.1, Torchvision 0.4.1, and CUDA 10.0.

Step 1

  • Clone this repo to your local machine

Step 2

  • Install required packages using conda env create -f environment.yml

Step 3

  • Activate conda environment using conda activate GDT

Step 4

  • Install kornia library pip install kornia==0.1.4

Step 5

  • See below for how to pretrain GDT / STiCa or benchmark pretrained models

Data Preparation

For the Kinetics-400/600, HMDB-51, and UCF-101 datasets:

  1. Ensure all datasets are in the format:

     $ROOT_DIR/$SPLIT/$CLASS/*
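As a quick sanity check, a few lines of Python can confirm that a dataset tree follows this layout (illustrative only; the root path below is a placeholder):

```python
from pathlib import Path

# Illustrative check that a dataset follows $ROOT_DIR/$SPLIT/$CLASS/*.
root = Path("/path/to/kinetics400")  # placeholder $ROOT_DIR
for split in ("train", "val"):       # each $SPLIT
    classes = sorted(d for d in (root / split).iterdir() if d.is_dir())
    n_videos = sum(1 for c in classes for _ in c.iterdir())
    print(f"{split}: {len(classes)} classes, {n_videos} videos")
```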

To prepare the HowTo100M dataset, do the following:

  1. Download the word2vec matrix and dictionary, unzip the file, and place it in the datasets/data folder:

     wget https://www.rocq.inria.fr/cluster-willow/amiech/word2vec.zip
     unzip word2vec.zip
     mv word2vec.pth datasets/data/word2vec.pth

  2. Download the csv files of captions:

     wget https://www.rocq.inria.fr/cluster-willow/amiech/howto100m/howto100m_captions.zip
     unzip howto100m_captions.zip

  3. Download the preprocessed HowTo100M videos (12TB in total) by filling in this Google form: https://forms.gle/hztrfnFQUJWBtiki8.

Usage

GDT pretraining

To pretrain audio-visual GDT on K-400

Multi-node distributed training with SLURM cluster:

sbatch pretraining_scripts/pretrain_gdt_k400.sh ${HYPOTHESIS_DESC} ${HYPOTHESIS} 

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_gdt.py --batch_size $BS --lr $LR --hypothesis {1,2,3,4,5,6,7,8,9}

To pretrain video-text GDT on HT100M

Multi-node training with SLURM cluster:

sbatch pretraining_scripts/pretrain_gdt_ht100m.sh ${HYPOTHESIS_DESC} ${HYPOTHESIS} 

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_gdt.py --batch_size $BS --lr $LR --hypothesis {1,2,3,4,5,6,7,8,9} --dataset ht100m --decode_audio False --model vid_text_gdt --sample_rate 2

$HYPOTHESIS refers to the hypotheses explored in GDT. We experiment with the following (a sketch of how a hypothesis enters the contrastive objective follows the list):

1 - cross-modal baseline (cross_modal_baseline)
2 - variant to time reversal (v_reversal)
3 - invariant to time reversal (i_reversal)
4 - variant to time shift (v_shift)
5 - invariant to time shift (i_shift)
6 - variant to time reversal and variant to time shift (v_reversal_v_shift)
7 - invariant to time reversal and variant to time shift (i_reversal_v_shift)
8 - variant to time reversal and invariant to time shift (v_reversal_i_shift)
9 - invariant to time reversal and invariant to time shift (i_reversal_i_shift)
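
The sketch below is our illustration (not the repository's code) of how a hypothesis shapes the objective: embeddings of clips transformed in a way the hypothesis declares "invariant" are scored as positives in an InfoNCE-style loss, while "variant" transformations push those embeddings into the negative set instead.

```python
import torch
import torch.nn.functional as F

def nce_loss(anchor, positives, negatives, temperature=0.07):
    """InfoNCE over L2-normalized embeddings.
    anchor: (D,), positives: (P, D), negatives: (N, D)."""
    anchor = F.normalize(anchor, dim=-1)
    pos = F.normalize(positives, dim=-1) @ anchor / temperature  # (P,) similarities
    neg = F.normalize(negatives, dim=-1) @ anchor / temperature  # (N,) similarities
    logits = torch.cat([pos, neg])
    # One InfoNCE term per positive, each scored against the shared negatives.
    return -(pos - torch.logsumexp(logits, dim=0)).mean()

# Hypothesis 3 (i_reversal): the time-reversed clip's embedding is a positive.
# Hypothesis 2 (v_reversal): the same embedding joins the negatives instead.
video_emb = torch.randn(128)        # stand-ins for encoder outputs
reversed_emb = torch.randn(1, 128)
other_clips = torch.randn(8, 128)
loss = nce_loss(video_emb, reversed_emb, other_clips)
```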

Please modify the following in SLURM script:

  • SBATCH directives (e.g. partition, nodes, constraint)
  • SAV_FOLDER
  • --root_dir (path of K-400 / HT100M train directory)

All experiments were run with 8 nodes (64 GPUs, volta32). Please scale the batch size and learning rate appropriately.

STiCa pretraining

To pretrain audio-visual STiCa on K-400

Multi-node training with SLURM cluster:

sbatch scripts/pretrain_stica.sh $NUM_FRAMES $AUD_NUM_SEC $NUM_LARGE_CROPS $NUM_SMALL_CROPS $NUM_SMALL_TCROPS $NUM_LARGE_TCROPS $NUM_LAYER

Single-node distributed training:

python -m torch.distributed.launch --master_port=$RANDOM --nproc_per_node=2 --use_env main_stica.py --batch_size $BS --base_lr $LR

Hyper-parameters:

NUM_FRAMES - number of frames (e.g. 30)
AUD_NUM_SEC - number of seconds of audio (30 frames: 1s, 60 frames: 2s)
NUM_LARGE_CROPS - number of large feature spatial crops (e.g. 2)
NUM_SMALL_CROPS - number of small feature spatial crops (e.g. 4)
NUM_SMALL_TCROPS - number of small feature temporal crops (e.g. 1)
NUM_LARGE_TCROPS - number of large feature temporal crops (e.g. 2)
NUM_LAYER - number of transformer pooling layers (0 = global average pooling, >=1 = number of transformer layers)
e.g. sbatch scripts/pretrain_stica.sh 30 1 2 4 1 2 0
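
For intuition, here is an illustrative sketch (not the repository's implementation) of a space-time feature crop: instead of encoding extra augmented clips from scratch, crops are sampled directly from an intermediate feature map, which is what keeps within-modal invariance learning cheap. The feature-map shape and crop sizes below are assumptions.

```python
import torch

def random_feature_crop(feats, t_size, s_size):
    """Random space-time crop of a (B, C, T, H, W) feature map."""
    _, _, T, H, W = feats.shape
    t0 = torch.randint(0, T - t_size + 1, ()).item()
    y0 = torch.randint(0, H - s_size + 1, ()).item()
    x0 = torch.randint(0, W - s_size + 1, ()).item()
    return feats[:, :, t0:t0 + t_size, y0:y0 + s_size, x0:x0 + s_size]

feats = torch.randn(2, 512, 8, 14, 14)  # assumed late conv feature map
large = [random_feature_crop(feats, 8, 10) for _ in range(2)]  # NUM_LARGE_CROPS = 2
small = [random_feature_crop(feats, 4, 6) for _ in range(4)]   # NUM_SMALL_CROPS = 4
```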

Please modify the following in SLURM script:

  • SBATCH directives (e.g. partition, nodes, constraint)
  • SAV_FOLDER
  • --root_dir (path of K-400 / HT100M train directory)

All experiments were run with 8 nodes (64 GPUs, volta32). Please scale the batch size and learning rate appropriately.

Benchmarking

To evaluate a pretrained model on video action recognition on the UCF-101 and HMDB-51 datasets:

Locally:

python3 eval_video.py --dataset {ucf101, hmdb51} --fold {1,2,3} --weights-path {WEIGHTS_PATH} --model ${vid_text_gdt, stica, av_gdt}

On SLURM:

bash scripts/eval.sh ${WEIGHTS_PATH} ${OUTPUT_DIR} ${CKPT_NUM} ${CLIP_LEN} ${vid_text_gdt, stica, av_gdt} ${1, 2, 3}

Modify --root_dir, --ucf101-annotation-path, and --hmdb51-annotation-path in eval_video.py.
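
The VidAcc@1 numbers reported alongside ClipAcc@1 in the evaluation logs are consistent with sampling several clips per video (val_clips_per_video) and averaging their logits. A hedged sketch of that aggregation, with hypothetical tensor names:

```python
import torch

def video_accuracy(clip_logits, clip_video_ids, video_labels):
    """Aggregate clip logits per video, then classify each video.
    clip_logits: (N, C); clip_video_ids: (N,) long tensor mapping each clip
    to its video; video_labels: (num_videos,) ground-truth labels."""
    num_videos = int(clip_video_ids.max()) + 1
    sums = torch.zeros(num_videos, clip_logits.size(1))
    sums.index_add_(0, clip_video_ids, clip_logits)  # summing == averaging for argmax
    preds = sums.argmax(dim=1)
    return (preds == video_labels).float().mean().item()
```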

License

The majority of this work is licensed under CC-NC 4.0 International license.

Contributing

We actively welcome your pull requests. Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Comments
  • Can't reproduce UCF-101 finetune results

    Hi, thanks for releasing the code. I tried to reproduce the UCF-101 fine-tuning results with your GDT Kinetics-pretrained model, but the results are quite far off. I didn't change any hyperparameters while fine-tuning. Here is my command:

    CUDA_VISIBLE_DEVICES=0,1,2,3 python3 eval_video.py --dataset ucf101 --fold 1 --weights-path ./pretrained/gdt_K400.pth --model av_gdt --root_dir /local-ssd/fmthoker/ucf101/video/ --ucf101-annotation-path /localssd/fmthoker/ucf101/ucfTrainTestlist/

    Logs

    Evaluating on folds: [1]
    ============ Initialized logger ============
    agg_model: False aud_base_arch: resnet9 aud_sample_rate: 24000 aud_spec_type: 2 audio_augtype: none base_lr: 0.00025 batch_size: 32 ckpt_epoch: 0 clip_len: 32 colorjitter: True cross_modal_alpha: 0.5 cross_modal_nce: True dataset: ucf101 dp: 0.0 dump_checkpoints: ./checkpoints dump_path: . epochs: 12 feature_extract: False fm_crop: False fold: 1 head_lr: 0.0025 headcount: 1 hmdb51_annotation_path: /datasets01/hmdb51/112018/splits/ lr_gamma: 0.05 lr_milestones: 6,10 lr_warmup_epochs: 2 mlptype: 0 model: av_gdt momentum: 0.9 multi_crop: False num_data_samples: None num_frames: 32 num_head: 4 num_large_crops: 1 num_layer: 2 num_sec: 2 num_sec_aud: 1 num_small_crops: 0 num_spatial_crops: 3 optim_name: sgd output_dir: . positional_emb: False pretrained: False print_freq: 10 qkv_mha: False rank: 0 resume: root_dir: /local-ssd/fmthoker/ucf101/video/ sample_rate: 1 start_epoch: 0 steps_bet_clips: 1 supervised: False target_fps: 30 test_crop_size: 128 test_only: False test_time_cj: False train_clips_per_video: 10 train_crop_size: 128 transformer_time_dim: 8 tsf_lr: 0.00025 ucf101_annotation_path: /local-ssd/fmthoker/ucf101/ucfTrainTestlist/ use_audio_temp_jittering: False use_bn: False use_dropout: False use_gaussian: False use_grayscale: False use_l2_norm: False use_larger_last: False use_mlp: False use_random_resize_crop: True use_scheduler: True use_volume_jittering: True val_clips_per_video: 10 vid_base_arch: r2plus1d_18 wd_base: 0.005 wd_tsf: 0.005 weight_decay: 0.005 weights_path: ./pretrained/gdt_K400.pth workers: 16 z_normalize: False

    Loading model: Using Audio-Visual GDT; randomly initializing resnet9 (duration: 1); using Linear Layer
    Loading model weights (epoch checkpoint: 101)
    didnt load mlp_v.block_forward.2.weight / .4.weight / .4.bias / .4.running_mean / .4.running_var / .4.num_batches_tracked / .8.weight / .8.bias
    didnt load mlp_a.block_forward.2.weight / .4.weight / .4.bias / .4.running_mean / .4.running_var / .4.num_batches_tracked / .8.weight / .8.bias
    Loading model done; using non-agg GDT model; classifier to 101 classes
    [per-parameter shape listing and requires_grad checks omitted]

    Creating AV Datasets
    Constructing ucf101 train... Total number of videos: 13320, Valid videos: 9537
    Constructing ucf101 test... Total number of videos: 399600, Valid videos: 113490
    Using SGD with lr: 0.0025, wd: 0.005; Num. of Epochs: 12, Milestones: [4, 8]; scheduler with 2 warmup epochs

    [per-iteration training logs for epochs 0-11 omitted]
    Epoch 7  - Test: Loss 1.6872  ClipAcc@1 58.206  VidAcc@1 63.283
    Epoch 8  - Test: Loss 1.6609  ClipAcc@1 58.079  VidAcc@1 63.574
    Epoch 9  - Test: Loss 1.6590  ClipAcc@1 58.412  VidAcc@1 62.781
    Epoch 10 - Test: Loss 1.6248  ClipAcc@1 59.569  VidAcc@1 65.054
    Epoch 11 - Test: Loss 1.6048  ClipAcc@1 59.339  VidAcc@1 64.261
    Training time 7:03:44
    3-Fold (ucf101): Vid Acc@1 65.054, Video Acc@5 89.902

    Can you provide some insights?

    opened by fmthoker 4
  • AttributeError: 'AVideoDataset' object has no attribute '_split_idx'

    Thank you for sharing such great work! I am trying to run the evaluation on UCF101, but I encountered the following error:

    Traceback (most recent call last):
      File "/home/huwang/huwang/experiments/multimodal/GDT/eval_video.py", line 837, in <module>
        best_acc1, best_acc5, best_epoch = main(args, writer)
      File "/home/huwang/huwang/experiments/multimodal/GDT/eval_video.py", line 321, in main
        args=args,
      File "/media/data/huwang/experiments/multimodal/GDT/datasets/AVideoDataset.py", line 251, in __init__
        self._construct_loader()
      File "/media/data/huwang/experiments/multimodal/GDT/datasets/AVideoDataset.py", line 288, in _construct_loader
        self.ds_name, self._split_idx, path_to_file
    AttributeError: 'AVideoDataset' object has no attribute '_split_idx'

    Could you shed some light on how to solve this issue? Thank you!

    opened by billhhh 2
  • No valid videos are found

    Whenever I try to pretrain using the code, the number of valid videos shown is 0. I have also tried the code from the supplementary material of the paper "Multi-modal Self-Supervision from Generalized Data Transformations"; although it had a few errors, the valid videos were not zero. Is there any difference in how that code checks for valid videos?

    opened by asharani97 0
  • Adding Code of Conduct file

    This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Adding Contributing file

    This pull request was created automatically because we noticed your project was missing a Contributing file.

    CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • VERY SLOW training on audio-video datasets like Kinetics-400 and UCF-101

    Hi authors! Thank you for making the paper and code open source; it is very helpful. I am trying to pretrain the GDT model on the Kinetics-400 dataset, but each epoch takes more than a day. I run on a server with 8 RTX 3090 GPUs and set the per-GPU batch size to 16, for a total batch size of 128, a quarter of the original setting in the paper. According to the paper, the authors spent 3 days on pretraining with a batch size of 512, so under normal circumstances an epoch should not take more than about 3 hours. Changing the video decoder from pyav to decord brings a small improvement in training speed. Was the speed of the provided code tested before release? What should I do to find clues for speeding up training?

    Some logs below:

    Epoch: [0]  [  360/14961]  eta: 13:42:52  lr: 0.01  clips/s: 16.263  loss: 2.7961 (2.8411)  batch_t/s: 1.0088 (1.4428)  time: 2.8681  data: 1.3705  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:51  lr: 0.01  clips/s: 13.694  loss: 2.7992 (2.8464)  batch_t/s: 1.0067 (1.0740)  time: 4.3781  data: 3.3474  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.769  loss: 2.7919 (2.8454)  batch_t/s: 1.0110 (1.7200)  time: 4.3779  data: 1.3611  max mem: 20040
    Epoch: [0]  [  370/14961]  eta: 13:46:48  lr: 0.01  clips/s: 13.532  loss: 2.7913 (2.8402)  batch_t/s: 1.0089 (1.4563)  time: 4.3786  data: 2.4327  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:23  lr: 0.01  clips/s: 14.072  loss: 2.7891 (2.8451)  batch_t/s: 1.0196 (1.0736)  time: 2.5644  data: 1.5199  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:20  lr: 0.01  clips/s: 14.029  loss: 2.7738 (2.8434)  batch_t/s: 1.0512 (1.7027)  time: 2.5646  data: 0.5402  max mem: 20040
    Epoch: [0]  [  380/14961]  eta: 13:31:19  lr: 0.01  clips/s: 14.026  loss: 2.7874 (2.8387)  batch_t/s: 1.0548 (1.4459)  time: 2.5643  data: 1.0631  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:54  lr: 0.01  clips/s: 15.097  loss: 2.7765 (2.8417)  batch_t/s: 1.0534 (1.7432)  time: 2.6929  data: 0.5196  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:56  lr: 0.01  clips/s: 14.988  loss: 2.7927 (2.8441)  batch_t/s: 1.0630 (1.0732)  time: 2.6932  data: 1.6344  max mem: 20040
    Epoch: [0]  [  390/14961]  eta: 13:36:53  lr: 0.01  clips/s: 16.121  loss: 2.7775 (2.8376)  batch_t/s: 1.0481 (1.4640)  time: 2.6923  data: 1.0834  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:48  lr: 0.01  clips/s: 16.551  loss: 2.7957 (2.8433)  batch_t/s: 1.0546 (1.0725)  time: 4.4575  data: 3.4058  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:45  lr: 0.01  clips/s: 1.458  loss: 2.7986 (2.8373)  batch_t/s: 1.0390 (1.4786)  time: 4.4577  data: 2.3538  max mem: 20040
    Epoch: [0]  [  400/14961]  eta: 13:43:46  lr: 0.01  clips/s: 0.679  loss: 2.7963 (2.8410)  batch_t/s: 1.0598 (1.7822)  time: 4.4580  data: 1.1610  max mem: 20040
    Epoch: [0]  [  410/14961]  eta: 13:29:18  lr: 0.01  clips/s: 15.575  loss: 2.7954 (2.8418)  batch_t/s: 1.0273 (1.0715)  time: 2.8114  data: 1.7718  max mem: 20040
    Epoch: [0]  [  410/14961]  eta: 13:29:15  lr: 0.01  clips/s: 15.525  loss: 2.7892 (2.8399)  batch_t/s: 1.0306 (1.7639)  time: 2.8114  data: 0.6421  max mem: 20040
    

    Sincerely yours.

    opened by XinyuSun 3
  • Pretrained STiCa model

    Hi, can you share the fully supervised Kinetics-trained R(2+1)D-18 and the Kinetics-pretrained STiCa models? I am doing a survey of self-supervised learning in which I compare different self-supervised methods, and I would like to include your STiCa method along with a comparison to fully supervised learning. Hoping for a positive response.

    opened by fmthoker 0