Code + pre-trained models for the paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers

Overview

Motionformer

This is an official PyTorch implementation of the paper Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers. In this repository, we provide PyTorch code for training and testing our proposed Motionformer model. Motionformer uses the proposed trajectory attention to achieve state-of-the-art results on several video action recognition benchmarks, such as Kinetics-400 and Something-Something V2.

If you find Motionformer useful in your research, please use the following BibTeX entry for citation.

@misc{patrick2021keeping,
      title={Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers}, 
      author={Mandela Patrick and Dylan Campbell and Yuki M. Asano and Ishan Misra and Florian Metze and Christoph Feichtenhofer and Andrea Vedaldi and Jo\~ao F. Henriques},
      year={2021},
      eprint={2106.05392},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Model Zoo

We provide Motionformer models pretrained on Kinetics-400 (K400), Kinetics-600 (K600), Something-Something-V2 (SSv2), and Epic-Kitchens (EK) datasets.

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Joint | K400 | 16 | 224 | 79.2 | 94.2 | model |
| Divided | K400 | 16 | 224 | 78.5 | 93.8 | model |
| Motionformer | K400 | 16 | 224 | 79.7 | 94.2 | model |
| Motionformer-HR | K400 | 16 | 336 | 81.1 | 95.2 | model |
| Motionformer-L | K400 | 32 | 224 | 80.2 | 94.8 | model |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Motionformer | K600 | 16 | 224 | 81.6 | 95.6 | model |
| Motionformer-HR | K600 | 16 | 336 | 82.7 | 96.1 | model |
| Motionformer-L | K600 | 32 | 224 | 82.2 | 96.0 | model |

| name | dataset | # of frames | spatial crop | acc@1 | acc@5 | url |
| --- | --- | --- | --- | --- | --- | --- |
| Joint | SSv2 | 16 | 224 | 64.0 | 88.4 | model |
| Divided | SSv2 | 16 | 224 | 64.2 | 88.6 | model |
| Motionformer | SSv2 | 16 | 224 | 66.5 | 90.1 | model |
| Motionformer-HR | SSv2 | 16 | 336 | 67.1 | 90.6 | model |
| Motionformer-L | SSv2 | 32 | 224 | 68.1 | 91.2 | model |

| name | dataset | # of frames | spatial crop | A acc | N acc | url |
| --- | --- | --- | --- | --- | --- | --- |
| Motionformer | EK | 16 | 224 | 43.1 | 56.5 | model |
| Motionformer-HR | EK | 16 | 336 | 44.5 | 58.5 | model |
| Motionformer-L | EK | 32 | 224 | 44.1 | 57.6 | model |
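
The checkpoints above can be evaluated directly with the testing command described in the Inference section below, for example (all paths are placeholders):

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  TEST.CHECKPOINT_FILE_PATH path_to_downloaded_checkpoint \
  TRAIN.ENABLE False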

Installation

First, create a conda virtual environment and activate it:

conda create -n motionformer python=3.8.5 -y
source activate motionformer

Then, install the following packages:

  • torchvision: pip install torchvision or conda install torchvision -c pytorch
  • fvcore: pip install 'git+https://github.com/facebookresearch/fvcore'
  • simplejson: pip install simplejson
  • einops: pip install einops
  • timm: pip install timm
  • PyAV: conda install av -c conda-forge
  • psutil: pip install psutil
  • scikit-learn: pip install scikit-learn
  • OpenCV: pip install opencv-python
  • tensorboard: pip install tensorboard
  • matplotlib: pip install matplotlib
  • pandas: pip install pandas
  • FFmpeg: pip install ffmpeg-python

OR:

Simply create a conda environment with all the required packages from the provided YAML file:

conda env create -f environment.yml

Lastly, build the Motionformer codebase by running:

git clone https://github.com/facebookresearch/Motionformer
cd Motionformer
python setup.py build develop

Usage

Dataset Preparation

Please use the dataset preparation instructions provided in DATASET.md.

Training the Default Motionformer

Training the default Motionformer, which uses trajectory attention and operates on 16-frame clips cropped at 224x224 spatial resolution, can be done with the following command:

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8

You may need to pass the location of your dataset on the command line by adding DATA.PATH_TO_DATA_DIR path_to_your_dataset, or you can simply add

DATA:
  PATH_TO_DATA_DIR: path_to_your_dataset

to the YAML config file, so that you do not need to pass it on the command line every time.

Using a Different Number of GPUs

If you want to use a smaller number of GPUs, you need to modify the .yaml configuration files in configs/. Specifically, you need to modify the NUM_GPUS, TRAIN.BATCH_SIZE, TEST.BATCH_SIZE, and DATA_LOADER.NUM_WORKERS entries in each configuration file. The BATCH_SIZE entry should be the same as or higher than the NUM_GPUS entry. An example is sketched below.
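
For example, a 4-GPU run could override these entries on the command line instead of editing the config (the values below are illustrative only; you may also need to scale the learning rate accordingly):

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 4 \
  TRAIN.BATCH_SIZE 4 \
  TEST.BATCH_SIZE 4 \
  DATA_LOADER.NUM_WORKERS 4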

Using Different Self-Attention Schemes

If you want to experiment with different space-time self-attention schemes, e.g., joint space-time attention or divided space-time attention, use the following commands:

python tools/run_net.py \
  --cfg configs/K400/joint_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8

and

python tools/run_net.py \
  --cfg configs/K400/divided_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8

Training Different Motionformer Variants

If you want to train more powerful Motionformer variants, e.g., Motionformer-HR (operating on 16-frame clips sampled at 336x336 spatial resolution), and Motionformer-L (operating on 32-frame clips sampled at 224x224 spatial resolution), use the following commands:

python tools/run_net.py \
  --cfg configs/K400/motionformer_336_16x8.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8

and

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_32x3.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 8

Note that for these models you will need a set of GPUs with ~32GB of memory.

Inference

Use TRAIN.ENABLE and TEST.ENABLE to control whether training or testing is required for a given run. When testing, you also need to provide the path to the model checkpoint via TEST.CHECKPOINT_FILE_PATH.

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  TEST.CHECKPOINT_FILE_PATH path_to_your_checkpoint \
  TRAIN.ENABLE False \

Alternatively, you can modify the provided SLURM script and run the following:

sbatch slurm_scripts/test.sh configs/K400/motionformer_224_16x4.yaml path_to_your_checkpoint

Single-Node Training via Slurm

To train Motionformer via Slurm, please check out our single node Slurm training script slurm_scripts/run_single_node_job.sh.

sbatch slurm_scripts/run_single_node_job.sh configs/K400/motionformer_224_16x4.yaml /your/job/dir/${JOB_NAME}/

Multi-Node Training via Submitit

Distributed training is available via Slurm and submitit:

pip install submitit

To train the Motionformer model on Kinetics using 8 nodes with 8 GPUs each, use the following command:

python run_with_submitit.py --cfg configs/K400/motionformer_224_16x4.yaml --job_dir  /your/job/dir/${JOB_NAME}/ --partition $PARTITION --num_shards 8 --use_volta32

We provide a script for launching slurm jobs in slurm_scripts/run_multi_node_job.sh.

sbatch slurm_scripts/run_multi_node_job.sh configs/K400/motionformer_224_16x4.yaml /your/job/dir/${JOB_NAME}/

Please note that the hyper-parameters in the configs were used with 8 nodes of 8 GPUs (32 GB each). Please scale the batch size and learning rate appropriately for your cluster configuration.
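
As a rough illustration of the linear scaling rule (assuming, as quoted in the issues below, a reference global batch size of 256 at a base learning rate of 1e-4; please verify the reference values in your config), a run with a global batch size of 32 would use a learning rate of about 32 / 256 * 1e-4 = 1.25e-5. In PySlowFast-style configs this can be overridden on the command line via SOLVER.BASE_LR, e.g.:

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  NUM_GPUS 8 \
  TRAIN.BATCH_SIZE 32 \
  SOLVER.BASE_LR 1.25e-5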

Finetuning

To finetune from an existing PyTorch checkpoint, add the following options on the command line, or set the corresponding fields in the YAML config:

TRAIN.CHECKPOINT_EPOCH_RESET True
TRAIN.CHECKPOINT_FILE_PATH path_to_your_PyTorch_checkpoint
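
For example, a complete finetuning command might look like this (the checkpoint path is a placeholder):

python tools/run_net.py \
  --cfg configs/K400/motionformer_224_16x4.yaml \
  DATA.PATH_TO_DATA_DIR path_to_your_dataset \
  TRAIN.CHECKPOINT_FILE_PATH path_to_your_PyTorch_checkpoint \
  TRAIN.CHECKPOINT_EPOCH_RESET True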

Environment

The code was developed using Python 3.8.5 on Ubuntu 20.04. For training, we used eight GPU compute nodes, each containing 8 Tesla V100 GPUs (64 GPUs in total). Other platforms or GPU cards have not been fully tested.

License

The majority of this work is licensed under CC-NC 4.0 International license. However, portions of the project are available under separate license terms: SlowFast and pytorch-image-models are licensed under the Apache 2.0 license.

Contributing

We actively welcome your pull requests. Please see CONTRIBUTING.md and CODE_OF_CONDUCT.md for more info.

Acknowledgements

Motionformer is built on top of PySlowFast, TimeSformer, and pytorch-image-models by Ross Wightman. We thank the authors for releasing their code. If you use our model, please consider citing these works as well:

@misc{fan2020pyslowfast,
  author =       {Haoqi Fan and Yanghao Li and Bo Xiong and Wan-Yen Lo and
                  Christoph Feichtenhofer},
  title =        {PySlowFast},
  howpublished = {\url{https://github.com/facebookresearch/slowfast}},
  year =         {2020}
}
@inproceedings{gberta_2021_ICML,
    author  = {Gedas Bertasius and Heng Wang and Lorenzo Torresani},
    title = {Is Space-Time Attention All You Need for Video Understanding?},
    booktitle   = {Proceedings of the International Conference on Machine Learning (ICML)}, 
    month = {July},
    year = {2021}
}
@misc{rw2019timm,
  author = {Ross Wightman},
  title = {PyTorch Image Models},
  year = {2019},
  publisher = {GitHub},
  journal = {GitHub repository},
  doi = {10.5281/zenodo.4414861},
  howpublished = {\url{https://github.com/rwightman/pytorch-image-models}}
}
Comments
  • Can not reproduce the results

    I ran Motionformer in three different settings on the Kinetics dataset.

    I ran motionformer_224_16x4.yaml with batch size 8 on 8 GPUs and got 71.48 top-1 on the val set.

    I ran configs/K400/joint_224_16x4.yaml and got 76.63 on the val set.

    I ran configs/K400/divided_224_16x4.yaml and got 76.27 on the val set.

    opened by lxtGH 8
  • temporal attention fix

    A typo in the original code meant that the value tensors for the temporal attention step were identical to the input instead of being multiplied by a learned projection matrix (v = x rather than v = Wx). The original code is kept to facilitate replication and can be used by setting use_original_code=True, but is not recommended.

    CLA Signed 
    opened by dylan-campbell 2
  • An unclear step in TrajectoryAttention.forward()

    Hello, thanks for this great work and the shared code. A small issue I have been trying to figure out: in vit_helper.py.TrajectoryAttention.forward(), when the temporal attention is applied:

    
    1.         x = rearrange(x, '(b h) s f d -> b s f (h d)', b=B)
    2.         x_diag = rearrange(x, 'b (g n) f d -> b g n f d', g=F)
    3.         x_diag = torch.diagonal(x_diag, dim1=-4, dim2=-2)
    4.         x_diag = rearrange(x_diag, f'b n d f -> b (f n) d', f=F)
    5.         q2 = self.proj_q(x_diag)
    6.         k2, v2 = self.proj_kv(x).chunk(2, dim=-1)
    7.         q2 = rearrange(q2, f'b s (h d) -> b h s d', h=h)
    8.         x, k2, v2 = map(
    9.             lambda t: rearrange(t, f'b s f (h d) -> b h s f d', f=F,  h=h), (x, k2, v2))
    10.         q2 *= self.scale
    11.         attn = torch.einsum('b h s d, b h s f d -> b h s f', q2, k2)
    12.         attn = attn.softmax(dim=-1)
    13.         x = torch.einsum('b h s f, b h s f d -> b h s d', attn, x)
    14.         x = rearrange(x, f'b h s d -> b s (h d)')
    
    

    In line 249 (here 13), why is the einsum operation applied to attn and x? In the paper, the temporal attention is applied as usual to the value tensor (equation image omitted), but in the code it seems like \tilde{v}_{stt'} is replaced with the reshaped version of x.

    In addition, I am confused because x is reshaped in line (here 8) together with k2, v2, so it might seem to be intentional.

    Thanks!

    opened by ofir1080 2
  • Loading pretrained weights(Epic-kitchens)

    Hi!! Thank you for releasing the code.

    I tried to load the Epic-Kitchens pretrained weights (ek_motionformer_224_16x4.pyth), but it seems like there are some missing keys in the PyTorch state_dict.

    Missing Keys: {'blocks.9.attn.proj_kv.bias', 'blocks.7.attn.proj_q.weight', 'blocks.4.attn.proj_kv.weight', 'head0.weight', 'blocks.5.attn.proj_q.weight', 'patch_embed_3d.proj.weight', 'blocks.1.attn.proj_q.weight', 'head1.weight', 'blocks.10.attn.proj_q.weight', 'temp_embed', 'blocks.1.attn.proj_q.bias', 'blocks.10.attn.proj_kv.bias', 'blocks.1.attn.proj_kv.weight', 'pre_logits.fc.bias', 'blocks.5.attn.proj_q.bias', 'blocks.11.attn.proj_q.bias', 'blocks.2.attn.proj_q.weight', 'blocks.6.attn.proj_kv.weight', 'blocks.2.attn.proj_kv.bias', 'blocks.3.attn.proj_kv.bias', 'blocks.11.attn.proj_q.weight', 'blocks.10.attn.proj_q.bias', 'patch_embed_3d.proj.bias', 'blocks.8.attn.proj_kv.bias', 'blocks.3.attn.proj_q.bias', 'blocks.5.attn.proj_kv.weight', 'blocks.2.attn.proj_kv.weight', 'blocks.3.attn.proj_q.weight', 'blocks.9.attn.proj_kv.weight', 'blocks.9.attn.proj_q.weight', 'pre_logits.fc.weight', 'blocks.10.attn.proj_kv.weight', 'blocks.8.attn.proj_q.bias', 'blocks.5.attn.proj_kv.bias', 'blocks.0.attn.proj_kv.bias', 'blocks.4.attn.proj_kv.bias', 'blocks.0.attn.proj_q.bias', 'blocks.11.attn.proj_kv.weight', 'blocks.6.attn.proj_q.bias', 'head1.bias', 'blocks.3.attn.proj_kv.weight', 'blocks.7.attn.proj_kv.bias', 'blocks.8.attn.proj_q.weight', 'blocks.8.attn.proj_kv.weight', 'blocks.4.attn.proj_q.weight', 'blocks.6.attn.proj_kv.bias', 'blocks.7.attn.proj_q.bias', 'blocks.0.attn.proj_kv.weight', 'blocks.6.attn.proj_q.weight', 'blocks.4.attn.proj_q.bias', 'blocks.2.attn.proj_q.bias', 'blocks.1.attn.proj_kv.bias', 'blocks.9.attn.proj_q.bias', 'blocks.0.attn.proj_q.weight', 'head0.bias', 'blocks.11.attn.proj_kv.bias', 'blocks.7.attn.proj_kv.weight'}

    Therefore, I could not reproduce the results on the Epic-Kitchens validation set. Could you please check the uploaded Epic-Kitchens weights?

    opened by JaesungHuh 1
  • Kinetics DataSet issues

    Hi! Thanks for open-sourcing the code. I wonder what the size of the Kinetics-400 dataset is for training and validation? https://github.com/facebookresearch/SlowFast/issues/42

    opened by lxtGH 1
  • Reproduction for Sthv2

    Hi, I just reproduced the SSv2 experiment with the motionformer_224_16x4.yaml config. Specifically, I trained the model on one node with 8 GPUs and a mini-batch of 32 samples. The learning rate I used is 32 / 256 * 1e-4. However, the result I obtained is 64.3%, which is lower than that reported in your paper. I wonder if the aforementioned learning rate is appropriate for that setting, since the original value in the config is 1e-4.

    opened by PeiqinZhuang 0
  • Adding Code of Conduct file

    This pull request was created automatically because we noticed your project was missing a Code of Conduct file.

    Code of Conduct files facilitate respectful and constructive communities by establishing expected behaviors for project contributors.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Adding Contributing file

    This pull request was created automatically because we noticed your project was missing a Contributing file.

    CONTRIBUTING files explain how a developer can contribute to the project - which you should actively encourage.

    This PR was crafted with love by Facebook's Open Source Team.

    CLA Signed 
    opened by facebook-github-bot 0
  • Strange RGB / BGR settings in ssv2 & kinetics data loader

    Hi. Thanks for the nice work.

    I have some questions regarding the RGB / BGR conventions used by the SSv2 and Kinetics loaders in this repo. Stated directly, I think the RGB / BGR conventions are mishandled in the current codebase.

    Specifically, the SSv2 data loader initially reads frames in the BGR standard (using OpenCV); however, the loader sometimes incorrectly applies functions that assume RGB inputs (e.g., ToPILImage). The final output is in BGR, which is compatible with the pretrained ViT-B that assumes the BGR standard.

    On the other hand, the Kinetics data loader initially reads frames in the RGB standard (using PyAV); however, the loader sometimes incorrectly applies functions that assume BGR inputs (e.g., color_jitter). The final output is in RGB, which is incompatible with the pretrained ViT-B that assumes the BGR standard.

    I will try to point out the problems in the order in which the data loaders actually process the input files.

    1. Kinetics loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py)

    1-1. Kinetics loader reads mp4 videos with PyAV backend, using VideoFrame.to_rgb method

    reference: https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/kinetics.py#L236-L246 https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/decoder.py#L269-L280

    VideoFrame.to_rgb reads mp4 frames in RGB standard

    1-2. frames_augmentation is applied, which assumes BGR standards.

    Specifically, contrast jitter relies on "BGR to Grayscale" transform, which is sensitive to the channel order. As a result, the augmentation is being incorrectly applied.

    1-3. The final output is RGB.

    Since ViT-B assumes BGR standards, the performance can be potentially sub-optimal (though we will finetune with video datasets)

    2. SSv2 loader (https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py)

    I will try to point out the problems in the order in which the SSv2 loader actually processes the input files.

    2-1. SSv2 loader reads jpeg frames using cv2.imdecode method.

    reference: https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/ssv2.py#L246-L251 https://github.com/facebookresearch/Motionformer/blob/main/slowfast/datasets/utils.py#L41-L52

    cv2.imdecode reads jpeg files in BGR standard

    2-2. Frames are converted to PIL images using torchvision.transforms.ToPILImage method.

    As stated in torchvision's documentation, torchvision.transforms.ToPILImage expects the RGB standard, so the wrong channel order could lead to incorrect color augmentations. Fortunately, I guess the current RandAug profile does not include channel-order-sensitive augmentations.

    2-3. The final output is BGR

    ViT-B also follows BGR standards, hence there is no problem outputting BGR standard frames.

    opened by kami93 0
  • Question about the hyper-parameter of SSV2?

    Hi, may I ask what the exact learning rate should be when setting the batch size to 64 on SSv2? Currently, it is set to 1e-4 in the provided config, which violates the scaling rule (e.g., 32/256 * 1e-4) that has been mentioned for K400.

    opened by PeiqinZhuang 0
  • Usage of keys in prototype selection

    Hi, first of all, thanks a lot for your work and for providing a clear and well-documented repository associated with your paper! While reading your paper, I wondered how exactly you select your most orthogonal subset. Looking at the code, I see you provide both keys and queries to the function orthogonal_landmarks. However, it seems you do not use the keys to select your subset. Is that intended behavior?

    Thanks !

    opened by hugoych 1
  • MF-LONG config for SSv2

    Hi, Thanks for providing this wonderful model.

    I'm trying to reproduce Motionformer-L on SSv2, I see that you use: https://github.com/facebookresearch/Motionformer/blob/6c860614a3b252c6163971ba20e61ea3184d5291/configs/SSV2/motionformer_224_32x3.yaml#L4

    But if I understand correctly, this is equivalent to BATCH_SIZE=8; could you please clarify?

    Thanks, Elad.

    opened by eladb3 0