Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Phil Wang

Last update: Nov 24, 2022

Related tags

Deep Learning deep-learning transformers artificial-intelligence attention-mechanism video-classification 3d-convolutional-network

Overview

Uniformer - Pytorch

Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Install

$ pip install uniformer-pytorch

Usage

Uniformer-S

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000,                 # number of output classes
    dims = (64, 128, 256, 512),         # feature dimensions per stage (4 stages)
    depths = (3, 4, 8, 3),              # depth at each stage
    mhsa_types = ('l', 'l', 'g', 'g')   # aggregation type at each stage, 'l' stands for local, 'g' stands for global
)

video = torch.randn(1, 3, 8, 224, 224)  # (batch, channels, time, height, width)

logits = model(video) # (1, 1000)

Uniformer-B

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000
    depths = (5, 8, 20, 7)
)

Citations

@inproceedings{anonymous2022uniformer,
    title   = {UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning},
    author  = {Anonymous},
    booktitle = {Submitted to The Tenth International Conference on Learning Representations },
    year    = {2022},
    url     = {https://openreview.net/forum?id=nBU_u6DLvoK},
    note    = {under review}
}

Comments

About the re-implementation of Layernorm

Hi @lucidrains ,

Thanks for the implementation of Uniformer. After checking the code, I found that you have re-implemented the layernorm operation in the following. https://github.com/lucidrains/uniformer-pytorch/blob/78397000e647fd4035a7fa2bede999d59d4c3ded/uniformer_pytorch/uniformer_pytorch.py#L13-L23

I wonder if there are any differences between your re-implementation and the official code provided by PyTorch (i.e., torch.nn.Layernorm(dim) ).

Looking forward to your reply. Thx.

opened by ChongjianGE 2
Pretrained weights

Did you pretrain the uniformer on any of the benchmark datasets? (Kinetics, something-something, ...) If you did, would you be able to share the state dicts?

Thanks in advance!

opened by Fritskee 3

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

182 Jan 3, 2023

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

104 Nov 25, 2022

U-2-Net: U Square Net - Modified for paired image training of style transfer

U2-Net: U Square Net Modified for paired image training of style transfer This is an unofficial repo making use of the code which was made available b

43 Oct 3, 2022

Implementation of TimeSformer, a pure attention-based solution for video classification

TimeSformer - Pytorch Implementation of TimeSformer, a pure and simple attention-based solution for reaching SOTA on video classification.

602 Jan 3, 2023

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

A PyTorch implementation of V-Net Vnet is a PyTorch implementation of the paper V-Net: Fully Convolutional Neural Networks for Volumetric Medical Imag

606 Dec 21, 2022

An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

Retina Blood Vessels Segmentation This is an implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional

23 Aug 20, 2022

U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

1 Jan 6, 2022

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent

556 Jan 4, 2023

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

🦩 Flamingo - Pytorch Implementation of Flamingo, state-of-the-art few-shot visual question answering attention net, in Pytorch. It will include the p

630 Dec 28, 2022

Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Related tags

Overview

Uniformer - Pytorch

Install

Usage

Citations

You might also like...

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

U-2-Net: U Square Net - Modified for paired image training of style transfer

Implementation of TimeSformer, a pure attention-based solution for video classification

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Comments

About the re-implementation of Layernorm

Pretrained weights

Releases(0.0.4)

0.0.4(Apr 22, 2022)

0.0.3(Nov 17, 2021)

0.0.2(Nov 16, 2021)

0.0.1(Nov 16, 2021)

Owner

Phil Wang

Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Code for our ICASSP 2021 paper: SA-Net: Shuffle Attention for Deep Convolutional Neural Networks

A collection of SOTA Image Classification Models in PyTorch

RGBD-Net - This repository contains a pytorch lightning implementation for the 3DV 2021 RGBD-Net paper.

Implementation of Transformer in Transformer, pixel level attention paired with patch level attention for image classification, in Pytorch

CT-Net: Channel Tensorization Network for Video Classification

The Medical Detection Toolkit contains 2D + 3D implementations of prevalent object detectors such as Mask R-CNN, Retina Net, Retina U-Net, as well as a training and inference framework focused on dealing with medical images.

Neural networks applied in recognizing guitar chords using python, AutoML.NET with C# and .NET Core

Official Pytorch Implementation of Relational Self-Attention: What's Missing in Attention for Video Understanding