Inflated I3D network with Inception backbone, weights transferred from TensorFlow

Yana

Last update: Dec 8, 2022

Related tags

Deep Learning pytorch weight kinetics 3d-convolutional-network i3d inception-v1 inflated-network pytorch-rgb-predictions pytorch-flow-predictions

Overview

I3D models transferred from TensorFlow to PyTorch

This repo contains several scripts that transfer the weights of the TensorFlow implementation of I3D, from the paper Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Joao Carreira and Andrew Zisserman, to PyTorch.

The original (and official!) TensorFlow code can be found here.

The heart of the transfer is the i3d_tf_to_pt.py script.

Launch it with python i3d_tf_to_pt.py --rgb to generate the RGB checkpoint, whose weights were pretrained starting from the ImageNet-inflated initialization.

To generate the flow weights, use python i3d_tf_to_pt.py --flow.

You can also generate both in one run by passing both flags: python i3d_tf_to_pt.py --rgb --flow.

Note that the master version requires PyTorch 0.3, as it relies on ConstantPad3d, which was added in that release.

If you want to use PyTorch 0.2, check out the branch pytorch-02, which contains a simplified model with even padding on all sides (and the corresponding PyTorch weight checkpoints). The difference is that the 'SAME' padding option in TensorFlow can pad the two sides of a dimension unevenly, an effect that is reproduced on the master branch.

This simpler model produces scores a bit closer to the original tensorflow model on the demo sample and is also a bit faster.
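To make the uneven 'SAME' padding concrete, here is a minimal sketch of how the padding on each side of one dimension can be computed. The function name same_padding is hypothetical and not part of this repo; the arithmetic follows TensorFlow's documented 'SAME' behaviour, where any odd leftover goes to the end of the dimension.

```python
import math


def same_padding(in_size, kernel, stride):
    """Return (pad_before, pad_after) for one dimension under 'SAME' padding."""
    # 'SAME' fixes the output size to ceil(in_size / stride).
    out_size = math.ceil(in_size / stride)
    # Total padding needed so that out_size windows fit.
    pad_total = max((out_size - 1) * stride + kernel - in_size, 0)
    pad_before = pad_total // 2
    # When pad_total is odd, the extra element is added at the end.
    pad_after = pad_total - pad_before
    return pad_before, pad_after


print(same_padding(224, 7, 2))  # 7-wide kernel, stride 2
print(same_padding(112, 3, 2))  # 3-wide kernel, stride 2
print(same_padding(10, 3, 1))   # 3-wide kernel, stride 1
```

Only the stride-1 case pads both sides equally here, which is why a model restricted to even padding cannot reproduce every layer exactly.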

Demo

There is a slight drift in the weights that affects the predictions. It seems to change the final predictions only marginally, so the converted weights should serve as a valid initialization for further finetuning.

This can be observed by evaluating the same sample as the original implementation.

For a demo, launch python i3d_pt_demo.py --rgb --flow. This script will print the scores produced by the PyTorch model.

PyTorch Flow + RGB predictions (probability, logit, class):

1.0          44.53513 playing cricket
1.432034e-09 24.17096 hurling (sport)
4.385328e-10 22.98754 catching or throwing baseball
1.675852e-10 22.02560 catching or throwing softball
1.113020e-10 21.61636 hitting baseball
9.361596e-12 19.14072 playing tennis

TensorFlow Flow + RGB predictions (probability, logit, class):

1.0         41.8137 playing cricket
1.49717e-09 21.4943 hurling sport
3.84311e-10 20.1341 catching or throwing baseball
1.54923e-10 19.2256 catching or throwing softball
1.13601e-10 18.9153 hitting baseball
8.80112e-11 18.6601 playing tennis
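As a sanity check on how the two numeric columns relate, the sketch below recomputes probabilities from the PyTorch logits listed above. It assumes the probabilities come from a softmax over the logits; because the top logit dominates the sum, each probability is approximately exp(logit - top_logit). The helper name is hypothetical.

```python
import math


def relative_probability(logit, top_logit):
    """Approximate softmax probability when the top class dominates the sum."""
    return math.exp(logit - top_logit)


# Logits copied from the PyTorch Flow + RGB table above.
pytorch_logits = [44.53513, 24.17096, 22.98754]
top = pytorch_logits[0]
for logit in pytorch_logits:
    print(f"{logit:.5f} -> {relative_probability(logit, top):.6e}")
```

The recomputed values agree with the printed probabilities to within the rounding of the logits, which also shows that a shift of a few units in every logit, as between the two frameworks, leaves the ranking and the probabilities almost unchanged.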

PyTorch RGB predictions:

[playing cricket]: 9.999987E-01
[playing kickball]: 4.187616E-07
[catching or throwing baseball]: 3.255321E-07
[catching or throwing softball]: 1.335190E-07
[shooting goal (soccer)]: 8.081449E-08

Tensorflow RGB predictions:

[playing cricket]: 0.999997
[playing kickball]: 1.33535e-06
[catching or throwing baseball]: 4.55313e-07
[shooting goal (soccer)]: 3.14343e-07
[catching or throwing softball]: 1.92433e-07

PyTorch Flow predictions:

[playing cricket]: 9.365287E-01
[hurling (sport)]: 5.201872E-02
[playing squash or racquetball]: 3.165054E-03
[playing tennis]: 2.550464E-03
[hitting baseball]: 1.729896E-03

Tensorflow Flow predictions:

[playing cricket]: 0.928604
[hurling (sport)]: 0.0406825
[playing tennis]: 0.00415417
[playing squash or racquetbal]: 0.00247407
[hitting baseball]: 0.00138002

Time profiling

To time the forward and backward passes, you can install kernprof, an efficient line profiler, and then launch

kernprof -lv i3d_pt_profiling.py --frame_nb 16

This launches a basic pytorch training script on a dummy dataset that consists of replicated images as spatio-temporal inputs.

On my GeForce GTX TITAN Black (6 GB), a forward+backward pass takes roughly 0.25-0.3 seconds.
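For readers new to kernprof: with the -l flag it reports line timings only for functions decorated with @profile, a name it injects into builtins at launch. The sketch below shows that pattern on a dummy function. training_step is a hypothetical stand-in, not the code of i3d_pt_profiling.py.

```python
import builtins
import time

# `kernprof -l` injects a `profile` decorator into builtins before running the
# script. Fall back to a no-op so the same file also runs with plain `python`.
profile = getattr(builtins, "profile", lambda func: func)


@profile
def training_step(batch):
    # Hypothetical stand-in for a forward + backward pass.
    return sum(x * x for x in batch)


start = time.perf_counter()
result = training_step(range(1000))
elapsed = time.perf_counter() - start
print(f"result={result}, elapsed={elapsed:.6f}s")
```

Run under kernprof -lv, a script like this prints per-line timings for training_step; run with plain python, it only prints the wall-clock time.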

Some visualizations

Visualization of the weights and matching activations for the first convolutions

RGB

Weights

Activations

Flow

Weights

Activations

Comments

transfer learning with custom dataset that has different video size

Hi @hassony2 ,

First of all, thank you for posting your code!

I have a small question. I'm trying to apply transfer learning to the model using my own dataset, but my input shape is very different from Kinetics or UCF101: each sample in my dataset has 64 frames, each frame is 600x600 with 3 channels, and there are 8 classes. I tried finetuning only the last Unit3Dpy, but it didn't do well. Do you think I'm missing something?

Yana

opened by yana25 5
Transfer Learning

How can I change the module to adapt it to the UCF101 dataset? If I change the model (i.e. change the number of output classes), do the pretrained weights still work?
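One common way to approach this, sketched here under assumptions, is to keep every pretrained tensor except those of the final classification layer, whose shape depends on the number of classes. The key names and the prefix conv3d_0c_1x1 are hypothetical placeholders; check the actual keys of the checkpoint before relying on them.

```python
def drop_classifier_weights(state_dict, classifier_prefix="conv3d_0c_1x1"):
    """Return a copy of state_dict without the final classifier's entries."""
    return {
        key: value
        for key, value in state_dict.items()
        if not key.startswith(classifier_prefix)
    }


# Toy checkpoint with placeholder values standing in for tensors.
pretrained = {
    "conv3d_1a_7x7.conv3d.weight": "backbone tensor",
    "conv3d_0c_1x1.conv3d.weight": "classifier tensor (400 classes)",
    "conv3d_0c_1x1.conv3d.bias": "classifier tensor (400 classes)",
}
kept = drop_classifier_weights(pretrained)
print(sorted(kept))
```

With PyTorch, the filtered dictionary can then be passed to load_state_dict with strict=False, so that the new classifier keeps its fresh initialization while the backbone starts from the pretrained weights.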

opened by vateye 3
I think it should be ceil_mode=True

https://github.com/hassony2/kinetics_i3d_pytorch/blob/c2b54db2368e136abe414d24aacd508c37b333a9/src/i3dpt.py#L115

TensorFlow SAME padding must round up (ceil): https://www.tensorflow.org/api_docs/python/tf/nn/pool

    If padding = "SAME": output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i])

With ceil_mode=False, I observed:

    a = torch.autograd.Variable(torch.ones((1,3,16,300,150)), requires_grad=False)
    out = i3d(a)

    torch.Size([1, 64, 8, 150, 75])
    torch.Size([1, 64, 8, 75, 37])
    torch.Size([1, 64, 8, 75, 37])
    torch.Size([1, 192, 8, 75, 37])
    torch.Size([1, 192, 8, 37, 18])
    torch.Size([1, 256, 8, 37, 18])
    torch.Size([1, 480, 8, 37, 18])
    torch.Size([1, 480, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 512, 4, 18, 9])
    torch.Size([1, 528, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 832, 4, 18, 9])
    torch.Size([1, 1024, 4, 18, 9])
    torch.Size([1, 1024, 4, 18, 9])

With ceil_mode=True:

    a = torch.autograd.Variable(torch.ones((1,3,16,300,150)), requires_grad=False)
    out = i3d(a)

    torch.Size([1, 64, 8, 150, 75])
    torch.Size([1, 64, 8, 75, 38])
    torch.Size([1, 64, 8, 75, 38])
    torch.Size([1, 192, 8, 75, 38])
    torch.Size([1, 192, 8, 38, 19])
    torch.Size([1, 256, 8, 38, 19])
    torch.Size([1, 480, 8, 38, 19])
    torch.Size([1, 480, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 512, 4, 19, 10])
    torch.Size([1, 528, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 832, 4, 19, 10])
    torch.Size([1, 1024, 4, 19, 10])
    torch.Size([1, 1024, 4, 19, 10])

I think ceil_mode=True is correct
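The difference between the two listings can be reproduced with plain arithmetic. The sketch below applies four stride-2 reductions to the input width of 150, rounding down in one case and up in the other. Treating each stage as size / stride with a rounding rule is a simplification of the real layer formulas, and the helper name is hypothetical.

```python
import math


def downsample_chain(size, strides, rounding):
    """Apply successive strided reductions, rounding each result with `rounding`."""
    sizes = []
    for stride in strides:
        size = rounding(size / stride)
        sizes.append(size)
    return sizes


strides = [2, 2, 2, 2]  # the four spatial stride-2 stages seen in the listings
print(downsample_chain(150, strides, math.floor))
print(downsample_chain(150, strides, math.ceil))
```

Rounding down gives the widths of the first listing (75, 37, 18, 9), while rounding up gives those of the second (75, 38, 19, 10), which is the rule the TensorFlow documentation states for "SAME".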

opened by rimchang 3
python i3d_tf_to_pt.py --rgb

    Traceback (most recent call last):
      File "i3d_tf_to_pt.py", line 9, in <module>
        from src.i3dtf import InceptionI3d
      File "/home/zhuyisheng/kinetics_i3d_pytorch/src/i3dtf.py", line 32, in <module>
        class Unit3Dtf(snt.AbstractModule):
    AttributeError: 'module' object has no attribute 'AbstractModule'

opened by Zhuysheng 2
Could I ask a question on 2d conv?

I have searched Google a lot for information about 3D conv. It is easy to understand 3D conv on single-channel video segments, but I do not understand 3D conv on RGB video segments. What is the 3D conv kernel shape: 3*3*3, or input channels * 3*3*3?

Could you explain it? Thank you very much!
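For reference, in a standard 3D convolution each filter spans all input channels, so a single filter on RGB input has in_channels * 3*3*3 weights. PyTorch's torch.nn.Conv3d stores its weight as (out_channels, in_channels, kT, kH, kW) when groups is 1. The helpers below are a hypothetical sketch that only computes that shape and its size.

```python
def conv3d_weight_shape(in_channels, out_channels, kernel_size):
    """Weight layout of a standard (groups=1) 3D convolution."""
    k_t, k_h, k_w = kernel_size
    return (out_channels, in_channels, k_t, k_h, k_w)


def num_weights(shape):
    """Total number of scalar weights in a tensor of the given shape."""
    count = 1
    for dim in shape:
        count *= dim
    return count


shape = conv3d_weight_shape(in_channels=3, out_channels=64, kernel_size=(3, 3, 3))
print(shape, num_weights(shape))
```

Here each of the 64 filters holds 3 * 27 = 81 weights, one 3*3*3 block per colour channel, and the responses of the three blocks are summed into one output channel.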

opened by Zhang-O 1

Arch mod

Thanks for sharing the implementation! These are just a few small changes to achieve numerical consistency with the TF version by:

Modifying the padding algorithm to match Tensorflow "SAME" for Unit3Dpy and MaxPool3dTFPadding.
Updating the batch norm hyperparams.

Tested on PyTorch 0.4.1. Running the i3d_pt_demo.py produces:

Top 5 classes and associated probabilities:
[playing cricket]: 9.999968E-01
[playing kickball]: 1.335340E-06
[catching or throwing baseball]: 4.553116E-07
[shooting goal (soccer)]: 3.143419E-07
[catching or throwing softball]: 1.924329E-07
Top 5 classes and associated probabilities:
[playing cricket]: 9.477569E-01
[hurling (sport)]: 4.068211E-02
[playing tennis]: 4.154132E-03
[playing squash or racquetball]: 2.474060E-03
[hitting baseball]: 1.380014E-03
===== Final predictions ====
logits proba class
4.181368e+01 1.000000e+00 playing cricket
2.149398e+01 1.497148e-09 hurling (sport)
2.013410e+01 3.843058e-10 catching or throwing baseball
1.922558e+01 1.549212e-10 catching or throwing softball
1.891534e+01 1.135999e-10 hitting baseball

opened by albanie 1

Fine-tune on HMDB51 or other datasets

Thanks a lot for providing the open-source code!

Could you please provide the code for fine-tuning on HMDB51 or some other datasets such as Moments in Time or UCF101?

Highly appreciate your time and help!

opened by LiliMeng 1
Pretrained model on Kinetics

Hi, Thanks a lot for providing the open-source I3D pytorch code! May I ask do you provide the pretrained model on Kinetics?

Highly appreciate your time and help!

opened by LiliMeng 1
Accuracy on UCF101 or Kinetics

Hello! I used your code to train on the UCF101 dataset, but the loss does not go down. Can you share the hyperparameters you used when training the model? Does a model trained with this code reach the results reported in the paper?

opened by Kathrine94 1

About data preprocessing

Hi,

I was wondering if you have reproduced the preprocessing of the data. I tried to reproduce it following the instructions in the official repo, but I have no idea why the results do not match the provided .npy. Here is my code:

    import numpy as np
    from PIL import Image, ImageSequence

    size = 256, 256   # size every frame is resized to before cropping
    new_width = 224   # center-crop width
    new_height = 224  # center-crop height
    img = Image.open('./data/v_CricketShot_g04_c01_rgb.gif')

    def my_resize(img):
        for frame in ImageSequence.Iterator(img):
            copy_frame = frame.copy()
            copy_frame = copy_frame.convert("RGB")
            resized_frame = copy_frame.resize(size, Image.BILINEAR)
            r, g, b = resized_frame.split()
            # Rescale each channel to [-1, 1] using that channel's own min and max.
            r = np.array(r)
            rescaled_r = np.interp(r, (r.min(), r.max()), (-1, +1))
            g = np.array(g)
            rescaled_g = np.interp(g, (g.min(), g.max()), (-1, +1))
            b = np.array(b)
            rescaled_b = np.interp(b, (b.min(), b.max()), (-1, +1))
            rescaled_array = np.zeros((size[0], size[1], 3))
            rescaled_array[..., 0] = rescaled_r
            rescaled_array[..., 1] = rescaled_g
            rescaled_array[..., 2] = rescaled_b

            # Center crop; integer division keeps the slice indices integers.
            width, height, color = rescaled_array.shape
            left = (width - new_width) // 2
            right = (width + new_width) // 2
            top = (height - new_height) // 2
            bottom = (height + new_height) // 2
            croped_frame = rescaled_array[left:right, top:bottom, :]

            yield croped_frame

    copy_frame = my_resize(img)

    frames = np.array([np.array(frame) for frame in copy_frame])

Do you have any idea about this?
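One difference worth checking, offered as a guess rather than a confirmed cause: the code above stretches each channel of each frame to its own minimum and maximum, whereas a fixed mapping of the 8-bit range [0, 255] to [-1, 1] treats all frames identically. The sketch below contrasts the two on toy values. Whether the official pipeline used exactly the fixed mapping is an assumption, and both function names are hypothetical.

```python
def rescale_fixed(value):
    """Map an 8-bit pixel value from [0, 255] to [-1, 1] with a fixed range."""
    return value / 127.5 - 1.0


def rescale_min_max(values):
    """Map values to [-1, 1] using their own minimum and maximum."""
    lo, hi = min(values), max(values)
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


channel = [50, 100, 200]  # toy pixel values that do not span the full range
print([round(rescale_fixed(v), 3) for v in channel])
print([round(v, 3) for v in rescale_min_max(channel)])
```

Whenever a channel does not contain both 0 and 255, the two mappings produce different values, so arrays built with per-frame min-max scaling would not match arrays built with the fixed range.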

opened by TianjiPang 1

Handling videos at higher res

Hello,

I was wondering if you have figured out how to use I3D for higher res videos without going out of memory. I am using I3D for video object segmentation (have modified the network to give a segmentation output) and finding it hard to run all frames at 480p resolution for even a single video. My batch size is one and the max number of frames I can process without filling up a 12GB GPU are 24 or so. If I resize the video down to 224x224 I can manage to run all 100 frames in the video, but I am losing a lot of information this way and my results are not as great.

While training I by-passed the problem by taking random 224x224 crops of the full res image. But for inference I can't use this trick and hence wondering how to solve this issue. Any ideas?

opened by siddhantjain 1
Grayscale Images

Hi @hassony2,

First of all thanks for the repo.

I wanted to use the pretrained kinetics RGB model to extract features from a dataset I created. Since my application should run in real time with limited computational resources I wanted to use grayscale images, since that way I only have to process one channel. My question was whether that would cause issue with the RGB model. If so, then I will convert my dataset to colour images.

Thank you

opened by mnik17 0
Moving model weight file to some storage

Great work @hassony2 . It will be really good if the model weights are moved to some storage and provide a script to download the weights. So that I can again fork the repository.

Reason: Storage limit on GitHub. This repository has approximately 245 MB size, and I have to delete the fork because of the size limit.

opened by priteshgohil 0
The padding of I3D model should be symmetrical

The MaxPool3dTFPadding module with kernel_size=(1,3,3) and stride=(1,2,2) can lead to asymmetrical padding. This influences the output feature map, as the bottom-right region usually has higher values than the other parts of the feature map.

When I feed an all-zeros tensor into the I3D model pretrained on Kinetics-400, something strange happens. I average-pool over the C and T dimensions and apply min-max normalization to get a picture as below. The bottom right is much higher than the other parts.

By checking each layer's output, I found that the bottom right usually has higher activation values than the other parts, but the effect is not obvious until the mixed_5b block.

I don't know whether it hurts the model's performance, but at least it hurts the interpretability.

opened by fjchange 1
About extracting features from my dataset

Hi, I want to ask how I can use this code to extract features from my own video datasets. The input to your code is a .npy file. How can I get a .npy file from my dataset? I could not find any data processing code in this repo.

Thanks

opened by galaxysan 3
Converting the kinetics-600 model failed

It seems the kinetics-600 pretrained model here (kinetics-i3d) is the same as the kinetics-400 one, but I get an error. Not found error: Key RGB/inception_i3d/Conv3d_1a_7x7/batch_norm/beta not found in checkpoint.

opened by FingerRec 2

Owner

Yana

PhD student at Inria Paris, focusing on action recognition in first person videos

