Inflated I3D network with Inception backbone, weights transferred from TensorFlow

Overview

I3D models transferred from TensorFlow to PyTorch

This repo contains several scripts to transfer the weights of the TensorFlow implementation of I3D, from the paper Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset by Joao Carreira and Andrew Zisserman, to PyTorch.

The original (and official!) TensorFlow code can be found here.

The heart of the transfer is the i3d_tf_to_pt.py script.

Launch it with python i3d_tf_to_pt.py --rgb to generate the RGB checkpoint weights, pretrained from the ImageNet-inflated initialization.

To generate the flow weights, use python i3d_tf_to_pt.py --flow.

You can also generate both in one run by using both flags simultaneously: python i3d_tf_to_pt.py --rgb --flow.

Note that the master version requires PyTorch 0.3, as it relies on ConstantPad3d, which was added in that release.

If you want to use PyTorch 0.2, check out the pytorch-02 branch, which contains a simplified model with even padding on all sides (and the corresponding PyTorch weight checkpoints). The difference is that TensorFlow's 'SAME' padding option can pad the two sides of a dimension unevenly, an effect reproduced on the master branch.

This simpler model produces scores a bit closer to the original TensorFlow model on the demo sample and is also a bit faster.
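
For reference, TensorFlow-style 'SAME' padding amounts can be computed explicitly in PyTorch; the following is a minimal sketch (illustrative only, not the exact implementation used in this repo):

    import torch
    import torch.nn.functional as F

    def tf_same_padding(in_size, kernel_size, stride):
        """Per-dimension (before, after) padding matching TensorFlow 'SAME'."""
        out_size = -(-in_size // stride)  # ceil division
        total = max(0, (out_size - 1) * stride + kernel_size - in_size)
        # TensorFlow puts the extra unit (when total is odd) at the end
        return total // 2, total - total // 2

    x = torch.randn(1, 3, 16, 111, 111)  # (N, C, T, H, W)
    pads = [tf_same_padding(s, k, st)
            for s, k, st in zip(x.shape[2:], (7, 7, 7), (2, 2, 2))]
    # F.pad expects padding for the last dimension first
    flat = [p for pair in reversed(pads) for p in pair]
    x_padded = F.pad(x, flat)  # the time dimension gets (2, 3): uneven padding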

Demo

There is a slight drift in the weights that impacts the predictions; however, it seems to only marginally affect the final predictions, so the converted weights should serve as a valid initialization for further finetuning.

This can be observed by evaluating the same sample as the original implementation.

For a demo, launch python i3d_pt_demo.py --rgb --flow. This script will print the scores produced by the PyTorch model.
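
For context, the joint Flow + RGB scores below are obtained the way I3D is usually evaluated, by summing the logits of the two streams before the softmax; a minimal sketch with stand-in tensors:

    import torch
    import torch.nn.functional as F

    # Stand-ins for the outputs of the two streams (1 sample, 400 classes)
    rgb_logits = torch.randn(1, 400)
    flow_logits = torch.randn(1, 400)

    # Sum the per-class logits, then softmax to get joint probabilities
    joint_probs = F.softmax(rgb_logits + flow_logits, dim=1)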

PyTorch Flow + RGB predictions (probability, logit, class):

1.0          44.53513 playing cricket
1.432034e-09 24.17096 hurling (sport)
4.385328e-10 22.98754 catching or throwing baseball
1.675852e-10 22.02560 catching or throwing softball
1.113020e-10 21.61636 hitting baseball
9.361596e-12 19.14072 playing tennis

TensorFlow Flow + RGB predictions (probability, logit, class):

1.0         41.8137 playing cricket
1.49717e-09 21.4943 hurling (sport)
3.84311e-10 20.1341 catching or throwing baseball
1.54923e-10 19.2256 catching or throwing softball
1.13601e-10 18.9153 hitting baseball
8.80112e-11 18.6601 playing tennis

PyTorch RGB predictions:

[playing cricket]: 9.999987E-01
[playing kickball]: 4.187616E-07
[catching or throwing baseball]: 3.255321E-07
[catching or throwing softball]: 1.335190E-07
[shooting goal (soccer)]: 8.081449E-08

TensorFlow RGB predictions:

[playing cricket]: 0.999997
[playing kickball]: 1.33535e-06
[catching or throwing baseball]: 4.55313e-07
[shooting goal (soccer)]: 3.14343e-07
[catching or throwing softball]: 1.92433e-07

PyTorch Flow predictions:

[playing cricket]: 9.365287E-01
[hurling (sport)]: 5.201872E-02
[playing squash or racquetball]: 3.165054E-03
[playing tennis]: 2.550464E-03
[hitting baseball]: 1.729896E-03

TensorFlow Flow predictions:

[playing cricket]: 0.928604
[hurling (sport)]: 0.0406825
[playing tennis]: 0.00415417
[playing squash or racquetball]: 0.00247407
[hitting baseball]: 0.00138002

Time profiling

To time the forward and backward passes, you can install kernprof, an efficient line profiler, and then launch

kernprof -lv i3d_pt_profiling.py --frame_nb 16

This launches a basic PyTorch training script on a dummy dataset consisting of replicated images as spatio-temporal inputs.
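
Note that kernprof -l only times functions decorated with @profile, a decorator it injects at runtime; a minimal sketch of what the profiled training step might look like (names are illustrative, not the script's actual code):

    # `profile` is made available as a builtin by kernprof -l;
    # only decorated functions get line-by-line timings.
    @profile
    def run_step(model, inp, target, criterion):
        output = model(inp)               # forward pass
        loss = criterion(output, target)
        loss.backward()                   # backward pass
        return loss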

On my GeForce GTX TITAN Black (6 GB), a forward+backward pass takes roughly 0.25-0.3 seconds.

Some visualizations

Visualization of the weights and matching activations for the first convolutions.

RGB

Input sample, first-convolution weights, and matching activations (rgb_sample, rgb_weights, and rgb_activations images).

Flow

Input sample, first-convolution weights, and matching activations (flow_sample, flow_weights, and flow_activations images).

Comments
  • transfer learning with custom dataset that has different video size

    Hi @hassony2 ,

    First of all, thank you for posting your code!

    I have a small question: I'm trying to do transfer learning with the model on my own dataset, but my input shape is quite different from Kinetics or UCF101. Each sample in my dataset has 64 frames, each frame is 600x600 with 3 channels, and there are 8 classes. I tried to finetune just the last Unit3Dpy, but it didn't do well. Do you think I'm missing something?

    Yana

    opened by yana25 5
  • Transfer Learning

    How can I change the model to adapt it to the UCF101 dataset? If I change the model (i.e. change the number of output classes), will the pretrained weights still work?
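
    One common approach is to load the pretrained weights and then swap the classification head for one with the new class count; a minimal sketch, with the layer name taken from src/i3dpt.py and the exact constructor arguments assumed:

        import torch
        from src.i3dpt import I3D, Unit3Dpy

        model = I3D(num_classes=400, modality='rgb')
        model.load_state_dict(torch.load('model/model_rgb.pth'))
        # Replace the final 1x1x1 conv so it outputs 101 classes (UCF101);
        # argument names are assumptions based on the repo's Unit3Dpy
        model.conv3d_0c_1x1 = Unit3Dpy(
            in_channels=1024, out_channels=101, kernel_size=(1, 1, 1),
            activation=None, use_bias=True, use_bn=False)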

    opened by vateye 3
  • I think it should be ceil_mode=True

    https://github.com/hassony2/kinetics_i3d_pytorch/blob/c2b54db2368e136abe414d24aacd508c37b333a9/src/i3dpt.py#L115

    TensorFlow SAME padding must use ceil (https://www.tensorflow.org/api_docs/python/tf/nn/pool): if padding = "SAME", then output_spatial_shape[i] = ceil(input_spatial_shape[i] / strides[i]).

    With ceil_mode=False, I observed:

        a = torch.autograd.Variable(torch.ones((1, 3, 16, 300, 150)), requires_grad=False)
        out = i3d(a)

        torch.Size([1, 64, 8, 150, 75])
        torch.Size([1, 64, 8, 75, 37])
        torch.Size([1, 64, 8, 75, 37])
        torch.Size([1, 192, 8, 75, 37])
        torch.Size([1, 192, 8, 37, 18])
        torch.Size([1, 256, 8, 37, 18])
        torch.Size([1, 480, 8, 37, 18])
        torch.Size([1, 480, 4, 18, 9])
        torch.Size([1, 512, 4, 18, 9])
        torch.Size([1, 512, 4, 18, 9])
        torch.Size([1, 512, 4, 18, 9])
        torch.Size([1, 528, 4, 18, 9])
        torch.Size([1, 832, 4, 18, 9])
        torch.Size([1, 832, 4, 18, 9])
        torch.Size([1, 832, 4, 18, 9])
        torch.Size([1, 1024, 4, 18, 9])
        torch.Size([1, 1024, 4, 18, 9])

    With ceil_mode=True:

        a = torch.autograd.Variable(torch.ones((1, 3, 16, 300, 150)), requires_grad=False)
        out = i3d(a)

        torch.Size([1, 64, 8, 150, 75])
        torch.Size([1, 64, 8, 75, 38])
        torch.Size([1, 64, 8, 75, 38])
        torch.Size([1, 192, 8, 75, 38])
        torch.Size([1, 192, 8, 38, 19])
        torch.Size([1, 256, 8, 38, 19])
        torch.Size([1, 480, 8, 38, 19])
        torch.Size([1, 480, 4, 19, 10])
        torch.Size([1, 512, 4, 19, 10])
        torch.Size([1, 512, 4, 19, 10])
        torch.Size([1, 512, 4, 19, 10])
        torch.Size([1, 528, 4, 19, 10])
        torch.Size([1, 832, 4, 19, 10])
        torch.Size([1, 832, 4, 19, 10])
        torch.Size([1, 832, 4, 19, 10])
        torch.Size([1, 1024, 4, 19, 10])
        torch.Size([1, 1024, 4, 19, 10])

    I think ceil_mode=True is correct.
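
    A small standalone repro of the difference (the input size here is illustrative):

        import torch
        import torch.nn as nn

        x = torch.ones(1, 1, 4, 10, 10)
        args = dict(kernel_size=(1, 3, 3), stride=(1, 2, 2))
        print(nn.MaxPool3d(ceil_mode=False, **args)(x).shape)  # [1, 1, 4, 4, 4]
        print(nn.MaxPool3d(ceil_mode=True, **args)(x).shape)   # [1, 1, 4, 5, 5]
        # TensorFlow 'SAME' expects ceil(10 / 2) = 5 along the strided dims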

    opened by rimchang 3
  • python i3d_tf_to_pt.py  --rgb

    Traceback (most recent call last):
      File "i3d_tf_to_pt.py", line 9, in <module>
        from src.i3dtf import InceptionI3d
      File "/home/zhuyisheng/kinetics_i3d_pytorch/src/i3dtf.py", line 32, in <module>
        class Unit3Dtf(snt.AbstractModule):
    AttributeError: 'module' object has no attribute 'AbstractModule'

    opened by Zhuysheng 2
  • could i ask the question on 2d conv?

    I have searched for a lot of info on Google about 3D conv. It is easy to understand 3D conv on video segments with a single channel, but I do not understand 3D conv on video segments with RGB channels. What is the 3D conv kernel shape: 3x3x3, or in_channels x 3x3x3?

    Could you explain it? Thank you very much!
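
    For what it's worth, this can be checked directly in PyTorch: a 3D conv kernel spans all input channels, so each filter has shape in_channels x kT x kH x kW:

        import torch.nn as nn

        conv = nn.Conv3d(in_channels=3, out_channels=64, kernel_size=(3, 3, 3))
        # Weight layout: (out_channels, in_channels, kT, kH, kW)
        print(conv.weight.shape)  # torch.Size([64, 3, 3, 3, 3])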

    opened by Zhang-O 1
  • Arch mod

    Thanks for sharing the implementation! These are just a few small changes to achieve numerical consistency with the TF version by:

    1. Modifying the padding algorithm to match TensorFlow "SAME" for Unit3Dpy and MaxPool3dTFPadding.
    2. Updating the batch norm hyperparameters (see the sketch after the demo output below).

    Tested on PyTorch 0.4.1. Running i3d_pt_demo.py produces:

    Top 5 classes and associated probabilities:
    [playing cricket]: 9.999968E-01
    [playing kickball]: 1.335340E-06
    [catching or throwing baseball]: 4.553116E-07
    [shooting goal (soccer)]: 3.143419E-07
    [catching or throwing softball]: 1.924329E-07
    Top 5 classes and associated probabilities:
    [playing cricket]: 9.477569E-01
    [hurling (sport)]: 4.068211E-02
    [playing tennis]: 4.154132E-03
    [playing squash or racquetball]: 2.474060E-03
    [hitting baseball]: 1.380014E-03
    ===== Final predictions ====
    logits proba class
    4.181368e+01 1.000000e+00 playing cricket
    2.149398e+01 1.497148e-09 hurling (sport)
    2.013410e+01 3.843058e-10 catching or throwing baseball
    1.922558e+01 1.549212e-10 catching or throwing softball
    1.891534e+01 1.135999e-10 hitting baseball
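
    To illustrate change 2 above, the hyperparameter values here are assumptions based on common TensorFlow defaults, not read from the PR diff:

        import torch.nn as nn

        # PyTorch defaults are eps=1e-5, momentum=0.1; TensorFlow batch norm
        # commonly uses eps=1e-3 with decay=0.999, i.e. momentum = 1 - decay
        # = 0.001 in PyTorch's convention (assumed values, not the PR's)
        bn = nn.BatchNorm3d(num_features=64, eps=1e-3, momentum=0.001)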
    
    opened by albanie 1
  • Fine-tune on HMDB51 or other datasets

    Thanks a lot for providing the open-source code!

    Could you please provide the code for fine-tuning on HMDB51 or some other datasets such as Moments in Time or UCF101?

    Highly appreciate your time and help!

    opened by LiliMeng 1
  • Pretrained model on Kinetics

    Hi, thanks a lot for providing the open-source I3D PyTorch code! May I ask whether you provide a model pretrained on Kinetics?

    Highly appreciate your time and help!

    opened by LiliMeng 1
  • Accuracy on UCF101 or Kinetics

    Hello! I used your code to train on the UCF101 dataset, but the loss does not go down. Can you share the hyperparameters you used to train the model? Does training with this code reach the results reported in the paper?

    opened by Kathrine94 1
  • About data preprocessing

    Hi,

    I was wondering if you have reproduced the preprocessing of the data. I tried to reproduce it following the instructions in the official repo, but I have no idea why the results don't match the provided .npy file. Here is my code:

        import numpy as np
        from PIL import Image, ImageSequence

        size = 256, 256
        new_width = 224
        new_height = 224
        img = Image.open('./data/v_CricketShot_g04_c01_rgb.gif')

        def my_resize(img):
            for frame in ImageSequence.Iterator(img):
                copy_frame = frame.copy().convert("RGB")
                resized_frame = copy_frame.resize(size, Image.BILINEAR)
                # Rescale each channel to [-1, 1] using its own min/max
                r, g, b = (np.array(c) for c in resized_frame.split())
                rescaled_r = np.interp(r, (r.min(), r.max()), (-1, +1))
                rescaled_g = np.interp(g, (g.min(), g.max()), (-1, +1))
                rescaled_b = np.interp(b, (b.min(), b.max()), (-1, +1))
                rescaled_array = np.zeros((size[0], size[1], 3))
                rescaled_array[..., 0] = rescaled_r
                rescaled_array[..., 1] = rescaled_g
                rescaled_array[..., 2] = rescaled_b

                # Center-crop to 224x224; integer division keeps the
                # slice indices ints
                width, height, _ = rescaled_array.shape
                left = (width - new_width) // 2
                right = (width + new_width) // 2
                top = (height - new_height) // 2
                bottom = (height + new_height) // 2
                yield rescaled_array[left:right, top:bottom, :]

        frames = np.array([np.array(frame) for frame in my_resize(img)])
    

    Do you have any idea about this?

    opened by TianjiPang 1
  • Handling videos at higher res

    Hello,

    I was wondering if you have figured out how to use I3D on higher-resolution videos without running out of memory. I am using I3D for video object segmentation (I have modified the network to produce a segmentation output) and find it hard to run all frames at 480p resolution for even a single video. My batch size is one, and the maximum number of frames I can process without filling up a 12 GB GPU is around 24. If I resize the video down to 224x224, I can manage to run all 100 frames of the video, but I lose a lot of information this way and my results are not as good.

    While training, I bypassed the problem by taking random 224x224 crops of the full-resolution frames, but I can't use this trick for inference, so I am wondering how to solve this issue. Any ideas?
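
    One common mitigation, sketched below under an assumed (N, C, T, H, W) layout, is to run inference without gradients and in temporal chunks, since most inference-time memory goes to activations stored for backward:

        import torch
        from src.i3dpt import I3D  # stand-in; a segmentation variant would differ

        model = I3D(num_classes=400, modality='rgb').eval()
        video = torch.randn(1, 3, 96, 480, 848)  # illustrative 480p clip
        outputs = []
        with torch.no_grad():  # no activations kept for backward
            for clip in video.split(24, dim=2):  # 24-frame temporal chunks
                outputs.append(model(clip))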

    opened by siddhantjain 1
  • Grayscale Images

    Hi @hassony2,

    First of all, thanks for the repo.

    I want to use the pretrained Kinetics RGB model to extract features from a dataset I created. Since my application should run in real time with limited computational resources, I wanted to use grayscale images, so that I only have to process one channel. My question is whether that would cause issues with the RGB model. If so, I will convert my dataset to colour images.

    Thank you
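
    For what it's worth, if grayscale input is fed to the pretrained RGB stem, one cheap option is to replicate the single channel three times so the weight shapes still match; a minimal sketch:

        import torch

        gray = torch.randn(1, 1, 16, 224, 224)     # (N, 1, T, H, W) grayscale clip
        rgb_like = gray.expand(-1, 3, -1, -1, -1)  # view, no copy: (N, 3, T, H, W)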

    opened by mnik17 0
  • Moving model weight file to some storage

    Great work @hassony2. It would be really good if the model weights were moved to external storage, with a script to download them, so that I can fork the repository again.

    Reason: GitHub's storage limit. This repository is approximately 245 MB, and I had to delete my fork because of the size limit.

    opened by priteshgohil 0
  • The padding of I3D model should be symmetrical

    The MaxPool3dTFPadding module with kernel_size=(1, 3, 3) and stride=(1, 2, 2) can lead to asymmetrical padding. This influences the output feature map: the bottom right is usually higher than the rest of the feature map.

    When I feed an all-zeros tensor into the I3D model pretrained on Kinetics-400, something strange happens. I average-pool the C and T dimensions and min-max normalize to get a picture (grad_cam visualization): the bottom right is much higher than the other parts.

    By checking each layer's output, I found that the bottom right usually has higher activation values than the other parts, though it is not obvious until the mixed_5b block.

    I don't know whether this hurts the model's performance, but at least it hurts interpretability.

    opened by fjchange 1
  • about extract features from my dataset

    Hi, I want to ask how I can use this code to extract features from my own video datasets. The input to your code is .npy files; how can I produce .npy files from my dataset? I cannot find any data-processing code in this repo.

    Thanks

    opened by galaxysan 3
  • convert kinetics-600 model failed

    It seems the Kinetics-600 pretrained model from the kinetics-i3d repo has the same format as the Kinetics-400 one, but I get an error when converting it. Not found error: Key RGB/inception_i3d/Conv3d_1a_7x7/batch_norm/beta not found in checkpoint.

    opened by FingerRec 2