Official PyTorch implementation of NeRV (Neural Representations for Videos)

Overview

NeRV: Neural Representations for Videos (NeurIPS 2021)

Project Page | Paper | UVG Data

Hao Chen, Bo He, Hanyu Wang, Yixuan Ren, Ser-Nam Lim, Abhinav Shrivastava
This is the official implementation of the paper "NeRV: Neural Representations for Videos".

Get started

We run with Python 3.8; you can set up a conda environment with all dependencies like so:

pip install -r requirements.txt 
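
If you prefer to create the environment explicitly first, a minimal conda setup might look like this (the environment name nerv is our choice, not mandated by the repo):

conda create -n nerv python=3.8
conda activate nerv
pip install -r requirements.txt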

High-level structure

The code is organized as follows:

  • train_nerv.py includes a generic training routine.
  • model_nerv.py contains the dataloader and neural network architecture.
  • data/ contains the video/image dataset; we provide Big Buck Bunny here.
  • checkpoint/ contains pre-trained models for the Big Buck Bunny dataset.
  • log files (tensorboard, txt, state_dict, etc.) are saved in the output directory (specified by --outf).

Reproducing experiments

Training experiments

The NeRV-S experiment on 'Big Buck Bunny' can be reproduced with:

python train_nerv.py -e 300 --cycles 1  --lower-width 96 --num-blocks 1 --dataset bunny --frame_gap 1 \
    --outf bunny_ab --embed 1.25_40 --stem_dim_num 512_1  --reduction 2  --fc_hw_dim 9_16_26 --expansion 1  \
    --single_res --loss Fusion6   --warmup 0.2 --lr_type cosine  --strides 5 2 2 2 2  --conv_type conv \
    -b 1  --lr 0.0005 --norm none --act swish 

Evaluation experiments

To evaluate a pre-trained model, just add --eval_only and specify the model path with --weight. You can specify model quantization with --quant_bit [bit_length] and test decoding speed with --eval_fps. Below we provide a sample command for NeRV-S on the bunny dataset:

python train_nerv.py -e 300 --cycles 1  --lower-width 96 --num-blocks 1 --dataset bunny --frame_gap 1 \
    --outf bunny_ab --embed 1.25_40 --stem_dim_num 512_1  --reduction 2  --fc_hw_dim 9_16_26 --expansion 1  \
    --single_res --loss Fusion6   --warmup 0.2 --lr_type cosine  --strides 5 2 2 2 2  --conv_type conv \
    -b 1  --lr 0.0005 --norm none  --act swish \
    --weight checkpoints/nerv_S.pth --eval_only 
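
For example, to additionally evaluate a quantized model and measure decoding FPS, append the flags described above (the 8-bit length here is only an illustrative choice):

python train_nerv.py -e 300 --cycles 1  --lower-width 96 --num-blocks 1 --dataset bunny --frame_gap 1 \
    --outf bunny_ab --embed 1.25_40 --stem_dim_num 512_1  --reduction 2  --fc_hw_dim 9_16_26 --expansion 1  \
    --single_res --loss Fusion6   --warmup 0.2 --lr_type cosine  --strides 5 2 2 2 2  --conv_type conv \
    -b 1  --lr 0.0005 --norm none  --act swish \
    --weight checkpoints/nerv_S.pth --eval_only --quant_bit 8 --eval_fps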

Dump predictions with pre-trained model

To dump predictions with a pre-trained model, add --dump_images in addition to --eval_only and --weight:

python train_nerv.py -e 300 --cycles 1  --lower-width 96 --num-blocks 1 --dataset bunny --frame_gap 1 \
    --outf bunny_ab --embed 1.25_40 --stem_dim_num 512_1  --reduction 2  --fc_hw_dim 9_16_26 --expansion 1  \
    --single_res --loss Fusion6   --warmup 0.2 --lr_type cosine  --strides 5 2 2 2 2  --conv_type conv \
    -b 1  --lr 0.0005 --norm none  --act swish \
    --weight checkpoints/nerv_S.pth --eval_only  --dump_images

Citation

If you find our work useful in your research, please cite:

@inproceedings{hao2021nerv,
    author = {Chen, Hao and He, Bo and Wang, Hanyu and Ren, Yixuan and Lim, Ser-Nam and Shrivastava, Abhinav},
    title = {NeRV: Neural Representations for Videos},
    booktitle = {NeurIPS},
    year = {2021}
}

Contact

If you have any questions, please feel free to email the authors.

Comments
  • UVG dataset experiment options

    Thanks for sharing your research.

    I'm trying to reproduce the Figure 7 graph of your paper (PSNR vs. BPP on the UVG dataset), but I couldn't find the appropriate experiment options. Could you tell me the (C1, C2) values for that result (among Appendix A.1's values)?

    Other options I've tried so far are as follows.

    • Learning rate: op1. 5e-4 (paper 4.1); op2. 5e-4 × 6 (linear scaling rule w/ batch size 6)
    • Up-scale factor: 5, 3, 2, 2, 2 (paper 4.1)
    • Train epochs: 1500 epochs (paper 4.1)
    • Warmup epochs: op1. 300 epochs (train code's default, train epochs × 0.2); op2. 30 epochs (paper 4.1)
    opened by applezoos 5
  • Missing UVG video identifiers

    Hello, thanks again for sharing the results of your research!

    I would like to use the results listed in psnr_bpp_results.csv for comparison, but I can't figure out which video each line in the CSV file corresponds to. The UVG dataset itself contains more videos than the number of results (7) listed in the first half.

    Could you please add the video names in the first column? Thanks.

    opened by aegroto 3
  • About weight pruning and entropy coding

    https://github.com/haochen-rye/NeRV/blob/adf61b81fc192c64d2de7b93745b28ff1cf33a39/train_nerv.py#L442

    When compressing the network's weights with Huffman codes, I confirmed that zero values are excluded. In that case, we cannot know the positions of the pruned weights when reconstructing the model weights. I think additional information (such as the indices of the pruned weights) is required.
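
    To illustrate the point, a toy sketch (my own illustration, not the repo's code) of why a mask or index list has to accompany the Huffman-coded nonzero values:

    import numpy as np

    # Dense weights after pruning; zeros mark pruned positions.
    weights = np.array([0.0, 0.5, 0.0, -0.25, 0.125])
    mask = weights != 0        # positions of surviving weights
    nonzero = weights[mask]    # only these values are entropy-coded

    # Decoder side: the nonzero values alone are ambiguous; the mask
    # (or an equivalent index list) is needed to place them back.
    recon = np.zeros_like(weights)
    recon[mask] = nonzero
    assert np.array_equal(recon, weights)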

    Can you explain the details?

    Thank you for sharing the brilliant work!

    opened by maincold2 3
  • UVG dataset reproduce

    Hello, I have one question about reproducing the results, especially UVG.

    To train NeRV on the UVG dataset, I set the command as follows:

    python train_nerv.py -e 150 --lower-width 96 --num-blocks 1 --dataset PATH --frame_gap 1 --outf bunny_ab --embed 1.25_80 --stem_dim_num 512_1 --reduction 2 --fc_hw_dim 9_16_112 --expansion 1 --single_res --loss Fusion6 --warmup 0.2 --lr_type cosine --strides 5 3 2 2 2 --conv_type conv -b 1 --lr 0.0005 --norm none --act gelu

    Is there any suggestion for accurately reproducing the results?

    Thank you :)

    opened by subin-kim-cv 3
  • Distortion-Compression result

    Hello, I really appreciate your impressive work.

    I have one question about calculating bits per pixel.

    You mentioned that bpp is calculated as follows: Model_Parameters * (1 - Prune_Ratio) * Quant_Bit / Pixel_Num

    Here, what does Pixel_Num mean? For example, suppose we have a video with 100 frames at 720x1280 resolution. Is it:

    1. number of frames * width * height (100x720x1280)
    2. width * height (720x1280)
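
    For concreteness, a quick sanity check under interpretation 1 (the parameter count below is made up purely for illustration):

    # Hypothetical numbers, just to make the formula concrete.
    model_params = 3_200_000                  # illustrative parameter count
    prune_ratio = 0.4                         # fraction of weights pruned away
    quant_bit = 8
    num_frames, height, width = 100, 720, 1280

    pixel_num = num_frames * height * width   # interpretation 1
    bpp = model_params * (1 - prune_ratio) * quant_bit / pixel_num
    print(f"bpp = {bpp:.4f}")                 # ~0.1667 bits per pixel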

    Thanks.

    opened by subin-kim-cv 2
  • RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR

    I was trying to run the training script & I faced this error.

    
    Use GPU: None for training
    waiting
    => No resume checkpoint found at 'output/bunny_ab/bunny/embed1.25_40_512_1_fc_9_16_26__exp1.0_reduce2_low96_blk1_cycle1_gap1_e300_warm60_b1_conv_lr0.0005_cosine_Fusion6_Strd5,2,2,2,2_SinRes_actswish_/model_latest.pth'
    Traceback (most recent call last):
      File "/home/sparsh/event_fit/NeRV/train_nerv.py", line 532, in <module>
        main()
      File "/home/sparsh/event_fit/NeRV/train_nerv.py", line 141, in main
        train(None, args)
      File "/home/sparsh/event_fit/NeRV/train_nerv.py", line 342, in train
        loss_sum.backward()
      File "/home/sparsh/anaconda3/envs/nerv/lib/python3.9/site-packages/torch/_tensor.py", line 307, in backward
        torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
      File "/home/sparsh/anaconda3/envs/nerv/lib/python3.9/site-packages/torch/autograd/__init__.py", line 154, in backward
        Variable._execution_engine.run_backward(
    RuntimeError: cuDNN error: CUDNN_STATUS_INTERNAL_ERROR
    You can try to repro this exception using the following code snippet. If that doesn't trigger the error, please include your original repro script when reporting this issue.
    
    import torch
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.benchmark = True
    torch.backends.cudnn.deterministic = False
    torch.backends.cudnn.allow_tf32 = True
    data = torch.randn([1, 96, 360, 640], dtype=torch.float, device='cuda', requires_grad=True)
    net = torch.nn.Conv2d(96, 384, kernel_size=[3, 3], padding=[1, 1], stride=[1, 1], dilation=[1, 1], groups=1)
    net = net.cuda().float()
    out = net(data)
    out.backward(torch.randn_like(out))
    torch.cuda.synchronize()
    
    ConvolutionParams 
        data_type = CUDNN_DATA_FLOAT
        padding = [1, 1, 0]
        stride = [1, 1, 0]
        dilation = [1, 1, 0]
        groups = 1
        deterministic = false
        allow_tf32 = true
    input: TensorDescriptor 0x7fa204013f00
        type = CUDNN_DATA_FLOAT
        nbDims = 4
        dimA = 1, 96, 360, 640, 
        strideA = 22118400, 230400, 640, 1, 
    output: TensorDescriptor 0x7fa204014420
        type = CUDNN_DATA_FLOAT
        nbDims = 4
        dimA = 1, 384, 360, 640, 
        strideA = 88473600, 230400, 640, 1, 
    weight: FilterDescriptor 0x7fa204007fa0
        type = CUDNN_DATA_FLOAT
        tensor_format = CUDNN_TENSOR_NCHW
        nbDims = 4
        dimA = 384, 96, 3, 3, 
    Pointer addresses: 
        input: 0x585de6000
        output: 0x58b246000
        weight: 0x50f18c000
    

    As suggested, I ran the code snippet & the error is reproduced.

    P.S: I had to make some changes to the conda env. I am using a machine with RTX3060. Before making the changes, it gave me the following error.

    NVIDIA GeForce RTX 3060 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
    The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
    If you want to use the NVIDIA GeForce RTX 3060 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
    

    So, I installed the conda pytorch package from the official website. My current env looks like this

    pytorch                   1.10.2          py3.9_cuda11.3_cudnn8.2.0_0    pytorch
    pytorch-msssim            0.2.1                    pypi_0    pypi
    pytorch-mutex             1.0                        cuda    pytorch
    torchvision               0.11.3               py39_cu113    pytorch
    cudatoolkit               11.3.1               h2bc3f7f_2
    

    However, the GPU is still not being used for training (as shown in the first line of the first code snippet in this post).
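
    For reference, a quick check (standard PyTorch calls, nothing repo-specific) to verify whether the install actually sees the GPU:

    import torch

    print(torch.cuda.is_available())   # should print True on a working install
    print(torch.version.cuda)          # CUDA version PyTorch was built against
    if torch.cuda.is_available():
        print(torch.cuda.get_device_name(0))  # e.g. the RTX 3060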

    P.P.S: Might be irrelevant, but the model is loading into the GPU memory (confirmed by nvidia-smi).

    Thanks for the help!

    opened by sparsh-b 2
  • Lack of instructions for decoding

    Hello,

    I would like to thank you for sharing your work, it's a very interesting concept and I can see a lot of promising research on the subject to be done in the future.

    I have a doubt about the decoding part: is there already a way to convert the resulting neural network back into a sequence of frames?
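
    In principle, decoding should just mean evaluating the network at each normalized frame index and saving the outputs; a hypothetical sketch (model and pe stand for the trained NeRV network and positional encoder built by train_nerv.py; names are illustrative, not the repo's exact API):

    import torch
    from torchvision.utils import save_image

    num_frames = 132                      # e.g. the Big Buck Bunny length
    model.eval()
    with torch.no_grad():
        for t in range(num_frames):
            norm_idx = torch.tensor([t / num_frames])  # normalized frame index
            frame = model(pe(norm_idx))                # embedding -> frame tensor
            save_image(frame, f"frame_{t:04d}.png")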

    opened by aegroto 2
  • Some problems when processing with UVG dataset

    Hi, thanks for your impressive work. I'm trying to reproduce it, but I failed to convert the 7 UVG videos into PNG files and put them into one folder. If possible, can you share the command for merging multiple y4m files into one file?

    opened by maoqingyu1996 1
  • Possible mistake in ReadMe

    Hi,

    I think I may have found a mistake in the README (or in the paper, but I believe it is just in the README).

    From the paper, I see that you choose to prune away 40% of the weights.

    From the code, the prune_ratio parameter seems to mean the proportion of weights that are kept, as 1 - prune_ratio is passed to the PyTorch prune function as the amount parameter.
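
    In other words, my reading of the code corresponds to something like the following sketch (module stands in for any layer with a weight; this is my interpretation, not confirmed by the authors):

    import torch.nn as nn
    import torch.nn.utils.prune as prune

    module = nn.Conv2d(96, 384, kernel_size=3, padding=1)
    prune_ratio = 0.6   # reading it as the fraction KEPT (paper prunes away 40%)
    # then the fraction actually removed is 1 - prune_ratio, which is
    # what the prune function expects as its `amount` argument:
    prune.l1_unstructured(module, name="weight", amount=1 - prune_ratio)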

    In the final section of the README, however, you calculate bpp using 1 - prune_ratio. Should this not be just prune_ratio, or 1 - model_sparsity?

    Furthermore, am I correct in thinking the value of the prune_ratio parameter to replicate the results in the paper should be 0.6, rather than the 0.4 in the README?

    Thanks for any clarifications.

    opened by CarlosGomes98 1
  • Interpolating between two frames

    Hi,

    I'm interested in interpolating between frames. In Appendix A.4 of the paper there is a figure where you interpolate between two seen frames. I tried to reproduce that with your pre-trained model, but there are strong artifacts in the result:

    [image: pred_5]

    The only thing I changed was embed_input = pe(norm_idx + (1/132) * 0.5) in train_nerv.py, l. 484. Is there something I missed for reproducing the interpolation? I would be glad for any hint.
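
    For context, this amounts to querying the model halfway between two training indices; schematically (pe and model as in train_nerv.py, the 132-frame count follows the bunny setup, and t is an illustrative index):

    import torch

    t = 5                                   # interpolate between frames 5 and 6
    norm_idx = torch.tensor([t / 132.0])    # normalized index of frame t
    embed_input = pe(norm_idx + (1 / 132.0) * 0.5)  # shift by half a frame step
    with torch.no_grad():
        pred = model(embed_input)           # predicted in-between frame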

    It's great work, and I saw there is a follow-up paper in review that seems to address this problem too, right? Are there any plans for an approximate release date?

    opened by Alpe6825 1
  • Some questions about the experimental details.

    This is impressive work. I am trying to reproduce the results in the paper. I found that there are two parameters that control the compression ratio: quantization (bit length) and pruning ratio. I was wondering how you obtained the curves for the BDBR? Looking forward to your reply.

    opened by JXH-SHU 1