Automatic 2D-to-3D Video Conversion with CNNs

Eric Junyuan Xie

Last update: Dec 30, 2022

Related tags

Deep Learning deep3d

Overview

Deep3D: Automatic 2D-to-3D Video Conversion with CNNs

How To Run

To run this code, please install MXNet following the official documentation. Deep3D requires MXNet to be built with CUDA 7.0 and cuDNN 4 or above. Open mxnet/config.mk and set USE_CUDA and USE_CUDNN to 1. Then append EXTRA_OPERATORS=path/to/deep3d/operators to path/to/mxnet/config.mk and recompile MXNet.
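The config.mk changes can also be scripted. The following is a hypothetical Python helper, not part of Deep3D or MXNet. It sets any `USE_CUDA`/`USE_CUDNN` assignment to 1 and appends an `EXTRA_OPERATORS` line. It assumes config.mk uses plain `NAME = value` lines, and it does not check whether an `EXTRA_OPERATORS` entry already exists.

```python
import re


def patch_config(text, operators_path):
    """Return config.mk text with USE_CUDA/USE_CUDNN set to 1 and
    EXTRA_OPERATORS appended (hypothetical helper, not part of Deep3D)."""
    lines = []
    for line in text.splitlines():
        # Match "USE_CUDA = 0" or "USE_CUDNN = 0", but not e.g. "USE_CUDA_PATH = NONE".
        m = re.match(r"\s*(USE_CUDA|USE_CUDNN)\s*=", line)
        if m:
            line = "%s = 1" % m.group(1)
        lines.append(line)
    # Point the MXNet build at Deep3D's custom operators.
    lines.append("EXTRA_OPERATORS = %s" % operators_path)
    return "\n".join(lines) + "\n"
```

For example, read path/to/mxnet/config.mk, pass its contents and path/to/deep3d/operators to `patch_config`, write the result back, and then recompile MXNet.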

Motivation

Since the debut of Avatar in 2009, 3D movies have rapidly developed into a mainstream technology. Roughly 10 to 20 3D movies are produced each year, and the launch of Oculus Rift and other VR headsets is only going to drive up demand.

Producing 3D movies, however, is still hard. There are two ways of doing this, and in practice they are about equally popular: shooting with a special 3D camera, or shooting in 2D and manually converting to 3D. But 3D cameras are expensive and unwieldy, while manual conversion involves an army of "depth artists" who sit there and draw depth maps for each frame.

Wouldn't it be cool if 2D-to-3D conversion could be done automatically, so that you could take a 3D selfie with an ordinary phone?

Teaser

In case you are already getting sleepy, here are some cool 3D images converted from 2D ones by Deep3D. Normally you need 3D glasses or a VR display to watch 3D images, but since most readers won't have these, we show the 3D images as GIFs.

Method

3D imagery has two views, one for the left eye and the other for the right. To convert a 2D image to 3D, you first need to estimate each pixel's distance from the camera (a.k.a. the depth map) and then warp the image based on its depth map to create two views.
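The warping step can be illustrated with a minimal NumPy sketch. It assumes that disparity (the horizontal pixel shift between the two views) is inversely proportional to depth, scaled by a hypothetical constant `baseline_focal`. It uses naive backward warping: each output pixel copies the input pixel at x + d, with d rounded to the nearest integer and the lookup clamped at the image border. The function names are illustrative. Real converters must also handle occlusions and the holes left behind by shifted objects, which this sketch ignores.

```python
import numpy as np


def depth_to_disparity(depth, baseline_focal=20.0):
    """Disparity in pixels, inversely proportional to depth (hypothetical scale)."""
    return baseline_focal / np.maximum(depth, 1e-6)  # avoid division by zero


def warp_view(image, disparity):
    """Backward-warp an (H, W) or (H, W, C) image by a per-pixel (H, W) disparity."""
    h, w = disparity.shape
    # Column each output pixel samples from, clamped to the valid range.
    xs = np.arange(w)[None, :] + np.rint(disparity).astype(int)
    xs = np.clip(xs, 0, w - 1)
    ys = np.arange(h)[:, None]  # broadcasts against xs to shape (H, W)
    return image[ys, xs]
```

For example, `warp_view(img, depth_to_disparity(depth))` produces one new view, and the original image can serve as the other.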

The difficult step is estimating the depth map. For automatic conversion, we would like to learn a model for it. There are several works on depth estimation from a single 2D image with DNNs. However, they need to be trained on image-depth pairs, which are hard to collect. As a result, they can only use small datasets with a few hundred examples, like NYU Depth and KITTI. Moreover, these datasets only contain static scenes, and it's hard to imagine they will generalize to photos with people in them.

In contrast, Deep3D can be trained directly on 3D movies, which have tens of millions of frames in total. We do this by making the depth map an internal representation instead of the end prediction. Thus, instead of predicting a depth map and then using it to recreate the missing view with a separate algorithm, we train depth estimation and view recreation end-to-end in the same neural network.
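A minimal NumPy sketch shows how recreating the missing view can stay differentiable, so that depth never needs its own labels. For every pixel the network outputs a score for each candidate disparity, a softmax turns the scores into probabilities, and the new view is the probability-weighted sum of horizontally shifted copies of the input. Treat this as an illustrative assumption rather than the released Deep3D code. The function names, shift direction, edge handling, and the choice of 12 levels (to match the 4-by-3 maps in this section) are all hypothetical. Training would compare the output with the true right frame of the 3D movie.

```python
import numpy as np


def softmax(logits, axis=0):
    z = logits - logits.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def shift_right(image, d):
    """Shift an (H, W) image right by d >= 0 pixels, repeating the left edge column."""
    if d == 0:
        return image.copy()
    out = np.empty_like(image)
    out[:, d:] = image[:, :-d]
    out[:, :d] = image[:, :1]
    return out


def soft_select_view(left, logits, disparities):
    """Blend shifted copies of `left` using per-pixel softmax weights.

    left: (H, W) image; logits: (D, H, W), one channel per candidate disparity.
    """
    probs = softmax(logits, axis=0)  # (D, H, W), sums to 1 over D at each pixel
    right = np.zeros(left.shape, dtype=np.float64)
    for p, d in zip(probs, disparities):
        right += p * shift_right(left, d)
    return right
```

Because every operation is a weighted sum, gradients from a reconstruction loss flow back into the logits, and the per-level probabilities then act as soft depth layers.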

Here are some visualizations of our internal depth representation to help you understand how it works:

Following each image, there are 4-by-3 maps of depth layers, ordered from near to far. Objects that are near to you appear in the first depth maps, and objects that are far away appear in the last ones. This shows that the internal depth representation learns to infer depth from 2D images without being directly trained on it.
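To inspect such maps yourself, you can tile the per-layer probabilities into a single image. The sketch below assumes they are already available as a NumPy array of shape (12, H, W), ordered near to far. It reads "4-by-3" as 3 rows of 4 columns. Both the layout and the helper name are assumptions.

```python
import numpy as np


def tile_depth_layers(probs, rows=3, cols=4):
    """Arrange (D, H, W) per-layer maps into one (rows*H, cols*W) image,
    filling row by row so layer 0 (nearest) is at the top left."""
    d, h, w = probs.shape
    if d != rows * cols:
        raise ValueError("expected %d layers, got %d" % (rows * cols, d))
    grid = probs.reshape(rows, cols, h, w)  # grid[r, c] is layer r * cols + c
    # (rows, H, cols, W) -> (rows*H, cols*W)
    return grid.transpose(0, 2, 1, 3).reshape(rows * h, cols * w)
```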

Code

This work is done with MXNet, a flexible and efficient deep learning package. The trained model and a prediction script are in deep3d.ipynb. We will release the code for training shortly.

Comments

Cannot find Operator CuDNNBatchNorm in registry

[12:38:08] /root/mxnet/dmlc-core/include/dmlc/logging.h:245: [12:38:08] src/operator/operator.cc:19: Cannot find Operator CuDNNBatchNorm in registry
Traceback (most recent call last):
  File "myMain.py", line 23, in <module>
    model = mx.model.FeedForward.load('deep3d', 50, mx.gpu(0))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/model.py", line 822, in load
    symbol, arg_params, aux_params = load_checkpoint(prefix, epoch)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/model.py", line 362, in load_checkpoint
    symbol = sym.load('%s-symbol.json' % prefix)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/symbol.py", line 886, in load
    check_call(_LIB.MXSymbolCreateFromFile(c_str(fname), ctypes.byref(handle)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: Failed loading Op bn_pool1 of type CuDNNBatchNorm: [12:38:08] src/operator/operator.cc:19: Cannot find Operator CuDNNBatchNorm in registry

CUDA 7.5, cuDNN 4, running in the official MXNet Docker GPU image

Thank you piiswrong

opened by tzatter 11
Error while training

I created a database using parse.py and data.py, but when I run train.py I get this error:

$ python train.py
[01:08:37] include/dmlc/logging.h:235: [01:08:37] src/io/local_filesys.cc:149: Check failed: allow_null LocalFileSystem: fail to open "vgg16-0001.params"
Traceback (most recent call last):
  File "train.py", line 92, in <module>
    train(64, 'exp/deep3d')
  File "train.py", line 65, in train
    vgg16 = data.load_vgg(data_frames, flow_frames, two_stream=False)
  File "/home/salim/deep3d/data.py", line 443, in load_vgg
    vgg16 = {name: arr for name, arr in mx.nd.load('vgg16-0001.params').items() if name.startswith('arg:conv')}
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 956, in load
    ctypes.byref(names)))
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
    raise MXNetError(py_str(_LIB.MXGetLastError()))
mxnet.base.MXNetError: [01:08:37] src/io/local_filesys.cc:149: Check failed: allow_null LocalFileSystem: fail to open "vgg16-0001.params"

What is the file vgg16-0001.params? What do I have to do to make it work? Thanks.

opened by zedomel 6
How to output the 4-by-3 maps from the depth layers?

Could you share some code for that? I'm a beginner and can't figure out how to output these maps. I tried to output them from the deconvolution layers that follow the pooling layers, but failed: I can't find a way to get values such as pred1, pred2, and so on from sym.py out of the module. I just want the method or code for outputting the 4-by-3 maps of the depth layers shown in your paper. Can you help me? Thank you.

opened by liyongrui 4
I want images larger than 384x160

I got an error when I used raw_shape instead of shape in the code below:

import cv2

shape = (384, 160)
img = cv2.imread('demo.jpg')
raw_shape = (img.shape[1], img.shape[0])  # (width, height)

img = cv2.resize(img, shape)      # original line
img = cv2.resize(img, raw_shape)  # my change, which raises the error

Thank you piiswrong

opened by tzatter 2
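The snippet above mixes two conventions that are easy to confuse: `cv2.resize` takes a size as (width, height), while the resulting NumPy array has shape (height, width, channels). The hypothetical helpers below convert between an OpenCV-style image and an NCHW batch. They reject any size other than 384x160, on the assumption (suggested by the error, not confirmed) that the released model expects exactly that input size. One possible workaround for larger images is to predict at 384x160 and upscale the result with `cv2.resize(out, raw_shape)`, at the cost of detail.

```python
import numpy as np

MODEL_SIZE = (384, 160)  # (width, height): the value of `shape` in the snippet


def to_batch(img):
    """(H, W, 3) uint8 image -> (1, 3, H, W) float32 batch (assumed NCHW layout)."""
    h, w = img.shape[:2]
    if (w, h) != MODEL_SIZE:
        raise ValueError("expected %dx%d input, got %dx%d" % (MODEL_SIZE + (w, h)))
    return img.astype(np.float32).transpose(2, 0, 1)[np.newaxis]


def from_batch(pred):
    """(1, 3, H, W) prediction -> (H, W, 3) uint8 image, clipped to [0, 255]."""
    return np.clip(pred[0].transpose(1, 2, 0), 0, 255).astype(np.uint8)
```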
codec problem

When the code reaches this point (in data.py, the next member function of class Mov3dStack):

for j in range(max(1, self.data_frames)):
    sl = txn.get('%09d'%(idx+(j-self.data_frames/2)*self.stride), db=self.ldb)
    if sl is None:
        pass
    else:
        _, s = mx.recordio.unpack(sl)
        mx.nd.imdecode(s, clip_rect=(p[0], p[1], p[0] + self.data_shape[0], p[1] + self.data_shape[1]),
                       out=ndleft, index=i*self.data_frames+j, channels=3, mean=self.left_mean_nd)

the following error is thrown:

  File "/media/lqzhu/e/deep3d-master/data.py", line 215, in load_mean
    for batch in data_iter:
  File "/media/lqzhu/e/deep3d-master/data.py", line 392, in next
    out=ndleft, index=i*self.data_frames+j, channels=3, mean=self.left_mean_nd)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 1034, in imdecode
    out=out)
  File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 1172, in generic_ndarray_function
    c_array(ctypes.c_char_p, [str(i).encode('ascii') for i in kwargs.values()])))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

I searched the web but found no useful tips for this problem. I've tried adding the lines below:

import sys
reload(sys)
sys.setdefaultencoding('utf8')

Still doesn't work. Any suggestion?

opened by Greenleaf88 1
AttributeError: 'module' object has no attribute 'StereoSGBM_MODE_HH'

I ran convert_movie.py and it generated this error. It seems to be caused by the version of OpenCV I'm using. Could you please check which OpenCV version you are using? I am using v2.4.8. Thanks a lot!

opened by lanzhzh 1
How should the prefix parameter be given?

@piiswrong I used parse.py and data.py to create my own database, giving only one prefix. But when I run data.py, it needs prefix_list. How can I solve this? Thanks.

opened by liutianling 0
Memory Usage

I was wondering how much VRAM this network requires for a batch size of 64. I ask because I am porting it to TensorFlow for a class project, and am currently restricted to a batch size of 32 due to memory limits on a K80 with ~11.2 GB of available VRAM.

Also, were all the operations performed on the GPU in this MXNet implementation? Any information you could provide on memory usage would be much appreciated!

opened by jhoh10 0
MXNetError: src/operator/cudnn_batch_norm.cc:20: CuDNNBatchNorm is merged into BatchNorm for cudnn version above v5.Use the later instead.

MXNetError: [16:38:44] src/operator/cudnn_batch_norm.cc:20: CuDNNBatchNorm is merged into BatchNorm for cudnn version above v5.Use the later instead.

Is there any working solution or patch?

opened by motypas 0
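The error message itself suggests using BatchNorm instead. One workaround worth trying is to rewrite the op type in deep3d-symbol.json (the file loaded by `load_checkpoint` in the traceback of the first issue) before loading the model. This is an untested assumption. The sketch relies on MXNet's symbol JSON storing each node as an object with an "op" field, and `replace_op` is a hypothetical helper. Depending on the MXNet version, BatchNorm may not accept every attribute saved for CuDNNBatchNorm, so loading can still fail.

```python
import json


def replace_op(symbol_json, old_op="CuDNNBatchNorm", new_op="BatchNorm"):
    """Return (patched JSON text, number of nodes whose op was replaced)."""
    graph = json.loads(symbol_json)
    changed = 0
    for node in graph.get("nodes", []):
        if node.get("op") == old_op:
            node["op"] = new_op
            changed += 1
    return json.dumps(graph), changed
```

To apply it, back up deep3d-symbol.json, pass its contents to `replace_op`, and write the patched text back before calling `mx.model.FeedForward.load('deep3d', 50, ...)`.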
What is the right working configuration under Ubuntu?

Hi,

can anyone tell me a TESTED and WORKING configuration under Ubuntu?

Does this configuration work? Ubuntu 16.04 LTS 64-bit, CUDA 8, cuDNN 5.1, NVIDIA 1080 (8 GB), MXNet from git, deep3d from git

Best,
Moty

opened by motypas 0
AttributeError
Could anyone please help me fix the following error when running urllib.urlretrieve('http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params', 'deep3d-0050.params')? Thanks a lot.

----> 2 urllib.urlretrieve('http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params', 'deep3d-0050.params')
      3 model = mx.model.FeedForward.load('deep3d', 50, mx.gpu(0))

AttributeError: module 'urllib' has no attribute 'urlretrieve'
opened by OliveS9 1
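In Python 3, `urlretrieve` lives in `urllib.request`. `urllib.urlretrieve` only exists in Python 2, which is why the call raises this `AttributeError` under Python 3. A version-agnostic import fixes the call. `download_params` is a hypothetical wrapper, and the URL quoted above may no longer be reachable.

```python
try:
    from urllib.request import urlretrieve  # Python 3
except ImportError:
    from urllib import urlretrieve  # Python 2

PARAMS_URL = "http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params"


def download_params(dest="deep3d-0050.params"):
    """Download the pretrained parameters; urlretrieve returns (filename, headers)."""
    filename, _ = urlretrieve(PARAMS_URL, dest)
    return filename
```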
TensorFlow Re-Implementation

Hello, I'm currently re-implementing your great work in TF using the official params, but the output is very poor and I don't know why. Could you please give me any advice to make the output better?

opened by MahmoudSelmy 3
Can't find the params

Hello, I can't find these: http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params && eep3d-0050.params. I don't know what the matter is. Could you send me your params? Thank you! Mail to: [email protected]

opened by zmHaiNan 5

Owner

Eric Junyuan Xie

Software Engineer @ Bytedance

GitHub

Automatic self-diagnosis program (python required)

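auto-self-checker: a program that performs the self-diagnosis automatically (Python required). Important: while this program is running, never move the mouse pointer or touch the keyboard (it relies on screen recognition and clicks directly with the mouse pointer). Usage: in a cmd window inside the folder where the program will run, pip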

1 Dec 30, 2021

Spherical CNNs

Spherical CNNs Equivariant CNNs for the sphere and SO(3) implemented in PyTorch Overview This library contains a PyTorch implementation of the rotatio

893 Dec 28, 2022

Study of human inductive biases in CNNs and Transformers.

Are Convolutional Neural Networks or Transformers more like human vision? This repository contains the code and fine-tuned models of popular Convoluti

39 Dec 8, 2022

Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Tested on many Common CNN Networks and Vision Transformers. ⭐ Includes smoo

6.6k Jan 6, 2023

CNNs for Sentence Classification in PyTorch

Introduction This is the implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in PyTorch. Kim's implementation of t

956 Dec 19, 2022

Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

2.1k Jan 1, 2023

GAN-generated image detection based on CNNs

GAN-image-detection This repository contains a GAN-generated image detector developed to distinguish real images from synthetic ones. The detector is

17 Dec 15, 2022

VOneNet: CNNs with a Primary Visual Cortex Front-End

VOneNet: CNNs with a Primary Visual Cortex Front-End A family of biologically-inspired Convolutional Neural Networks (CNNs). VOneNets have the followi

99 Dec 22, 2022

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

893 Dec 28, 2022

An implementation of the paper: Relation Extraction via Multi-Level Attention CNNs

Relation Classification via Multi-Level Attention CNNs. An implementation of the paper: Relation Classification via Multi-Level Attention CNNs. Training

2 Nov 4, 2022

This repository contains the source code of our work on designing efficient CNNs for computer vision

Efficient networks for Computer Vision This repo contains source code of our work on designing efficient networks for different computer vision tasks:

386 Nov 26, 2022

A light weight data augmentation tool for training CNNs and Viola Jones detectors

hey-daug A light weight data augmentation tool for training CNNs and Viola Jones detectors (Haar Cascades). This tool inflates your data by up to six

2 Nov 23, 2019

PyTorch implementations of a large number of classical backbone CNNs, data enhancement, torch loss, attention, visualization, and some common algorithms.

Torch-template-for-deep-learning PyTorch implementations of some classical backbone CNNs, data enhancement, torch loss, attention, visualization and

270 Dec 31, 2022

This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

inverse_attention This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021. Le

5 Jul 8, 2022

[CVPRW 2022] Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Attention Helps CNN See Better: Hybrid Image Quality Assessment Network [CVPRW 2022] Code for Hybrid Image Quality Assessment Network [paper] [code] T

49 Dec 11, 2022

📦 PyTorch based visualization package for generating layer-wise explanations for CNNs.

Explainable CNNs: flexible visualization package for generating layer-wise explanations for CNNs. It is a common notion that a Deep Learning model i

183 Dec 15, 2022

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

1 Jan 23, 2022

MMdnn is a set of tools to help users interoperate among different deep learning frameworks, e.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, TensorFlow, CNTK, PyTorch, ONNX, and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

5.7k Jan 9, 2023

Voice Conversion by CycleGAN (Voice Cloning/Voice Conversion): CycleGAN-VC3

CycleGAN-VC3-PyTorch This code is a PyTorch implementation for paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

110 Dec 24, 2022
