Automatic 2D-to-3D Video Conversion with CNNs

Overview

Deep3D: Automatic 2D-to-3D Video Conversion with CNNs

How To Run

To run this code, please install MXNet following the official documentation. Deep3D requires MXNet to be built with CUDA 7.0 and cuDNN 4 or above: open mxnet/config.mk and set USE_CUDA and USE_CUDNN to 1. Then append EXTRA_OPERATORS=path/to/deep3d/operators to path/to/mxnet/config.mk and recompile MXNet.
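Concretely, the relevant settings in path/to/mxnet/config.mk (the paths are placeholders for your own checkout locations) should end up looking like this:

    USE_CUDA = 1
    USE_CUDNN = 1
    EXTRA_OPERATORS = path/to/deep3d/operators

Then rebuild MXNet (e.g. make -j4) so the Deep3D operators are compiled in.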


Motivation

Since the debut of Avatar in 2009, 3D movies have rapidly developed into mainstream technology. Roughly 10 to 20 3D movies are produced each year, and the launch of the Oculus Rift and other VR headsets is only going to drive up demand.

Producing 3D movies, however, is still hard. There are two ways of doing it, and in practice they are about equally popular: shooting with a special 3D camera, or shooting in 2D and manually converting to 3D. But 3D cameras are expensive and unwieldy, while manual conversion involves an army of "depth artists" who sit there and draw a depth map for each frame.

Wouldn't it be cool if 2D-to-3D conversion could be done automatically, if you could take a 3D selfie with an ordinary phone?

Teaser

In case you are already getting sleepy, here are some cool 3D images converted from 2D ones by Deep3D. Normally you need 3D glasses or a VR display to watch 3D images, but since most readers won't have those, we show the 3D images as GIFs.

[animated GIFs: 2D images converted to 3D by Deep3D]

Method

3D imagery has two views, one for the left eye and one for the right. To convert a 2D image to 3D, you need to first estimate the distance from the camera for each pixel (a.k.a. the depth map) and then warp the image based on its depth map to create the two views.
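To make the warping step concrete, here is a minimal NumPy sketch (an illustration of the idea, not the paper's actual algorithm) that forward-shifts each pixel horizontally by a precomputed integer disparity to synthesize a right-eye view:

    import numpy as np

    def warp_right_view(left, disparity):
        """Crude right-eye view: shift each pixel of the left image
        horizontally by its per-pixel integer disparity."""
        h, w = left.shape[:2]
        right = np.zeros_like(left)
        for y in range(h):
            for x in range(w):
                d = int(disparity[y, x])
                if 0 <= x - d < w:
                    right[y, x - d] = left[y, x]
        return right

A real converter also has to fill the holes this leaves where foreground objects uncover the background, which is part of what makes manual conversion so labor-intensive.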

The difficult step is estimating the depth map. For automatic conversion, we would like to learn a model for it. There are several works on depth estimation from a single 2D image with DNNs. However, they need to be trained on image-depth pairs, which are hard to collect. As a result, they can only use small datasets with a few hundred examples, like NYU Depth and KITTI. Moreover, these datasets contain only static scenes, and it's hard to imagine they would generalize to photos with people in them.

In contrast, Deep3D can be trained directly on 3D movies, which have tens of millions of frames in total. We do this by making the depth map an internal representation instead of the end prediction. Thus, instead of predicting a depth map and then using it to recreate the missing view with a separate algorithm, we train depth estimation and view reconstruction end-to-end in the same neural network.
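What makes this end-to-end training possible is a differentiable selection step: at every pixel the network predicts a softmax distribution over candidate disparities, and the novel view is the probability-weighted sum of horizontally shifted copies of the input. Here is a minimal NumPy sketch of that idea (the shapes and the wrap-around shift are simplifications, not the released network):

    import numpy as np

    def selection_layer(left, disp_logits):
        """left: (H, W) grayscale image.
        disp_logits: (D, H, W) unnormalized scores for D candidate disparities."""
        probs = np.exp(disp_logits - disp_logits.max(axis=0, keepdims=True))
        probs /= probs.sum(axis=0, keepdims=True)   # per-pixel softmax over disparities
        out = np.zeros_like(left, dtype=float)
        for d in range(disp_logits.shape[0]):
            shifted = np.roll(left, -d, axis=1)     # left image shifted by disparity d
            out += probs[d] * shifted               # expectation over disparities
        return out

Because every operation is differentiable, a plain reconstruction loss against the true right view trains the disparity distribution, so depth is learned as a by-product rather than supervised directly.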

Here are some visualizations of our internal depth representation to help you understand how it works:

[visualizations: each example image followed by its grid of internal depth layers]

Following each image, there are 4-by-3 maps of depth layers, ordered from near to far. You can see that objects near to you appear in the first depth maps and objects far away appear in the last ones. This shows that the internal depth representation is learning to infer depth from 2D images without ever being directly trained on it.
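If you want to pull internal maps like these out of a loaded model yourself, MXNet lets you slice an internal output from the symbol. A sketch under assumptions: the layer name 'pred_output' below is hypothetical, so list the real names with list_outputs() first, and val_iter stands for your own data iterator.

    import mxnet as mx

    # Load the released checkpoint: prefix 'deep3d', epoch 50.
    model = mx.model.FeedForward.load('deep3d', 50, ctx=mx.gpu(0))

    # Inspect the available internal outputs and pick the depth/disparity one.
    internals = model.symbol.get_internals()
    print(internals.list_outputs())          # find the real layer name here
    depth_sym = internals['pred_output']     # HYPOTHETICAL layer name

    # Rewire a feed-forward model that stops at that layer.
    feat = mx.model.FeedForward(symbol=depth_sym, ctx=mx.gpu(0),
                                arg_params=model.arg_params,
                                aux_params=model.aux_params,
                                allow_extra_params=True)
    maps = feat.predict(val_iter)            # one channel per disparity level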

Code

This work is done with MXNet, a flexible and efficient deep learning package. The trained model and a prediction script are in deep3d.ipynb. We will release the training code shortly.
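For reference, the loading steps that the notebook (and several of the issues below) revolve around look roughly like this, assuming deep3d-symbol.json from the repo sits next to the downloaded weights (on Python 3, urlretrieve lives in urllib.request):

    import urllib   # Python 2; on Python 3 use: from urllib.request import urlretrieve
    import mxnet as mx

    # Fetch the released weights (URL from the notebook).
    urllib.urlretrieve(
        'http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params',
        'deep3d-0050.params')

    # Load checkpoint prefix 'deep3d' at epoch 50 onto the first GPU.
    model = mx.model.FeedForward.load('deep3d', 50, mx.gpu(0))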

Comments
  • Cannot find Operator CuDNNBatchNorm in registry

        [12:38:08] /root/mxnet/dmlc-core/include/dmlc/logging.h:245: [12:38:08] src/operator/operator.cc:19: Cannot find Operator CuDNNBatchNorm in registry
        Traceback (most recent call last):
          File "myMain.py", line 23, in <module>
            model = mx.model.FeedForward.load('deep3d', 50, mx.gpu(0))
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/model.py", line 822, in load
            symbol, arg_params, aux_params = load_checkpoint(prefix, epoch)
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/model.py", line 362, in load_checkpoint
            symbol = sym.load('%s-symbol.json' % prefix)
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/symbol.py", line 886, in load
            check_call(_LIB.MXSymbolCreateFromFile(c_str(fname), ctypes.byref(handle)))
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.5.0-py2.7.egg/mxnet/base.py", line 77, in check_call
            raise MXNetError(py_str(_LIB.MXGetLastError()))
        mxnet.base.MXNetError: Failed loading Op bn_pool1 of type CuDNNBatchNorm: [12:38:08] src/operator/operator.cc:19: Cannot find Operator CuDNNBatchNorm in registry

    CUDA 7.5 and cuDNN 4, running on the official MXNet Docker GPU image.

    Thank you piiswrong

    opened by tzatter 11
  • Error while training

    I created a database using parse.py and data.py, but when I run train.py I get this error:

        $ python train.py
        [01:08:37] include/dmlc/logging.h:235: [01:08:37] src/io/local_filesys.cc:149: Check failed: allow_null LocalFileSystem: fail to open "vgg16-0001.params"
        Traceback (most recent call last):
          File "train.py", line 92, in <module>
            train(64, 'exp/deep3d')
          File "train.py", line 65, in train
            vgg16 = data.load_vgg(data_frames, flow_frames, two_stream=False)
          File "/home/salim/deep3d/data.py", line 443, in load_vgg
            vgg16 = {name: arr for name, arr in mx.nd.load('vgg16-0001.params').items() if name.startswith('arg:conv')}
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 956, in load
            ctypes.byref(names)))
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/base.py", line 77, in check_call
            raise MXNetError(py_str(_LIB.MXGetLastError()))
        mxnet.base.MXNetError: [01:08:37] src/io/local_filesys.cc:149: Check failed: allow_null LocalFileSystem: fail to open "vgg16-0001.params"

    What file is vgg16-0001.params? What do I have to do to make it work? Thanks.

    opened by zedomel 6
  • How to output the 4-by-3 maps from the depth layers?

    Can you give me some code for that? I can't find a way to output these maps. I tried to output them from the deconvolution behind one of the pooling layers, but I failed; I can't find a way to output values such as pred1, pred2, and so on from sym.py via the module. I just want the method or code for outputting the 4-by-3 maps of depth layers from your paper. Can you help me? Thank you.

    opened by liyongrui 4
  • I want images larger than 384x160

    I got an error when I used raw_shape instead of shape, as in the code below:

        shape = (384, 160)
        img = cv2.imread('demo.jpg')
        raw_shape = (img.shape[1], img.shape[0])

        img = cv2.resize(img, shape)

        img = cv2.resize(img, raw_shape)

    Thank you piiswrong

    opened by tzatter 2
  • codec problem

    When I run the code to this place (in data.py, in the next member function of class Mov3dStack):

        for j in range(max(1, self.data_frames)):
            sl = txn.get('%09d' % (idx + (j - self.data_frames/2) * self.stride), db=self.ldb)
            if sl is None:
                pass
            else:
                _, s = mx.recordio.unpack(sl)
                mx.nd.imdecode(s, clip_rect=(p[0], p[1], p[0] + self.data_shape[0], p[1] + self.data_shape[1]),
                               out=ndleft, index=i*self.data_frames+j, channels=3, mean=self.left_mean_nd)

    then this error is thrown:

          File "/media/lqzhu/e/deep3d-master/data.py", line 215, in load_mean
            for batch in data_iter:
          File "/media/lqzhu/e/deep3d-master/data.py", line 392, in next
            out=ndleft, index=i*self.data_frames+j, channels=3, mean=self.left_mean_nd)
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 1034, in imdecode
            out=out)
          File "/usr/local/lib/python2.7/dist-packages/mxnet-0.7.0-py2.7.egg/mxnet/ndarray.py", line 1172, in generic_ndarray_function
            c_array(ctypes.c_char_p, [str(i).encode('ascii') for i in kwargs.values()])))
        UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

    I dug around the web and found no useful tips for this problem. I've tried adding the lines below:

        import sys
        reload(sys)
        sys.setdefaultencoding('utf8')

    Still doesn't work. Any suggestion?

    opened by Greenleaf88 1
  • AttributeError: 'module' object has no attribute 'StereoSGBM_MODE_HH'

    I ran the convert_movie.py file and it generated this error. It seems to be caused by the version of OpenCV I used. Can you please check which OpenCV version you are using? I am using v2.4.8. Thanks a lot!

    opened by lanzhzh 1
  • How should the parameter prefix be given?

    @piiswrong I used parse.py and data.py to create my own database, and I gave just one prefix. But when I run data.py, it needs to get prefix_list. How can I solve this? Thanks.

    opened by liutianling 0
  • Memory Usage

    I was wondering how much VRAM this network requires for a batch size of 64? I ask because I am porting this to TensorFlow for a class project, and am currently restricted to a batch size of 32 due to memory issues on a K80 with ~11.2 GB of VRAM available.

    Also, were all the operations performed on the GPU in this MXNet implementation? Any information you could provide on memory usage would be much appreciated!

    opened by jhoh10 0
  • MXNetError: src/operator/cudnn_batch_norm.cc:20: CuDNNBatchNorm is merged into BatchNorm for cudnn version above v5.Use the later instead.

    MXNetError: [16:38:44] src/operator/cudnn_batch_norm.cc:20: CuDNNBatchNorm is merged into BatchNorm for cudnn version above v5.Use the later instead.

    Any working solution or patch?

    opened by motypas 0
  • What is the right working configuration under Ubuntu?

    Hi,

    Can anyone tell me the right, TESTED and WORKING configuration under Ubuntu?

    Does this config work? Ubuntu 16.04 LTS 64-bit, CUDA 8, cuDNN 5.1, Nvidia 1080 8 GB, mxnet from git, deep3d from git.

    Best Moty

    opened by motypas 0
  • AttributeError

    Could anyone please help with fixing the following error when running urllib.urlretrieve('http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params', 'deep3d-0050.params')? Thanks a lot.

        ----> 2 urllib.urlretrieve('http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params', 'deep3d-0050.params')
              3 model = mx.model.FeedForward.load('deep3d', 50, mx.gpu(0))

        AttributeError: module 'urllib' has no attribute 'urlretrieve'
    
    opened by OliveS9 1
  • Tensorflow Re-Implementation

    Hello, I'm currently implementing your great work in TF using the official params, but the output is quite bad and I don't know why. Could you please give me any advice on how to make the output better? input: [image] output: [image]

    opened by MahmoudSelmy 3
  • Can't find the params

    Hello, I can't find these: http://homes.cs.washington.edu/~jxie/download/deep3d-0050.params && deep3d-0050.params. I don't know what's the matter. Can you send me your params? Thank you! Mail to: [email protected]

    opened by zmHaiNan 5
Owner
Eric Junyuan Xie
Software Engineer @ Bytedance
Automatic self-diagnosis program (python required)

auto-self-checker: a program that performs the self-diagnosis automatically (Python required). Important: while this program is running, you must never move the mouse pointer or touch the keyboard (it works by screen recognition and clicks directly with the mouse pointer). Usage: in a cmd window in the folder where the program will run, pip

null 1 Dec 30, 2021
Spherical CNNs

Spherical CNNs Equivariant CNNs for the sphere and SO(3) implemented in PyTorch Overview This library contains a PyTorch implementation of the rotatio

Jonas Köhler 893 Dec 28, 2022
Study of human inductive biases in CNNs and Transformers.

Are Convolutional Neural Networks or Transformers more like human vision? This repository contains the code and fine-tuned models of popular Convoluti

Shikhar Tuli 39 Dec 8, 2022
Many Class Activation Map methods implemented in Pytorch for CNNs and Vision Transformers. Including Grad-CAM, Grad-CAM++, Score-CAM, Ablation-CAM and XGrad-CAM

Class Activation Map methods implemented in Pytorch pip install grad-cam ⭐ Tested on many Common CNN Networks and Vision Transformers. ⭐ Includes smoo

Jacob Gildenblat 6.6k Jan 6, 2023
CNNs for Sentence Classification in PyTorch

Introduction This is the implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in PyTorch. Kim's implementation of t

Shawn Ng 956 Dec 19, 2022
Training RNNs as Fast as CNNs

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

ASAPP Research 2.1k Jan 1, 2023
GAN-generated image detection based on CNNs

GAN-image-detection This repository contains a GAN-generated image detector developed to distinguish real images from synthetic ones. The detector is

Image and Sound Processing Lab 17 Dec 15, 2022
VOneNet: CNNs with a Primary Visual Cortex Front-End

VOneNet: CNNs with a Primary Visual Cortex Front-End A family of biologically-inspired Convolutional Neural Networks (CNNs). VOneNets have the followi

The DiCarlo Lab at MIT 99 Dec 22, 2022
Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Equivariant CNNs for the sphere and SO(3) implemented in PyTorch

Jonas Köhler 893 Dec 28, 2022
An implementation of this paper: Relation Classification via Multi-Level Attention CNNs

Relation Classification via Multi-Level Attention CNNs An implementation of this paper: Relation Classification via Multi-Level Attention CNNs. Training

Aybss 2 Nov 4, 2022
This repository contains the source code of our work on designing efficient CNNs for computer vision

Efficient networks for Computer Vision This repo contains source code of our work on designing efficient networks for different computer vision tasks:

Sachin Mehta 386 Nov 26, 2022
A light weight data augmentation tool for training CNNs and Viola Jones detectors

hey-daug A light weight data augmentation tool for training CNNs and Viola Jones detectors (Haar Cascades). This tool inflates your data by up to six

Jaiyam Sharma 2 Nov 23, 2019
Pytorch Implementations of large number classical backbone CNNs, data enhancement, torch loss, attention, visualization and some common algorithms.

Torch-template-for-deep-learning Pytorch implementations of some **classical backbone CNNs, data enhancement, torch loss, attention, visualization and

Li Shengyan 270 Dec 31, 2022
This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

inverse_attention This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021. Le

Firas Laakom 5 Jul 8, 2022
[CVPRW 2022] Attentions Help CNNs See Better: Attention-based Hybrid Image Quality Assessment Network

Attention Helps CNN See Better: Hybrid Image Quality Assessment Network [CVPRW 2022] Code for Hybrid Image Quality Assessment Network [paper] [code] T

IIGROUP 49 Dec 11, 2022
📦 PyTorch based visualization package for generating layer-wise explanations for CNNs.

Explainable CNNs 📦 Flexible visualization package for generating layer-wise explanations for CNNs. It is a common notion that a Deep Learning model i

Ashutosh Hathidara 183 Dec 15, 2022
Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

Video-Captioning - A machine Learning project to generate captions for video frames indicating the relationship between the objects in the video

null 1 Jan 23, 2022
MMdnn is a set of tools to help users inter-operate among different deep learning frameworks. E.g. model conversion and visualization. Convert models between Caffe, Keras, MXNet, Tensorflow, CNTK, PyTorch Onnx and CoreML.

MMdnn MMdnn is a comprehensive and cross-framework tool to convert, visualize and diagnose deep learning (DL) models. The "MM" stands for model manage

Microsoft 5.7k Jan 9, 2023
Voice Conversion by CycleGAN (voice cloning / voice conversion): CycleGAN-VC3

CycleGAN-VC3-PyTorch Chinese documentation | English This code is a PyTorch implementation of the paper: CycleGAN-VC3: Examining and Improving CycleGAN-VCs for Mel-spectr

Kun Ma 110 Dec 24, 2022