Learning to Stylize Novel Views

Overview

Learning to Stylize Novel Views

[Project] [Paper]

Contact: Hsin-Ping Huang ([email protected])

Introduction

We tackle a 3D scene stylization problem - generating stylized images of a scene from arbitrary novel views given a set of images of the same scene and a reference image of the desired style as inputs. Direct solution of combining novel view synthesis and stylization approaches lead to results that are blurry or not consistent across different views. We propose a point cloud-based method for consistent 3D scene stylization. First, we construct the point cloud by back-projecting the image features to the 3D space. Second, we develop point cloud aggregation modules to gather the style information of the 3D scene, and then modulate the features in the point cloud with a linear transformation matrix. Finally, we project the transformed features to 2D space to obtain the novel views. Experimental results on two diverse datasets of real-world scenes validate that our method generates consistent stylized novel view synthesis results against other alternative approaches.

Paper

Learning to Stylize Novel Views
Hsin-Ping Huang, Hung-Yu Tseng, Saurabh Saini, Maneesh Singh, and Ming-Hsuan Yang
IEEE International Conference on Computer Vision (ICCV), 2021

Please cite our paper if you find it useful for your research.

@inproceedings{huang_2021_3d_scene_stylization,
   title = {Learning to Stylize Novel Views},
   author={Huang, Hsin-Ping and Tseng, Hung-Yu and Saini, Saurabh and Singh, Maneesh and Yang, Ming-Hsuan},
   booktitle = {ICCV},
   year={2021}
}

Installation and Usage

Kaggle account

  • To download the WikiArt dataset, you would need to register for a Kaggle account.
  1. Sign up for a Kaggle account at https://www.kaggle.com.
  2. Go to top right and select the 'Account' tab of your user profile (https://www.kaggle.com/username/account)
  3. Select 'Create API Token'. This will trigger the download of kaggle.json.
  4. Place this file in the location ~/.kaggle/kaggle.json
  5. chmod 600 ~/.kaggle/kaggle.json

Install

  • Clone this repo
git clone https://github.com/hhsinping/stylescene.git
cd stylescene
  • Create conda environment and install required packages
  1. Python 3.9
  2. Pytorch 1.7.1, Torchvision 0.8.2, Pytorch-lightning 0.7.1
  3. matplotlib, scikit-image, opencv-python, kaggle
  4. Pointnet2_Pytorch
  5. Pytorch3D 0.4.0
conda create -n stylescene python=3.9.1
conda activate stylescene
pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html
pip install matplotlib==3.4.1 scikit-image==0.18.1 opencv-python==4.5.1.48 pytorch-lightning==0.7.1 kaggle
pip install "git+git://github.com/erikwijmans/Pointnet2_PyTorch.git#egg=pointnet2_ops&subdirectory=pointnet2_ops_lib"
curl -LO https://github.com/NVIDIA/cub/archive/1.10.0.tar.gz
tar xzf 1.10.0.tar.gz
export CUB_HOME=$PWD/cub-1.10.0
git clone https://github.com/facebookresearch/pytorch3d.git
cd pytorch3d
git checkout 340662e
pip install -e .
cd -

Our code has been tested on Ubuntu 20.04, CUDA 11.1 with a RTX 2080 Ti GPU.

Datasets

  • Download datasets, pretrained model, complie C++ code using the following script. This script will:
  1. Download Tanks and Temples dataset
  2. Download continous testing sequences of Truck, M60, Train, Playground scenes
  3. Download 120 testing styles
  4. Download WikiArt dataset from Kaggle
  5. Download pretrained models
  6. Complie the c++ code in preprocess/ext/preprocess/ and stylescene/ext/preprocess/
bash download_data.sh
  • Preprocess Tanks and Temples dataset

This script will generate points.npy and r31.npy for each training and testing scene.
points.npy records the 3D coordinates of the re-projected point cloud and its correspoinding 2D positions in source images
r31.npy contains the extracted VGG features of sources images

cd preprocess
python Get_feat.py
cd ..

Testing example

cd stylescene/exp
vim ../config.py
Set Train = False
Set Test_style = [0-119 (refer to the index of style images in ../../style_data/style120/)]

To evaluate the network you can run

python exp.py --net fixed_vgg16unet3_unet4.64.3 --cmd eval --iter [n_iter/last] --eval-dsets tat-subseq --eval-scale 0.25

Generated images can be found at experiments/tat_nbs5_s0.25_p192_fixed_vgg16unet3_unet4.64.3/tat_subseq_[sequence_name]_0.25_n4/

Training example

cd stylescene/exp
vim ../config.py
Set Train = True

To train the network from scratch you can run

python exp.py --net fixed_vgg16unet3_unet4.64.3 --cmd retrain

To train the network from a checkpoint you can run

python exp.py --net fixed_vgg16unet3_unet4.64.3 --cmd resume

Generated images can be found at ./log
Saved model and training log can be found at experiments/tat_nbs5_s0.25_p192_fixed_vgg16unet3_unet4.64.3/

Acknowledgement

The implementation is partly based on the following projects: Free View Synthesis, Linear Style Transfer, PointNet++, SynSin.

You might also like...
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.
Open source repository for the code accompanying the paper 'Non-Rigid Neural Radiance Fields Reconstruction and Novel View Synthesis of a Deforming Scene from Monocular Video'.

Non-Rigid Neural Radiance Fields This is the official repository for the project "Non-Rigid Neural Radiance Fields: Reconstruction and Novel View Synt

Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.
Official PyTorch implementation for Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Including examples for DETR, VQA.

PyTorch Implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers 1 Using Colab Please notic

A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.
A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''.

P-tuning A novel method to tune language models. Codes and datasets for paper ``GPT understands, too''. How to use our code We have released the code

Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The original code is written in keras.

CasRel-pytorch-reimplement Pytorch reimplement of the paper "A Novel Cascade Binary Tagging Framework for Relational Triple Extraction" ACL2020. The o

The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.
The code for the CVPR 2021 paper Neural Deformation Graphs, a novel approach for globally-consistent deformation tracking and 3D reconstruction of non-rigid objects.

Neural Deformation Graphs Project Page | Paper | Video Neural Deformation Graphs for Globally-consistent Non-rigid Reconstruction Aljaž Božič, Pablo P

Episodic Transformer (E.T.) is a novel attention-based architecture for vision-and-language navigation. E.T. is based on a multimodal transformer that encodes language inputs and the full episode history of visual observations and actions. [CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach
[CVPR 2021] Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach

Rethinking Text Segmentation: A Novel Dataset and A Text-Specific Refinement Approach This is the repo to host the dataset TextSeg and code for TexRNe

clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation
clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation

README clDice - a Novel Topology-Preserving Loss Function for Tubular Structure Segmentation CVPR 2021 Authors: Suprosanna Shit and Johannes C. Paetzo

Analyzing basic network responses to novel classes
Analyzing basic network responses to novel classes

novelty-detection Analyzing how AlexNet responds to novel classes with varying degrees of similarity to pretrained classes from ImageNet. If you find

Comments
  • Missing files

    Missing files

    Hello! I am interested in this work and have downloaded the codes for running. However, when I tried to run the evaluation codes, it turned out that a file had been missing.

    The error is as follows: FileNotFoundError: [Errno 2] No such file or directory: '../../colmap_tat/training/Truck/dense/ibr3d_long/Ks.npy'

    I have found in the dataset and make sure this file is missing in the documents. I wonder if there is any way to generate this file. I only find the generating process in /stylescene/data/create_cumtom_track.py and create_data_pw.py Could you help me with this issue? Many thanks!

    opened by miaodd98 0
Owner
null
A JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short.

BraVe This is a JAX implementation of Broaden Your Views for Self-Supervised Video Learning, or BraVe for short. The model provided in this package wa

DeepMind 44 Nov 20, 2022
Deep Semisupervised Multiview Learning With Increasing Views (IEEE TCYB 2021, PyTorch Code)

Deep Semisupervised Multiview Learning With Increasing Views (ISVN, IEEE TCYB) Peng Hu, Xi Peng, Hongyuan Zhu, Liangli Zhen, Jie Lin, Huaibai Yan, Dez

null 3 Nov 19, 2022
[CVPR 2022 Oral] Crafting Better Contrastive Views for Siamese Representation Learning

Crafting Better Contrastive Views for Siamese Representation Learning (CVPR 2022 Oral) 2022-03-29: The paper was selected as a CVPR 2022 Oral paper! 2

null 249 Dec 28, 2022
[ICCV 2021 (oral)] Planar Surface Reconstruction from Sparse Views

Planar Surface Reconstruction From Sparse Views Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey University of Michigan ICCV 2021 (Oral) This re

Linyi Jin 89 Jan 5, 2023
Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation

Look Closer: Bridging Egocentric and Third-Person Views with Transformers for Robotic Manipulation Official PyTorch implementation for the paper Look

Rishabh Jangir 20 Nov 24, 2022
Neighborhood Contrastive Learning for Novel Class Discovery

Neighborhood Contrastive Learning for Novel Class Discovery This repository contains the official implementation of our paper: Neighborhood Contrastiv

Zhun Zhong 56 Dec 9, 2022
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
PyToch implementation of A Novel Self-supervised Learning Task Designed for Anomaly Segmentation

Self-Supervised Anomaly Segmentation Intorduction This is a PyToch implementation of A Novel Self-supervised Learning Task Designed for Anomaly Segmen

WuFan 2 Jan 27, 2022
Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction".

GNN_PPI Codes and models for the paper "Learning Unknown from Correlations: Graph Neural Network for Inter-novel-protein Interaction Prediction". Lear

Ursa Zrimsek 2 Dec 14, 2022
Implementation of "Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis"

Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis Abstract: This work targets at using a general deep lea

null 163 Dec 14, 2022