This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

Meta Research

Last update: Dec 26, 2022


Overview

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt

Project Page | Video | Demo | Paper | Data

Abstract: We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and-out, style mixing, inversion, and semantic editing.

Requirements

The codebase is tested on

Python 3.7
PyTorch 1.7.1
8 Nvidia GPUs (Tesla V100 32GB) with CUDA 11.0

To install the additional Python libraries, run:

pip install -r requirements.txt

Please refer to https://github.com/NVlabs/stylegan2-ada-pytorch for additional software/hardware requirements.

Dataset

We use the same dataset format as StyleGAN2-ADA, which can be either an image folder or a zip file.
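As a minimal sketch of the zip variant, the hypothetical helper below packs an image folder into an uncompressed archive. It assumes the StyleGAN2-ADA conventions that images may sit in subfolders and that optional class labels live in a dataset.json at the archive root, holding a "labels" list of [path, label] pairs. The sketch does not resize anything, so the images should already share one square resolution, as StyleGAN2-ADA expects. StyleGAN2-ADA's own dataset_tool.py, which can also resize and crop, is the safer choice when it is available.

```python
import json
import os
import zipfile

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}


def pack_image_folder(src_dir, dest_zip, labels=None):
    """Pack every image under src_dir into an uncompressed zip archive.

    Hypothetical helper (not part of StyleNeRF). Archive paths are relative
    to src_dir and use forward slashes. If labels is given (a dict mapping
    archive path -> integer class label), a dataset.json with a "labels"
    list is written at the archive root.
    """
    names = []
    for root, _dirs, files in os.walk(src_dir):
        for fname in files:
            if os.path.splitext(fname)[1].lower() in IMAGE_EXTS:
                full = os.path.join(root, fname)
                names.append(os.path.relpath(full, src_dir).replace(os.sep, "/"))
    names.sort()  # deterministic image order inside the archive

    with zipfile.ZipFile(dest_zip, "w", compression=zipfile.ZIP_STORED) as zf:
        for name in names:
            zf.write(os.path.join(src_dir, name), arcname=name)
        if labels is not None:
            meta = {"labels": [[name, labels[name]] for name in names]}
            zf.writestr("dataset.json", json.dumps(meta))
    return names
```

For example, `pack_image_folder("ffhq256/", "ffhq256.zip")` would produce an archive that can then be passed as `data=ffhq256.zip`.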

Pretrained Checkpoints

You can download the pre-trained checkpoints used in our paper, as well as some recent variants trained with the current codebase:

Dataset	Resolution	#Params(M)	Config	Download
FFHQ	256	128	Default	Hugging Face 🤗
FFHQ	512	148	Default	Hugging Face 🤗
FFHQ	1024	184	Default	Hugging Face 🤗

(I am slowly adding more checkpoints. Thanks for your very kind patience!)

Train a new StyleNeRF model

python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq

It will automatically detect all usable GPUs.

Please check the configuration files in conf/model and conf/spec. You can always add your own model config. For more details on using Hydra configuration, see https://hydra.cc/docs/intro/.
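The spec files also fix the batch size. Assuming run_train.py follows stylegan2-ada-pytorch, the per-GPU batch is mb // ref_gpus, and the training loop accumulates gradients until mb images have been processed. Under that assumption, fewer GPUs mean more accumulation rounds rather than a smaller batch. The hypothetical helper below works through that arithmetic using the paper512 values (mb: 64, ref_gpus: 8) quoted in the issues further down this page.

```python
def batch_plan(mb, ref_gpus, num_gpus):
    """Return (batch_gpu, accumulation_rounds) for one optimizer step.

    Hypothetical helper. Assumes the stylegan2-ada-pytorch convention: the
    per-GPU batch is fixed by the reference GPU count, and a shortfall in
    GPUs is made up by accumulating gradients over several rounds.
    """
    batch_gpu = mb // ref_gpus
    if batch_gpu < 1:
        raise ValueError("mb must be at least ref_gpus")
    per_round = batch_gpu * num_gpus  # images processed per round across all GPUs
    if mb % per_round != 0:
        raise ValueError(f"mb={mb} is not divisible by batch_gpu*num_gpus={per_round}")
    return batch_gpu, mb // per_round


for gpus in (1, 2, 4, 8):
    print(gpus, batch_plan(mb=64, ref_gpus=8, num_gpus=gpus))
# prints: 1 (8, 8) / 2 (8, 4) / 4 (8, 2) / 8 (8, 1), one pair per line
```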

Render the pretrained model

python generate.py --outdir=${OUTDIR} --trunc=0.7 --seeds=${SEEDS} --network=${CHECKPOINT_PATH} --render-program="rotation_camera"

It supports different rotation trajectories for rendering new videos.
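As an illustration only, the sketch below builds a closed camera loop from the camera_kwargs ranges in the stylenerf_ffhq.yaml quoted further down this page (range_u: [-0.3, 0.3], range_v: [1.4158, 1.7258], radius 1.0). It assumes u is the azimuth and v is the polar angle measured from the up (+y) axis, both in radians. The function name and these conventions are assumptions. This is not the repository's rotation_camera program.

```python
import math


def rotation_trajectory(n_frames,
                        range_u=(-0.3, 0.3),
                        range_v=(1.4157963267948965, 1.7257963267948966),
                        radius=1.0):
    """Return a closed loop of (u, v, (x, y, z)) camera samples.

    Hypothetical sketch: u is treated as azimuth and v as the polar angle
    from the +y (up) axis, both in radians. The camera sits on a sphere of
    the given radius and is assumed to look at the origin. The loop traces
    an ellipse in (u, v) that touches the edges of both ranges.
    """
    u_mid, u_half = (range_u[0] + range_u[1]) / 2, (range_u[1] - range_u[0]) / 2
    v_mid, v_half = (range_v[0] + range_v[1]) / 2, (range_v[1] - range_v[0]) / 2
    frames = []
    for i in range(n_frames):
        t = 2 * math.pi * i / n_frames  # phase in [0, 2*pi), so the loop closes
        u = u_mid + u_half * math.sin(t)
        v = v_mid + v_half * math.cos(t)
        pos = (radius * math.sin(v) * math.sin(u),
               radius * math.cos(v),
               radius * math.sin(v) * math.cos(u))
        frames.append((u, v, pos))
    return frames
```

Each position would still need to be turned into a look-at camera matrix in whatever convention the renderer uses. That step is omitted here.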

Run a demo page

python web_demo.py 21111

By default, it runs a Gradio-powered demo at http://localhost:21111.

[NEW] The demo is also integrated into Hugging Face Spaces 🤗 using Gradio. Try out the Web Demo:

Run a GUI visualizer

python visualizer.py

An interactive application will show up for users to play with.

Citation

@inproceedings{
    gu2022stylenerf,
    title={StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis},
    author={Jiatao Gu and Lingjie Liu and Peng Wang and Christian Theobalt},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=iUuzzTMUw9K}
}

License

The majority of StyleNeRF is licensed under CC-BY-NC; however, portions of this project are available under separate license terms: all code used or modified from stylegan2-ada-pytorch is under the Nvidia Source Code License.

Comments

Training fails on multi-gpu setup

Hello StyleNeRF folks, thank you so much for releasing the code!

I am trying to train the model on an 8xA6000 box, with no success so far.

python run_train.py outdir=/root/out data=/root/256.zip spec=paper256 model=stylenerf_ffhq

I have verified that a single A6000 GPU does work, and I've also used the provided configs.

I am running Ubuntu 20.04.3 LTS with PyTorch LTS (1.8.2) and CUDA 11.1 (which is necessary for A6000 support, AFAIK).

Here is the stack trace I am getting, lmk if I can provide any additional information:

Error executing job with overrides: ['outdir=/root/out', 'data=/root/P256.zip', 'spec=paper256', 'model=stylenerf_ffhq']
Traceback (most recent call last):
  File "run_train.py", line 396, in <module>
    main() # pylint: disable=no-value-for-parameter
  File "/usr/local/lib/python3.8/dist-packages/hydra/main.py", line 49, in decorated_main
    _run_hydra(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
    run_and_report(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
    lambda: hydra.run(
  File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
    _ = ret.return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
    raise self._return_value
  File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
    ret.return_value = task_function(task_cfg)
  File "run_train.py", line 378, in main
    torch.multiprocessing.spawn(fn=subprocess_fn, args=(args,), nprocs=args.num_gpus)
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
    return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
    while not context.join():
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
    raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

-- Process 5 terminated with the following error:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/root/StyleNeRF/run_train.py", line 302, in subprocess_fn
    training_loop.training_loop(**args)
  File "/root/StyleNeRF/training/training_loop.py", line 221, in training_loop
    module = torch.nn.parallel.DistributedDataParallel(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 448, in __init__
    self._ddp_init_helper()
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 603, in _ddp_init_helper
    self.reducer = dist.Reducer(
RuntimeError: replicas[0][0] in this process with strides [60, 1, 1, 1] appears not to match strides of the same param in process 0.

opened by mike-athene 7

Some errors during inference
Hi! Thanks for your amazing work! I tried to render your pretrained model as described in https://github.com/facebookresearch/StyleNeRF/#render-the-pretrained-model using your ffhq_512.pkl, but it failed. The error message is as follows:

legacy.py", line 21, in load_network_pkl
    data = _LegacyUnpickler(f).load()
_pickle.UnpicklingError: invalid load key, 'v'.

Is there anything wrong with your Hugging Face pretrained model, or is the problem somewhere else? Looking forward to your reply!
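"invalid load key, 'v'" means the first byte of the file is the character v rather than a pickle opcode. One common cause, assumed here and not confirmed in this thread, is that the download saved a Git LFS pointer file instead of the checkpoint itself. A pointer file is a small text file beginning with "version https://git-lfs.github.com/spec/v1". The hypothetical helper below inspects the first bytes of a file to tell these cases apart.

```python
LFS_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def diagnose_checkpoint(path):
    """Guess why a .pkl checkpoint fails to unpickle, from its first bytes.

    Hypothetical helper. Pickles written with protocol 2 or newer start with
    the PROTO opcode b'\x80'; a Git LFS pointer is plain text; an HTML page
    starts with '<'.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_PREFIX))
    if head.startswith(LFS_PREFIX):
        return "git-lfs pointer: re-download the actual file"
    if head[:1] == b"<":
        return "looks like HTML: the URL probably pointed at a web page"
    if head[:1] == b"\x80":
        return "starts with a pickle PROTO opcode"
    return "unknown format"
```

If the helper reports a pointer file, fetching the real file (for example with git-lfs installed, or through a direct file download link rather than a page view) should avoid the error.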
opened by 41xu 4
Can't get anything to work after installation seemed to go smoothly

So I'm a little bit of a noob when it comes to installing things like this (although I have done it successfully in the past). I followed all of the instructions, and everything seemed to install fine with no issues.

After installing the requirements, none of the other commands seem to work.

Is there a detailed guide anywhere as to how to get everything set up correctly?

opened by Wythneth 4
Request to release pretrained models

We're glad you finally released the code; this is great work. Could you release the pretrained models, especially the CompCars model, for a better experience?

opened by huangqiusheng 4
Difference between Generator in run_train and the one used to train pretrained checkpoints

I get this error when loading pretrained network. The size of the layer was changed: Error loading: synthesis.fg_nerf.feat_out.weight torch.Size([64, 128, 1, 1]) torch.Size([256, 128, 1, 1])

Where can I modify the structure of generator to be the same as in pretrained checkpoint?
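A generic way to locate such mismatches before loading is to compare parameter shapes name by name. In PyTorch, a shape map could be built with `{k: tuple(v.shape) for k, v in module.state_dict().items()}`. The hypothetical helper below takes two such maps and reports the differing and missing entries. If the differing leading dimension tracks a synthesis setting such as rgb_out_dim (256 in the stylenerf_ffhq.yaml quoted further down), that setting is the first one to compare against the checkpoint. This link is an assumption and is not confirmed in the thread.

```python
def shape_mismatches(ckpt_shapes, model_shapes):
    """Compare two {param_name: shape} maps and report differences.

    Hypothetical helper. Returns (mismatched, missing_in_model,
    missing_in_ckpt), where mismatched maps each shared name to
    (ckpt_shape, model_shape) when the two shapes differ.
    """
    shared = ckpt_shapes.keys() & model_shapes.keys()
    mismatched = {
        k: (tuple(ckpt_shapes[k]), tuple(model_shapes[k]))
        for k in sorted(shared)
        if tuple(ckpt_shapes[k]) != tuple(model_shapes[k])
    }
    missing_in_model = sorted(ckpt_shapes.keys() - model_shapes.keys())
    missing_in_ckpt = sorted(model_shapes.keys() - ckpt_shapes.keys())
    return mismatched, missing_in_model, missing_in_ckpt


# Illustrative values from the error above. Which side is the checkpoint
# and which is the freshly built model is assumed here, not known.
ckpt = {"synthesis.fg_nerf.feat_out.weight": (64, 128, 1, 1)}
model = {"synthesis.fg_nerf.feat_out.weight": (256, 128, 1, 1)}
print(shape_mismatches(ckpt, model))
```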

opened by KyriaAnnwyn 3
Bug? Wrong shape

https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/apps/inversion.py#L119

If we set encoder_z=True, the shape of zs output from E is [1,17,512], but the mapping network only accepts a 2-dimensional input ([1, 512]). Unlike in StyleGAN, the z vectors in zs are not all the same (zs[:,0,:] != zs[:,1,:]), so we cannot squeeze zs from [1,17,512] into [1,512].

opened by lelechen63 2
Training config of EG3D

Thanks for releasing the code! I tried training EG3D from scratch following the config file "stylenerf_ffhq_eg3d", but it does not converge. How should I change the config file?

opened by MrTornado24 2
How to get the high-resolution result

Hello, I have an issue when training my own model on my dataset. My dataset's resolution is 1024×1024, but the output of my model turned out to be 32×32, and after training for 10 hours it grew to 64×64. I don't understand this growing resolution. I want high-resolution results that match my training dataset; how can I get them? Can you help me? Thank you!
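The thread does not say how the resolution grows. The stylenerf_ffhq.yaml quoted further down this page sets progressive: True, resolution_start: 32 and loss_kwargs.curriculum: [500,5000]. The sketch below is a hypothetical schedule, not StyleNeRF's code. It assumes the curriculum gives the kimg window over which the training resolution rises from resolution_start to the dataset resolution. Under that assumption, full-resolution outputs would only appear near the end of the window. The real schedule should be checked in the repository's training code.

```python
import math


def progressive_resolution(kimg, start_res=32, final_res=1024, curriculum=(500, 5000)):
    """Hypothetical growth schedule for progressive training.

    log2(resolution) rises linearly with kimg between curriculum[0] and
    curriculum[1] and is snapped down to a power of two. Before the window
    the model trains at start_res; after it, at final_res.
    """
    lo, hi = curriculum
    if hi <= lo:
        raise ValueError("curriculum must be (start_kimg, end_kimg) with end > start")
    frac = min(max((kimg - lo) / (hi - lo), 0.0), 1.0)
    log_start, log_final = math.log2(start_res), math.log2(final_res)
    log_res = log_start + frac * (log_final - log_start)
    return 2 ** math.floor(log_res + 1e-9)  # epsilon guards against float round-off


for k in (0, 500, 1400, 2300, 3200, 4100, 5000):
    print(k, progressive_resolution(k))
```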

opened by apriljt 1
StyleGAN3

Hello! I'm trying to use the web_app with the StyleGAN3 generator, but I get an error: TypeError: __init__() got an unexpected keyword argument 'channel_base'.

opened by hadhoryth 1
Change Facial expressions?

I was wondering if the visualizer allows you to change the facial expressions? I messed around with the demo and it's fine but 90 percent of the faces are smiling or showing teeth which is not what I want. I tried to look at the image of the visualizer GUI but I couldn't tell if there was any input to change the facial expression so I figured I would ask before I go through all the effort to install stylenerf. Thanks everyone

opened by Echolink50 1
Question for Upsample operation (Equation 7 in paper)

Thanks for the great work. However, I have a question about the upsample operation. In the released code, the Upsample operation seems to be the following code:

https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/training/networks.py#L481-L491

How is the above code equivalent to the upsample operation described in the paper? Looking forward to your response.

opened by LeoXing1996 1

Cannot reproduce the FID score of the paper

Dear authors, thanks for sharing this excellent work! I tried to train a new StyleNeRF model by running run_train.py, but I could not get an FID score consistent with the original paper. I trained the model on FFHQ (256×256) with 8 Nvidia V100 GPUs using the default configuration, i.e., spec=paper512, model=stylenerf_ffhq. However, I got an FID of 10.48 at cur_img=017000+, compared with 8.00 in the original paper. Could you tell me what I should modify to reproduce the results in the paper? The details of config.yaml, paper512.yaml and stylenerf_ffhq.yaml are below.

config.yaml

defaults:
  - _self_
  - model: stylenerf_ffhq
  - spec: paper512

# general options
outdir: ~
dry_run: False
debug: False
resume_run: ~

snap: 50    # Snapshot interval [default: 50 ticks]
imgsnap: 200
metrics: [ "fid50k_full" ]
seed: 2
num_fp16_res: 4
auto: False

# dataset
data: ~
resolution: ~
cond: False
subset: ~   # Train with only N images: <int>, default = all
mirror: False

# discriminator augmentation
aug: noaug
p: ~
target: ~
augpipe: ~

# transfer learning
resume: ~
freezed: ~

# performance options
fp32: False
nhwc: False
allow_tf32: False
nobench: False
workers: 1

launcher: "ddp"
partition: ~
comment: ~
gpus: ~     # Number of GPUs to use [default: 1]
port: ~
nodes: ~
timeout: ~

paper512.yaml

name: paper512
ref_gpus: 8
kimg: 25000
mb: 64 # Total batch size for one training iteration. Can be larger than batch_gpu * world_size
mbstd: 8 # group size of the minibatch-stddev layer in D (used as D_kwargs.epilogue_kwargs.mbstd_group_size)
fmaps: 1
lrate: 0.0025
lrate_disc: 0.0025
gamma: 0.5
ema: 20
ramp: ~
map: 8

stylenerf_ffhq.yaml

# @package _group_
name: stylenerf_ffhq

G_kwargs:
    class_name: "training.networks.Generator"
    z_dim: 512
    w_dim: 512

    mapping_kwargs:
        module_name: "training.networks.MappingNetwork"
        num_layers: ${spec.map}

    synthesis_kwargs:
        # global settings
        num_fp16_res: ${num_fp16_res}
        channel_base: 1
        channel_max: 1024
        conv_clamp: 256
        kernel_size: 1
        architecture: skip
        upsample_mode: "nn_cat"

        z_dim_bg: 32
        z_dim: 0
        resolution_vol: 32
        resolution_start: 32
        rgb_out_dim: 256

        use_noise: False
        module_name: "training.stylenerf.NeRFSynthesisNetwork"
        no_bbox: True
        margin: 0
        magnitude_ema_beta: 0.999

        camera_kwargs:
            range_v: [1.4157963267948965, 1.7257963267948966]
            range_u: [-0.3, 0.3]
            range_radius: [1.0, 1.0]
            depth_range: [0.88, 1.12]
            fov: 12
            gaussian_camera: True
            angular_camera: True
            depth_transform:  ~
            dists_normalized: False
            ray_align_corner: False
            bg_start: 0.5
        
        renderer_kwargs:
            n_bg_samples: 4
            n_ray_samples: 14
            abs_sigma: False
            hierarchical: True
            no_background: False
            
        foreground_kwargs:
            positional_encoding: "normal"
            downscale_p_by: 1
            use_style: "StyleGAN2"
            predict_rgb: True
            use_viewdirs: False

        background_kwargs:
            positional_encoding: "normal"
            hidden_size: 64
            n_blocks: 4
            downscale_p_by: 1
            skips: []
            inverse_sphere: True
            use_style: "StyleGAN2"
            predict_rgb: True
            use_viewdirs: False

        upsampler_kwargs:
            channel_base: ${model.G_kwargs.synthesis_kwargs.channel_base}
            channel_max:  ${model.G_kwargs.synthesis_kwargs.channel_max}
            no_2d_renderer: False
            no_residual_img: False
            block_reses: ~
            shared_rgb_style: False
            upsample_type: "bilinear"
        
        progressive: True

        # regularization
        n_reg_samples: 16
        reg_full: True

D_kwargs:
    class_name: "training.stylenerf.Discriminator"
    epilogue_kwargs:
        mbstd_group_size: ${spec.mbstd}

    num_fp16_res: ${num_fp16_res}
    channel_base: ${spec.fmaps}
    channel_max: 512
    conv_clamp: 256
    architecture: skip
    progressive: ${model.G_kwargs.synthesis_kwargs.progressive}
    lowres_head: ${model.G_kwargs.synthesis_kwargs.resolution_start}
    upsample_type: "bilinear"
    resize_real_early: True

# loss kwargs
loss_kwargs:
    pl_batch_shrink: 2
    pl_decay: 0.01
    pl_weight: 2
    style_mixing_prob: 0.9
    curriculum: [500,5000]

opened by BlingHe 0

How do I synthesize an image with a W+ latent vector?

Hello. Thank you for your nice work.

I want to ask how I can synthesize an image with a latent vector in W+ space. The shape of the w vector seems to be torch.Size([1, 17, 512]). But to my knowledge, latent vectors in W+ space have a size of [1, 18, 512].

It seems like you have already experimented with latent vectors in W+ space, so I would be grateful if you let me know how you implemented the experiments with W+ space.
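For reference, the number of per-layer w vectors in StyleGAN2 (as implemented in stylegan2-ada-pytorch) depends on the output resolution. The 4×4 block uses one w, each later block uses two, and the final toRGB adds one more, giving 2·log2(R) − 2. An [N, 18, 512] W+ code is therefore specific to 1024×1024 StyleGAN2. StyleNeRF's synthesis network has a different layer layout (NeRF layers plus a 2D upsampler), so its count need not follow this formula. Assuming its Generator keeps the num_ws attribute from stylegan2-ada-pytorch, G.num_ws would give the number to use, with a W+ code of shape [N, G.num_ws, w_dim] passed to the synthesis network. This helper only illustrates the StyleGAN2 count.

```python
import math


def stylegan2_num_ws(resolution):
    """Number of w vectors consumed by a StyleGAN2 synthesis network.

    Blocks run at 4x4, 8x8, ..., resolution x resolution. The 4x4 block has
    one conv layer and every later block has two. Intermediate toRGB layers
    share a w with the next block's first conv, so only the final toRGB adds
    one extra w.
    """
    if resolution < 4 or resolution & (resolution - 1):
        raise ValueError("resolution must be a power of two >= 4")
    n_blocks = int(math.log2(resolution)) - 1  # blocks at 4, 8, ..., resolution
    n_conv = 1 + 2 * (n_blocks - 1)
    return n_conv + 1


print([stylegan2_num_ws(r) for r in (256, 512, 1024)])  # [14, 16, 18]
```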

opened by Jio0728 0
Request for Uploading AFHQ Checkpoint

Hi and many thanks for sharing your great work!

Could you please upload the pretrained models of AFHQ dataset? We would like to do some experiments but cannot find the corresponding checkpoint on Hugging Face.

Best regards.

opened by RegisWu 0
Wrong parameter count for StyleNeRF checkpoints?

The README indicates parameter counts of 128M, 153M, and 184M for the FFHQ models at 256, 512, and 1024 resolution, respectively. But when I load the checkpoints in Colab, I see that the 256-resolution model has only 5.2 million parameters. What is the cause of this discrepancy?
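Parameter totals depend on which networks are counted. Pickles in the stylegan2-ada-pytorch format store several modules (G, D and G_ema among them), and the thread does not state which of them the README figures cover. In PyTorch, a single module's total is `sum(p.numel() for p in module.parameters())`. The hypothetical helpers below break a count down by top-level submodule, given a {name: shape} map such as `{k: tuple(p.shape) for k, p in G.named_parameters()}`.

```python
import math
from collections import defaultdict


def count_parameters(shapes):
    """Total number of scalars across a {name: shape_tuple} map."""
    return sum(math.prod(shape) for shape in shapes.values())


def count_by_prefix(shapes):
    """Per-submodule totals, keyed by the first dotted component of each name."""
    totals = defaultdict(int)
    for name, shape in shapes.items():
        totals[name.split(".", 1)[0]] += math.prod(shape)
    return dict(totals)


# Illustrative shapes, not taken from a real checkpoint.
shapes = {
    "mapping.fc0.weight": (512, 512),
    "synthesis.b4.conv1.weight": (512, 512, 3, 3),
    "synthesis.b4.conv1.bias": (512,),
}
print(count_parameters(shapes), count_by_prefix(shapes))
```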

opened by ksagoog 0
No block_kwargs for freezed layers

in run_train.py: line 249: args.D_kwargs.block_kwargs.freeze_layers = cfg.freezed

I'm getting the error omegaconf.errors.ConfigAttributeError: Missing key block_kwargs when I set a non-zero value for freezed layers.

How can I freeze some layers?

opened by KyriaAnnwyn 0
