This is the open-source implementation of the ICLR 2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis".

Overview

StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis
Jiatao Gu, Lingjie Liu, Peng Wang, Christian Theobalt

Project Page | Video | Demo | Paper | Data

Abstract: We propose StyleNeRF, a 3D-aware generative model for photo-realistic high-resolution image synthesis with high multi-view consistency, which can be trained on unstructured 2D images. Existing approaches either cannot synthesize high-resolution images with fine details or yield noticeable 3D-inconsistent artifacts. In addition, many of them lack control over style attributes and explicit 3D camera poses. StyleNeRF integrates the neural radiance field (NeRF) into a style-based generator to tackle the aforementioned challenges, i.e., improving rendering efficiency and 3D consistency for high-resolution image generation. We perform volume rendering only to produce a low-resolution feature map and progressively apply upsampling in 2D to address the first issue. To mitigate the inconsistencies caused by 2D upsampling, we propose multiple designs, including a better upsampler and a new regularization loss. With these designs, StyleNeRF can synthesize high-resolution images at interactive rates while preserving 3D consistency at high quality. StyleNeRF also enables control of camera poses and different levels of styles, which can generalize to unseen views. It also supports challenging tasks, including zoom-in and-out, style mixing, inversion, and semantic editing.

Requirements

The codebase is tested on

  • Python 3.7
  • PyTorch 1.7.1
  • 8 Nvidia GPUs (Tesla V100, 32 GB) with CUDA version 11.0

For additional Python libraries, please install them with:

pip install -r requirements.txt

Please refer to https://github.com/NVlabs/stylegan2-ada-pytorch for additional software/hardware requirements.
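
If you need to install PyTorch 1.7.1 built against CUDA 11.0 explicitly, a command along the following lines should work (shown for illustration only; please check the official PyTorch installation instructions for your platform):

pip install torch==1.7.1+cu110 torchvision==0.8.2+cu110 -f https://download.pytorch.org/whl/torch_stable.html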

Dataset

We follow the same dataset format as StyleGAN2-ADA, which can be either an image folder or a zip archive of images.
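
If your images are not yet in this format, the dataset_tool.py script inherited from stylegan2-ada-pytorch can pack an image folder into a zip archive. Assuming the script is present and unmodified in this codebase, a typical invocation looks like:

python dataset_tool.py --source=${IMAGE_FOLDER} --dest=${DATASET}.zip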

Pretrained Checkpoints

You can download the pre-trained checkpoints (used in our paper) and some recent variants trained with the current codebase below:

Dataset   Resolution   #Params (M)   Config    Download
FFHQ      256          128           Default   Hugging Face 🤗
FFHQ      512          148           Default   Hugging Face 🤗
FFHQ      1024         184           Default   Hugging Face 🤗

(I am slowly adding more checkpoints. Thanks for your very kind patience!)

Train a new StyleNeRF model

python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq

It will automatically detect all usable GPUs.

Please check the configuration files under conf/model and conf/spec. You can always add your own model config. For more details on using Hydra configuration, please follow https://hydra.cc/docs/intro/.
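
Any option defined in conf/config.yaml can also be overridden on the command line using Hydra's key=value syntax. For example (the override values below are purely illustrative; resolution, gpus, and snap are keys from the default config):

python run_train.py outdir=${OUTDIR} data=${DATASET} spec=paper512 model=stylenerf_ffhq resolution=512 gpus=8 snap=25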

Render the pretrained model

python generate.py --outdir=${OUTDIR} --trunc=0.7 --seeds=${SEEDS} --network=${CHECKPOINT_PATH} --render-program="rotation_camera"

It supports different rotation trajectories for rendering new videos.
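
For programmatic use, the sketch below shows one way to load a checkpoint and sample latent codes. It assumes the stylegan2-ada-pytorch loading convention (legacy.load_network_pkl) that this codebase inherits and should be run from the repository root; rendering an actual image additionally requires camera parameters, so please refer to generate.py for the full pipeline.

import torch
import legacy  # from this repository (inherited from stylegan2-ada-pytorch)

device = torch.device('cuda')
network_pkl = 'ffhq_512.pkl'  # illustrative path to a downloaded checkpoint

with open(network_pkl, 'rb') as f:
    # 'G_ema' is the exponential-moving-average copy of the generator
    G = legacy.load_network_pkl(f)['G_ema'].to(device)

z = torch.randn([1, G.z_dim], device=device)  # random latent code
c = torch.zeros([1, G.c_dim], device=device)  # class labels (unused for FFHQ)
ws = G.mapping(z, c, truncation_psi=0.7)      # map z to the style space W
print(ws.shape)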

Run a demo page

python web_demo.py 21111

By default, it will run a Gradio-powered demo at https://localhost:21111

[NEW] The demo is also integrated into Huggingface Spaces 🤗 using Gradio. Try out the Web Demo: Hugging Face Spaces

Run a GUI visualizer

python visualizer.py

An interactive application will show up for users to play with.

Citation

@inproceedings{
    gu2022stylenerf,
    title={StyleNeRF: A Style-based 3D Aware Generator for High-resolution Image Synthesis},
    author={Jiatao Gu and Lingjie Liu and Peng Wang and Christian Theobalt},
    booktitle={International Conference on Learning Representations},
    year={2022},
    url={https://openreview.net/forum?id=iUuzzTMUw9K}
}

License

Copyright © Facebook, Inc. All Rights Reserved.

The majority of StyleNeRF is licensed under CC-BY-NC; however, portions of this project are available under separate license terms: all code used or modified from stylegan2-ada-pytorch is under the Nvidia Source Code License.

Comments
  • Training fails on multi-gpu setup

    Hello StyleNeRF folks, thank you so much for releasing the code!

    I am trying to train the model on an 8xA6000 box with no success so far.

    python run_train.py outdir=/root/out data=/root/256.zip spec=paper256 model=stylenerf_ffhq

    I have validated that a single A6000 GPU does work, and I've also used the provided configs.

    I am running Ubuntu 20.04.3 LTS with Pytorch LTS (1.8.2) and CUDA 11.1 (which is necessary for A6000 support AFAIK).

    Here is the stack trace I am getting, lmk if I can provide any additional information:

    Error executing job with overrides: ['outdir=/root/out', 'data=/root/P256.zip', 'spec=paper256', 'model=stylenerf_ffhq']
    Traceback (most recent call last):
      File "run_train.py", line 396, in <module>
        main() # pylint: disable=no-value-for-parameter
      File "/usr/local/lib/python3.8/dist-packages/hydra/main.py", line 49, in decorated_main
        _run_hydra(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 367, in _run_hydra
        run_and_report(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 214, in run_and_report
        raise ex
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
        return func()
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/utils.py", line 368, in <lambda>
        lambda: hydra.run(
      File "/usr/local/lib/python3.8/dist-packages/hydra/_internal/hydra.py", line 110, in run
        _ = ret.return_value
      File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 233, in return_value
        raise self._return_value
      File "/usr/local/lib/python3.8/dist-packages/hydra/core/utils.py", line 160, in run_job
        ret.return_value = task_function(task_cfg)
      File "run_train.py", line 378, in main
        torch.multiprocessing.spawn(fn=subprocess_fn, args=(args,), nprocs=args.num_gpus)
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 230, in spawn
        return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 188, in start_processes
        while not context.join():
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 150, in join
        raise ProcessRaisedException(msg, error_index, failed_process.pid)
    torch.multiprocessing.spawn.ProcessRaisedException:
    
    -- Process 5 terminated with the following error:
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/root/StyleNeRF/run_train.py", line 302, in subprocess_fn
        training_loop.training_loop(**args)
      File "/root/StyleNeRF/training/training_loop.py", line 221, in training_loop
        module = torch.nn.parallel.DistributedDataParallel(
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 448, in __init__
        self._ddp_init_helper()
      File "/usr/local/lib/python3.8/dist-packages/torch/nn/parallel/distributed.py", line 603, in _ddp_init_helper
        self.reducer = dist.Reducer(
    RuntimeError: replicas[0][0] in this process with strides [60, 1, 1, 1] appears not to match strides of the same param in process 0.
    
    opened by mike-athene 7
  • some error when inference

    Hi! Thanks for your amazing work! I tried to render your pretrained model as described in https://github.com/facebookresearch/StyleNeRF/#render-the-pretrained-model using your ffhq_512.pkl, but it failed. The error message is as follows:

    legacy.py", line 21, in load_network_pkl
        data = _LegacyUnpickler(f).load()
    _pickle.UnpicklingError: invalid load key, 'v'.
    

    Is there anything wrong with your Hugging Face pretrained model, or somewhere else? Looking forward to your reply!

    opened by 41xu 4
  • Can't get anything to work after installation seemed to go smoothly

    So I'm a little bit of a noob when it comes to installing things like this (although I have done so successfully in the past). I followed all of the instructions, and everything seemed to install fine with no issues.

    After installing the requirements, none of the other commands seem to work.

    Is there a detailed guide anywhere as to how to get everything set up correctly?

    opened by Wythneth 4
  • Request to release pretrained models

    We're glad you finally released the code; this is great work. Can you release the pretrained models, especially the CompCars model, for a better experience?

    opened by huangqiusheng 4
  • Difference between Generator in run_train and the one used to train pretrained checkpoints

    I get this error when loading the pretrained network; the size of a layer was changed: Error loading: synthesis.fg_nerf.feat_out.weight torch.Size([64, 128, 1, 1]) torch.Size([256, 128, 1, 1])

    Where can I modify the structure of generator to be the same as in pretrained checkpoint?

    opened by KyriaAnnwyn 3
  • Bug? Wrong shape

    https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/apps/inversion.py#L119

    If we set encoder_z=True, the shape of zs output from E is [1, 17, 512], but the mapping network can only accept input with 2 dimensions ([1, 512]). Unlike StyleGAN, the individual z vectors in zs are not identical (zs[:,0,:] != zs[:,1,:]), so we cannot simply squeeze zs from [1, 17, 512] into [1, 512].

    opened by lelechen63 2
  • Training config of EG3D

    Thanks for releasing the code! I tried training EG3D from scratch following the config file "stylenerf_ffhq_eg3d", but it does not converge. How should I change the config file?

    opened by MrTornado24 2
  • How to get the high resolution result

    Hello, I have an issue training my own model on my dataset. The resolution of my dataset is 1024×1024, but the output of my model starts at 32×32, and after about 10 hours of training it only grew to 64×64. I don't understand this resolution growth, and I want a high-resolution result matching my training dataset. How do I do that? Can you help me? Thank you!

    opened by apriljt 1
  • StyleGan3

    Hello! I'm trying to use the web_app with a StyleGAN3 generator, but I get the error: TypeError: __init__() got an unexpected keyword argument 'channel_base'.

    opened by hadhoryth 1
  • Change Facial expressions?

    I was wondering if the visualizer allows you to change facial expressions. I messed around with the demo and it's fine, but 90 percent of the faces are smiling or showing teeth, which is not what I want. I looked at the image of the visualizer GUI, but I couldn't tell whether there was any input to change the facial expression, so I figured I would ask before going through all the effort of installing StyleNeRF. Thanks, everyone.

    opened by Echolink50 1
  • Question for Upsample operation (Equation 7 in paper)

    Thanks for the great work. However, I have a question about the upsample operation. In the released code, the Upsample operation seems to be the following code:

    https://github.com/facebookresearch/StyleNeRF/blob/03d3800500385fffeaa2df09fca649edb001b0bb/training/networks.py#L481-L491

    Why is the above code equivalent to the upsample operation mentioned in the paper? Looking forward to your response.

    opened by LeoXing1996 1
  • Can not reproduce the fid score of the paper

    Dear authors, thanks for sharing this excellent work! I tried to train a new StyleNeRF model by running run_train.py, but I could not get an FID score consistent with the original paper. I trained the model on FFHQ (resolution 256×256) on 8 Nvidia V100 GPUs with the default configuration, i.e., spec=paper512, model=stylenerf_ffhq. However, I got an FID of 10.48 at cur_img=017000+, whereas the original paper reports 8.00. Could you tell me what I should modify to reproduce the results in the paper? The details of config.yaml, paper512.yaml, and stylenerf_ffhq.yaml are below.

    config.yaml

    defaults:
      - _self_
      - model: stylenerf_ffhq
      - spec: paper512
    
    # general options
    outdir: ~
    dry_run: False
    debug: False
    resume_run: ~
    
    snap: 50    # Snapshot interval [default: 50 ticks]
    imgsnap: 200
    metrics: [ "fid50k_full" ]
    seed: 2
    num_fp16_res: 4
    auto: False
    
    # dataset
    data: ~
    resolution: ~
    cond: False
    subset: ~   # Train with only N images: <int>, default = all
    mirror: False
    
    # discriminator augmentation
    aug: noaug
    p: ~
    target: ~
    augpipe: ~
    
    # transfer learning
    resume: ~
    freezed: ~
    
    # performance options
    fp32: False
    nhwc: False
    allow_tf32: False
    nobench: False
    workers: 1
    
    launcher: "ddp"
    partition: ~
    comment: ~
    gpus: ~     # Number of GPUs to use [default: 1]
    port: ~
    nodes: ~
    timeout: ~
    

    paper512.yaml

    name: paper512
    ref_gpus: 8
    kimg: 25000
    mb: 64 # Total batch size for one training iteration. Can be larger than batch_gpu * world_size
    mbstd: 8 # batch size for once GPU
    fmaps: 1
    lrate: 0.0025
    lrate_disc: 0.0025
    gamma: 0.5
    ema: 20
    ramp: ~
    map: 8
    

    stylenerf_ffhq.yaml

    # @package _group_
    name: stylenerf_ffhq
    
    G_kwargs:
        class_name: "training.networks.Generator"
        z_dim: 512
        w_dim: 512
    
        mapping_kwargs:
            module_name: "training.networks.MappingNetwork"
            num_layers: ${spec.map}
    
        synthesis_kwargs:
            # global settings
            num_fp16_res: ${num_fp16_res}
            channel_base: 1
            channel_max: 1024
            conv_clamp: 256
            kernel_size: 1
            architecture: skip
            upsample_mode: "nn_cat"
    
            z_dim_bg: 32
            z_dim: 0
            resolution_vol: 32
            resolution_start: 32
            rgb_out_dim: 256
    
            use_noise: False
            module_name: "training.stylenerf.NeRFSynthesisNetwork"
            no_bbox: True
            margin: 0
            magnitude_ema_beta: 0.999
    
            camera_kwargs:
                range_v: [1.4157963267948965, 1.7257963267948966]
                range_u: [-0.3, 0.3]
                range_radius: [1.0, 1.0]
                depth_range: [0.88, 1.12]
                fov: 12
                gaussian_camera: True
                angular_camera: True
                depth_transform:  ~
                dists_normalized: False
                ray_align_corner: False
                bg_start: 0.5
            
            renderer_kwargs:
                n_bg_samples: 4
                n_ray_samples: 14
                abs_sigma: False
                hierarchical: True
                no_background: False
                
            foreground_kwargs:
                positional_encoding: "normal"
                downscale_p_by: 1
                use_style: "StyleGAN2"
                predict_rgb: True
                use_viewdirs: False
    
            background_kwargs:
                positional_encoding: "normal"
                hidden_size: 64
                n_blocks: 4
                downscale_p_by: 1
                skips: []
                inverse_sphere: True
                use_style: "StyleGAN2"
                predict_rgb: True
                use_viewdirs: False
    
            upsampler_kwargs:
                channel_base: ${model.G_kwargs.synthesis_kwargs.channel_base}
                channel_max:  ${model.G_kwargs.synthesis_kwargs.channel_max}
                no_2d_renderer: False
                no_residual_img: False
                block_reses: ~
                shared_rgb_style: False
                upsample_type: "bilinear"
            
            progressive: True
    
            # reuglarization
            n_reg_samples: 16
            reg_full: True
    
    D_kwargs:
        class_name: "training.stylenerf.Discriminator"
        epilogue_kwargs:
            mbstd_group_size: ${spec.mbstd}
    
        num_fp16_res: ${num_fp16_res}
        channel_base: ${spec.fmaps}
        channel_max: 512
        conv_clamp: 256
        architecture: skip
        progressive: ${model.G_kwargs.synthesis_kwargs.progressive}
        lowres_head: ${model.G_kwargs.synthesis_kwargs.resolution_start}
        upsample_type: "bilinear"
        resize_real_early: True
    
    # loss kwargs
    loss_kwargs:
        pl_batch_shrink: 2
        pl_decay: 0.01
        pl_weight: 2
        style_mixing_prob: 0.9
        curriculum: [500,5000]
    
    opened by BlingHe 0
  • How do I synthesize image with W+ latent vector?

    Hello. Thank you for your nice work.

    I want to ask how I can synthesize an image with a latent vector in W+ space. The shape of the w vector seems to be torch.Size([1, 17, 512]), but to my knowledge, latent vectors in W+ space have a size of [1, 18, 512].

    It seems like you have already experimented with latent vectors in W+ space, so I would be grateful if you could let me know how you implemented those experiments.

    opened by Jio0728 0
  • Request for Uploading AFHQ Checkpoint

    Hi and many thanks for sharing your great work!

    Could you please upload the pretrained models of AFHQ dataset? We would like to do some experiments but cannot find the corresponding checkpoint on Hugging Face.

    Best regards.

    opened by RegisWu 0
  • Wrong parameter count for StyleNeRF checkpoints?

    The README indicates parameter counts of 128M, 153M, and 184M for the FFHQ models at 256, 512, and 1024 resolution, respectively. But when I load up the checkpoints in Colab, I see that the 256-resolution model has only 5.2 million parameters. What is the cause of this discrepancy?

    opened by ksagoog 0
  • No block_kwargs for freezed layers

    in run_train.py: line 249: args.D_kwargs.block_kwargs.freeze_layers = cfg.freezed

    I'm getting the error omegaconf.errors.ConfigAttributeError: Missing key block_kwargs when I set a non-zero value for freezed layers.

    How can I freeze some layers?

    opened by KyriaAnnwyn 0
Owner: Meta Research