PyTorch code for DriveGAN: Towards a Controllable High-Quality Neural Simulation

Last update: Dec 24, 2022

Related tags

Deep Learning DriveGAN_code

Overview

DriveGAN: Towards a Controllable High-Quality Neural Simulation

PyTorch code for DriveGAN

DriveGAN: Towards a Controllable High-Quality Neural Simulation
Seung Wook Kim, Jonah Philion, Antonio Torralba, Sanja Fidler
CVPR (oral), 2021
[Paper] [Project Page]

Abstract: Realistic simulators are critical for training and verifying robotics systems. While most of the contemporary simulators are hand-crafted, a scaleable way to build simulators is to use machine learning to learn how the environment behaves in response to an action, directly from data. In this work, we aim to learn to simulate a dynamic environment directly in pixel-space, by watching unannotated sequences of frames and their associated action pairs. We introduce a novel high-quality neural simulator referred to as DriveGAN that achieves controllability by disentangling different components without supervision. In addition to steering controls, it also includes controls for sampling features of a scene, such as the weather as well as the location of non-player objects. Since DriveGAN is a fully differentiable simulator, it further allows for re-simulation of a given video sequence, offering an agent to drive through a recorded scene again, possibly taking different actions. We train DriveGAN on multiple datasets, including 160 hours of real-world driving data. We showcase that our approach greatly surpasses the performance of previous data-driven simulators, and allows for new features not explored before.

For business inquires, please contact researchinquiries@nvidia.com

For press and other inquireis, please contact Hector Marinez at hmarinez@nvidia.com

Citation

If you found this codebase useful in your research, please cite:

@inproceedings{kim2021drivegan,
  title={DriveGAN: Towards a Controllable High-Quality Neural Simulation},
  author={Kim, Seung Wook and Philion, Jonah and Torralba, Antonio and Fidler, Sanja},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={5820--5829},
  year={2021}
}

Environment Setup

This codebase is tested with Ubuntu 18.04 and python 3.6.9, but it most likely would work with other close python3 versions.

Clone the repository

git clone https://github.com/nv-tlabs/DriveGAN_code.git
cd DriveGAN_code

Install dependencies

pip install -r requirements.txt

Data

We provide a dataset derived from Carla Simulator (https://carla.org/, https://github.com/carla-simulator/carla). This dataset is distributed under Creative Commons Attribution-NonCommercial 4.0 International Public LicenseCC BY-NC 4.0

All data are stored in the following link: https://drive.google.com/drive/folders/1fGM6KVzBL9M-6r7058fqyVnNcHVnYoJ3?usp=sharing

Training

Stage 1 (VAE-GAN)

If you want to skip stage 1 training, go to the Stage 2 (Dynamics Engine) section. For stage 1 training, download {0-5}.tar.gz from the link and extract. The extracted datasets have names starting with 6405 - change their name to data1 (for 0.tar.gz) to data6 (for 5.tar.gz).

cd DriveGAN_code/latent_decoder_model
mkdir img_data && cd img_data
tar -xvzf {0-5}.tar.gz
mv 6405x data{1-6}

Then, run

./scripts/train.sh ./img_data/data1,./img_data/data2,./img_data/data3,./img_data/data4,./img_data/data5,./img_data/data6

You can monitor training progress with tensorboard in the log_dir specified in train.sh

When validation loss converges, you can now encode the dataset with the learned model (located in log_dir from training)

./scripts/encode.sh ${path to saved model} 1 0 ./img_data/data1,./img_data/data2,./img_data/data3,./img_data/data4,./img_data/data5,./img_data/data6 ../encoded_data/data

Stage 2 (Dynamics Engine)

If you did not do Stage 1 training, download encoded_data.tar.gz and vaegan_iter210000.pt from link, and extract.

cd DriveGAN_code
mkdir encoded_data
tar -xvzf encoded_data.tar.gz -C encoded_data

Otherwise, run

cd DriveGAN_code
./scripts/train.sh encoded_data/data ${path to saved vae-gan model}

Playing with trained model

If you want to skip training, download simulator_epoch1020.pt and vaegan_iter210000.pt from link.

To play with a trained model, run

./scripts/play/server.sh ${path to saved dynamics engine} ${port e.g. 8888} ${path to saved vae-gan model}

Now you can navigate to localhost:{port} on your browser (tested on Chrome) and play.

(Controls - 'w': speed up, 's': slow down, 'a': steer left, 'd': steer right)

There are also additional buttons for changing contents. To sample a new scene, simply refresh the webpage.

License

Thie codebase and trained models are distributed under Nvidia Source Code License and the dataset is distributed under CC BY-NC 4.0.

Code for VAE-GAN is adapted from https://github.com/rosinality/stylegan2-pytorch (License).

Code for Lpips is imported from https://github.com/richzhang/PerceptualSimilarity (License).

StyleGAN custom ops are imported from https://github.com/NVlabs/stylegan2 (License).

Interactive UI code uses http://www.semantic-ui.com/ (License).

Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

SphereRPN Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021. Authors: Th

15 Dec 2, 2022

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

InversePrompting Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting Code: The code is provided in the "chinese_ip"

101 Dec 16, 2022

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021) Hang Zhou, Yasheng Sun, Wayne Wu, Chen Cha

628 Dec 28, 2022

source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics This work will be published in Nature Biomedical

71 Nov 15, 2022

HandTailor: Towards High-Precision Monocular 3D Hand Recovery

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.

NUANCED: Natural Utterance Annotation for Nuanced Conversation with Estimated Distributions Overview NUANCED is a user-centric conversational recommen

18 Dec 28, 2021

Comments

server.py: error: option --port: invalid integer value: '{8888}'

Hi,

I am trying to use the already trained model. I followed all the instructions from the repo. And I ran the server.sh as below

./scripts/play/server.sh {Downloads/simulator_epoch1020.pt} {8888} {Downloads/vaegan_iter210000.pt}

But this ends up with an error as shown here:

Any suggestion in resolving this would be much appreciated. Thanks in advance.

opened by kulkarnikeerti 9
【Training stage1(VAE-GAN)】Cannot train with the data i've prepared.

Hello. I tried VAE-GAN training with my data (256×256pix). However , after typing ./scripts/train.sh ./img_data/data1 displayed following errors.

There are about 17,000 png images in the data1 folder. Also --batch in latent_decoder_model/script/train.sh is changed 6 to 1.

The development environment is a docker container provided by nvidia. https://ngc.nvidia.com/catalog/containers/nvidia:pytorch

OS is ubuntu20.04LTS. Graphics board is RTX3080. The specified torch did not work with this graphics board, so I had to re-install the torch. pip3 install torch==1.10.1+cu113 torchvision==0.11.2+cu113 torchaudio==0.10.1+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html

I'm not from an English-speaking country, so my writing may be poor, but I hope you can help me with a solution.

opened by clown6613 5

PyTorch code for DriveGAN: Towards a Controllable High-Quality Neural Simulation

Related tags

Overview

DriveGAN: Towards a Controllable High-Quality Neural Simulation

Citation

Environment Setup

Data

Training

Stage 1 (VAE-GAN)

Stage 2 (Dynamics Engine)

Playing with trained model

License

You might also like...

Code for the paper SphereRPN: Learning Spheres for High-Quality Region Proposals on 3D Point Clouds Object Detection, ICIP 2021.

Code, Data and Demo for Paper: Controllable Generation from Pre-trained Language Models via Inverse Prompting

Code for Pose-Controllable Talking Face Generation by Implicitly Modularized Audio-Visual Representation (CVPR 2021)

source code for https://arxiv.org/abs/2005.11248 "Accelerating Antimicrobial Discovery with Controllable Deep Generative Models and Molecular Dynamics"

HandTailor: Towards High-Precision Monocular 3D Hand Recovery

《Towards High Fidelity Face Relighting with Realistic Shadows》(CVPR 2021)

Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

Seeing Dynamic Scene in the Dark: High-Quality Video Dataset with Mechatronic Alignment (ICCV2021)

NUANCED is a user-centric conversational recommendation dataset that contains 5.1k annotated dialogues and 26k high-quality user turns.

Comments

server.py: error: option --port: invalid integer value: '{8888}'

【Training stage1(VAE-GAN)】Cannot train with the data i've prepared.

Owner

Official PyTorch implementation of the paper "Recycling Discriminator: Towards Opinion-Unaware Image Quality Assessment Using Wasserstein GAN", accepted to ACM MM 2021 BNI Track.

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

The source code of the ICCV2021 paper "PIRenderer: Controllable Portrait Image Generation via Semantic Neural Rendering"

Official pytorch code for SSC-GAN: Semi-Supervised Single-Stage Controllable GANs for Conditional Fine-Grained Image Generation(ICCV 2021)

Neural Nano-Optics for High-quality Thin Lens Imaging

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

PyTorch Implementation of PortaSpeech: Portable and High-Quality Generative Text-to-Speech

The official pytorch implemention of the CVPR paper "Temporal Modulation Network for Controllable Space-Time Video Super-Resolution".

This is the PyTorch implementation of GANs N’ Roses: Stable, Controllable, Diverse Image to Image Translation