PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors

Casual GAN Papers

Last update: Dec 28, 2022

Overview

Make-A-Scene - PyTorch

Unofficial PyTorch implementation of Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors (https://arxiv.org/pdf/2203.13131.pdf)

Figure 1. from paper

Note: this is a work in progress.

Everyone is welcome to contribute. Discord channel: https://discord.gg/hCRMGRZkC6

We would love to open-source a trained model. However, this is a billion-parameter model, and training it requires a lot of compute. If you can provide computational resources, let us know.

Paper Description:

Make-A-Scene modifies the VQGAN framework. It relies heavily on semantic segmentation maps as extra conditioning, which gives the user more influence over the generation process. Moreover, it also conditions on text. The main improvements are the following:

Segmentation condition: a separate VQ-VAE (VQ-SEG) is trained, with the loss changed to a weighted binary cross-entropy (Section 3.4).
VQGAN training (VQ-IMG) is extended with a face loss and an object loss (Sections 3.3 and 3.5).
Classifier-free guidance for the autoregressive transformer (Section 3.7).
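
The weighted binary cross-entropy mentioned for VQ-SEG can be sketched as follows. This is a minimal, framework-free illustration and not the code from this repository: the function name `weighted_bce` and the example weights are hypothetical, and how the paper chooses its per-pixel weights is not specified here. In PyTorch, the same idea is available through the `weight` argument of `torch.nn.functional.binary_cross_entropy`.

```python
import math


def weighted_bce(probs, targets, weights, eps=1e-7):
    """Mean weighted binary cross-entropy over flat, equal-length lists.

    probs   -- predicted probabilities in [0, 1]
    targets -- ground-truth labels (0 or 1)
    weights -- per-element weights (hypothetical; larger = more emphasis)
    """
    if not (len(probs) == len(targets) == len(weights)):
        raise ValueError("probs, targets and weights must have the same length")
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        # Clamp to avoid log(0) for saturated predictions.
        p = min(max(p, eps), 1.0 - eps)
        total += -w * (t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(probs)


# Three "pixels" of one segmentation channel.
probs = [0.9, 0.6, 0.2]
targets = [1, 1, 0]
uniform_loss = weighted_bce(probs, targets, [1.0, 1.0, 1.0])
# Assumed weighting: the second pixel is treated as four times as important.
emphasised_loss = weighted_bce(probs, targets, [1.0, 4.0, 1.0])
```

With uniform weights this reduces to the ordinary binary cross-entropy. Raising the weight of a pixel increases the penalty for getting that pixel wrong, which is how a weighted loss can emphasise selected regions of the segmentation map.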

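The guidance step for the autoregressive transformer, in its classifier-free form, can be sketched as follows. This is a minimal illustration of the general technique under stated assumptions, not the code from this repository: the function names and the toy logits are hypothetical, and the model is assumed to produce one set of next-token logits with the text condition and one without it.

```python
import math


def guided_logits(cond_logits, uncond_logits, scale):
    """Combine conditional and unconditional logits.

    scale = 0 gives the unconditional logits, scale = 1 gives the
    conditional logits, and scale > 1 pushes further toward the condition.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


# Toy vocabulary of three image tokens (hypothetical values).
cond = [2.0, 0.0, -1.0]    # logits with the text prompt
uncond = [1.0, 0.0, 0.0]   # logits with the prompt replaced by padding
probs = softmax(guided_logits(cond, uncond, scale=3.0))
```

At each sampling step the next image token is drawn from the distribution given by the guided logits instead of the purely conditional ones. The guidance scale trades diversity for adherence to the prompt.
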
Training Pipeline

Figure 6. from paper

What needs to be done?

See the individual folders for details.

Citation

@misc{https://doi.org/10.48550/arxiv.2203.13131,
  doi = {10.48550/ARXIV.2203.13131},
  url = {https://arxiv.org/abs/2203.13131},
  author = {Gafni, Oran and Polyak, Adam and Ashual, Oron and Sheynin, Shelly and Parikh, Devi and Taigman, Yaniv},
  title = {Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors},
  publisher = {arXiv},
  year = {2022},
  copyright = {arXiv.org perpetual, non-exclusive license}
}

Comments

Fix logic bugs of the transformer

Remove one seemingly unused text positional embedding.
Output the logits starting from the end of the segmentation positions, not from the start of the image positions.

opened by thuangb 1

Will the segmentation graph data set be exposed?

Hi, I'm very interested in this great work. What should I do if I want to get the segmentation graph and categories of the MS COCO data set? Will they be released in the future?

opened by yanhan111 0

Classifier free guidance

Thanks for sharing the code. I cannot find the classifier-free guidance in this implementation. If I missed it, could you show me where it is? Thanks for your patience.

opened by KevinGoodman 1
