PyTorch implementation of "LayoutTransformer: Layout Generation and Completion with Self-attention"

Overview

LayoutTransformer

arXiv | BibTeX | Project Page

This repo contains code for single GPU training of LayoutTransformer from LayoutTransformer: Layout Generation and Completion with Self-attention. This code was rewritten from scratch using a cleaner GPT codebase. Some of the details such as training hyperparameters might differ from the arxiv version of the paper.

teaser!

How To Use This Code

Start a new conda environment

conda env create -f environment.yml
conda activate layout

or update an existing environment

conda env update -f environment.yml --prune

Logging with wandb

In order to log experiments to wandb, we use wandb's API keys that can be found here https://wandb.ai/settings. Copy your key and store them in an environment variable using

export WANDB_API_KEY=
   

   

Alternately, you can also login using wandb login.

Datasets

COCO Bounding Boxes

See the instructions to obtain the dataset here.

PubLayNet Document Layouts

See the instructions to obtain the dataset here.

LayoutVAE

Reimplementation of LayoutVAE is here. Code contributed primarily by Justin.

cd layout_vae

# Train the CountVAE model
python train_counts.py \
    --exp count_coco_instances \
    --train_json /path/to/coco/annotations/instances_train2017.json \
    --val_json /path/to/coco/annotations/instances_val2017.json \
    --epochs 50

# Train the BoxVAE model
python train_counts.py \
    --exp box_coco_instances \
    --train_json /path/to/coco/annotations/instances_train2017.json \
    --val_json /path/to/coco/annotations/instances_val2017.json \
    --epochs 50

LayoutTransformer

Rewritten from scratch using a cleaner GPT codebase. Some of the details such as training hyperparameters might differ from the arxiv version.

# Training on MNIST layouts
python main.py \
    --data_dir /path/to/mnist \
    --threshold 1 --exp mnist_threshold_1
    
# Training on COCO bounding boxes or PubLayNet
python main.py \
    --train_json /path/to/annotations/train.json \
    --val_json /path/to/annotations/val.json \
    --exp publaynet

BibTeX

If you use this code, please cite

@inproceedings{gupta2021layouttransformer,
  title={LayoutTransformer: Layout Generation and Completion with Self-attention},
  author={Gupta, Kamal and Lazarow, Justin and Achille, Alessandro and Davis, Larry S and Mahadevan, Vijay and Shrivastava, Abhinav},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={1004--1014},
  year={2021}
}
}

Acknowledgments

We would like to thank several public repos

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.

Comments
  • dataset processing

    dataset processing

    Hi Kamal,

    https://github.com/kampta/DeepLayout/blob/1e66e72c11a526b258fe3395fa4a0f23d29987a2/layout_transformer/dataset.py#L120-L130

    In the JSONLayout class, bounding boxes are sorted but categories are not sorted. Bounding boxes w.r.t categories are not matching as per PubLayNet json files after processing. Can you please check it?

    bug 
    opened by abhijeet00 5
  • How to modify the data representation to support different sizes of layouts?

    How to modify the data representation to support different sizes of layouts?

    Hi, Currently the PubLayNet example generates layouts of size 256x256, a fixed square aspect ratio. How do I modify the representation to make it support different aspect ratios during inference?

    opened by mln00b 0
  • Has anyone successfully trained on COCO?

    Has anyone successfully trained on COCO?

    Hello, thank you for sharing this code. I used the code with no changes and when I saw the visualization of the generated boxes are not really good. image image

    Has anyone successfully trained on COCO?

    opened by sbkim052 4
  • Questions related to your paper

    Questions related to your paper

    Hi~ I've read your paper and the code, and I am little confused about some details.

    Firstly, the coordinates of bounding boxes are discretized and the vocabulary for layouts contains the 8-bit uniform quantization token (which is token indexes are 0-127 in your code), categorial token, padding token, bos and eos token. But what if during inference stage, the predicted coordinate of the trained model doesn't lie in 0-127 tokens? Will you do any post processing to correct these mis-predicted layouts?

    Secondly, in your paper, it is said that 'we minimize KL-Divergence between soft-max predictions and output one-hot distribution with Label Smoothing', but why the code is still a cross entropy loss?

    opened by hbell99 1
  • Wrong Testing result from demo code

    Wrong Testing result from demo code

    Thanks for your repo.

    I used the default training code of publayout, and trained for 10 epoches. The final training loss reaches around 1.27. epoch 10 iter 5245: train loss 1.27828. lr 2.880000e-04: 100%|██████████| 5246/5246 [46:58<00:00, 1.86it/s]

    I opened the annotated code groups (total 4), and found that the input image is fine while the output seems not satisfactory. for i, layout in enumerate(layouts): layout = self.test_dataset.render(layout) layout.save(os.path.join(self.config.samples_dir, f'recon_{epoch:02d}_{i:02d}.png'))

    image

    Any suggestions or comments will be appreciated. thanks!

    opened by Inosonnia 0
An essential implementation of BYOL in PyTorch + PyTorch Lightning

Essential BYOL A simple and complete implementation of Bootstrap your own latent: A new approach to self-supervised Learning in PyTorch + PyTorch Ligh

Enrico Fini 48 Sep 27, 2022
RealFormer-Pytorch Implementation of RealFormer using pytorch

RealFormer-Pytorch Implementation of RealFormer using pytorch. Includes comparison with classical Transformer on image classification task (ViT) wrt C

Simo Ryu 90 Dec 8, 2022
A PyTorch implementation of the paper Mixup: Beyond Empirical Risk Minimization in PyTorch

Mixup: Beyond Empirical Risk Minimization in PyTorch This is an unofficial PyTorch implementation of mixup: Beyond Empirical Risk Minimization. The co

Harry Yang 121 Dec 17, 2022
A pytorch implementation of Pytorch-Sketch-RNN

Pytorch-Sketch-RNN A pytorch implementation of https://arxiv.org/abs/1704.03477 In order to draw other things than cats, you will find more drawing da

Alexis David Jacq 172 Dec 12, 2022
PyTorch implementation of Advantage async actor-critic Algorithms (A3C) in PyTorch

Advantage async actor-critic Algorithms (A3C) in PyTorch @inproceedings{mnih2016asynchronous, title={Asynchronous methods for deep reinforcement lea

LEI TAI 111 Dec 8, 2022
Pytorch-diffusion - A basic PyTorch implementation of 'Denoising Diffusion Probabilistic Models'

PyTorch implementation of 'Denoising Diffusion Probabilistic Models' This reposi

Arthur Juliani 76 Jan 7, 2023
Fang Zhonghao 13 Nov 19, 2022
RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent

Phil Wang 556 Jan 4, 2023
HashNeRF-pytorch - Pure PyTorch Implementation of NVIDIA paper on Instant Training of Neural Graphics primitives

HashNeRF-pytorch Instant-NGP recently introduced a Multi-resolution Hash Encodin

Yash Sanjay Bhalgat 616 Jan 6, 2023
Generic template to bootstrap your PyTorch project with PyTorch Lightning, Hydra, W&B, and DVC.

NN Template Generic template to bootstrap your PyTorch project. Click on Use this Template and avoid writing boilerplate code for: PyTorch Lightning,

Luca Moschella 520 Dec 30, 2022
A PyTorch Extension: Tools for easy mixed precision and distributed training in Pytorch

This repository holds NVIDIA-maintained utilities to streamline mixed precision and distributed training in Pytorch. Some of the code here will be included in upstream Pytorch eventually. The intention of Apex is to make up-to-date utilities available to users as quickly as possible.

NVIDIA Corporation 6.9k Jan 3, 2023
Objective of the repository is to learn and build machine learning models using Pytorch. 30DaysofML Using Pytorch

30 Days Of Machine Learning Using Pytorch Objective of the repository is to learn and build machine learning models using Pytorch. List of Algorithms

Mayur 119 Nov 24, 2022
Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

Pretrained SOTA Deep Learning models, callbacks and more for research and production with PyTorch Lightning and PyTorch

Pytorch Lightning 1.4k Jan 1, 2023
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka

Mamy Ratsimbazafy 360 Dec 10, 2022
The Incredible PyTorch: a curated list of tutorials, papers, projects, communities and more relating to PyTorch.

This is a curated list of tutorials, projects, libraries, videos, papers, books and anything related to the incredible PyTorch. Feel free to make a pu

Ritchie Ng 9.2k Jan 2, 2023
Amazon Forest Computer Vision: Satellite Image tagging code using PyTorch / Keras with lots of PyTorch tricks

Amazon Forest Computer Vision Satellite Image tagging code using PyTorch / Keras Here is a sample of images we had to work with Source: https://www.ka

Mamy Ratsimbazafy 359 Jan 5, 2023
A bunch of random PyTorch models using PyTorch's C++ frontend

PyTorch Deep Learning Models using the C++ frontend Gettting started Clone the repo 1. https://github.com/mrdvince/pytorchcpp 2. cd fashionmnist or

Vince 0 Jul 13, 2021
PyTorch Autoencoders - Implementing a Variational Autoencoder (VAE) Series in Pytorch.

PyTorch Autoencoders Implementing a Variational Autoencoder (VAE) Series in Pytorch. Inspired by this repository Model List check model paper conferen

Subin An 8 Nov 21, 2022
PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices.

PyTorch-LIT PyTorch-LIT is the Lite Inference Toolkit (LIT) for PyTorch which focuses on easy and fast inference of large models on end-devices. With

Amin Rezaei 157 Dec 11, 2022