SLATE

This is the official source code for SLATE. We provide the code for the model, the training code, and a dataset loader for the 3D Shapes dataset. The code is implemented in PyTorch.

arXiv: https://arxiv.org/pdf/2110.11405.pdf
Project Page: https://sites.google.com/view/slate-autoencoder

Dataset

The current release provides boilerplate code to train the model on the 3D Shapes dataset. The dataset class is provided in shapes_3d.py; you can edit or replace this class if you need to run the code on a different dataset. The 3D Shapes dataset can be downloaded from the official URL https://console.cloud.google.com/storage/browser/3d-shapes, which provides the dataset file 3dshapes.h5. During training, the path to this dataset file must be provided via the argument --data_path.
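For reference, below is a minimal sketch of a PyTorch dataset wrapper around 3dshapes.h5. It assumes the official HDF5 layout, in which the images are stored as a uint8 array of shape (N, 64, 64, 3) under the key 'images'; the class name and details here are illustrative and may differ from what shapes_3d.py actually implements.

# Illustrative sketch only; the actual shapes_3d.py in this repository may differ.
import h5py
import torch
from torch.utils.data import Dataset

class Shapes3DSketch(Dataset):
    def __init__(self, data_path):
        # Assumes the official 3dshapes.h5 layout: an 'images' dataset
        # of shape (N, 64, 64, 3) stored as uint8.
        with h5py.File(data_path, 'r') as f:
            self.images = f['images'][()]  # loads the full array into memory

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img = torch.from_numpy(self.images[idx]).float() / 255.0  # scale to [0, 1]
        return img.permute(2, 0, 1)  # (H, W, C) -> (C, H, W)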

Training

To train the model, simply execute:

python train.py

Check train.py to see the full list of training arguments.
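For example, a typical invocation that points the trainer at the dataset file and a logging directory might look like the following (the paths are placeholders; --data_path and --log_path are the arguments described in this README):

python train.py --data_path /path/to/3dshapes.h5 --log_path /path/to/logs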

Outputs

The training code produces TensorBoard logs. To see these logs, run TensorBoard on the logging directory provided via the training argument --log_path. The logs contain the training loss curves and visualizations of reconstructions and object attention maps.
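To view the logs, point TensorBoard at the same directory, for example:

tensorboard --logdir /path/to/logs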

Hyperparameters of Interest

  • Learning Rate can be tuned using the training argument --lr_main; different choices can affect the characteristics of the object attention maps.
  • Number of Slots can be tuned using the training argument --num_slots. The number of slots should be set higher than the number of objects you expect to see in the images.
  • Number of Slot Attention Iterations can be tuned using the training argument --num_iterations. In general, keep the number of iterations as small as possible, because too many iterations can prevent slots from learning to diversify and attach to different objects. An example invocation that overrides these arguments is shown below.
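As an illustration, these hyperparameters can be overridden on the command line together with the dataset and logging paths (the values below are placeholders, not recommendations):

python train.py --data_path /path/to/3dshapes.h5 --log_path /path/to/logs --lr_main 1e-4 --num_slots 4 --num_iterations 3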

Code Files

This repository provides the following files.

  • train.py contains the main code for running the training.
  • slate.py provides the model class for SLATE.
  • shapes_3d.py contains the dataset class for the 3D Shapes dataset.
  • dvae.py provides the encoder and the decoder for the discrete VAE.
  • slot_attn.py provides the model class for the Slot Attention encoder.
  • transformer.py provides the model classes for the Transformer.
  • utils.py provides helper classes and functions for the implementation.