Official code for: A Probabilistic Hard Attention Model For Sequentially Observed Scenes

Last update: Nov 19, 2022

Overview

"A Probabilistic Hard Attention Model For Sequentially Observed Scenes"

Authors: Samrudhdhi Rangrej, James Clark
Accepted to: BMVC'21

A recurrent attention model sequentially observes glimpses from an image and predicts a class label. At time t, the model actively observes a glimpse g_t and its coordinates l_t. Given g_t and l_t, the feed-forward module F extracts features f_t, and the recurrent module R updates the hidden state to h_t. Using the updated hidden state h_t, the linear classifier C predicts the class distribution p(y|h_t).

At time t+1, the model assesses various candidate locations l before attending to the optimal one. It predicts p(y|g,l,h_t) ahead of time and selects the candidate l that maximizes KL[p(y|g,l,h_t)||p(y|h_t)]. To approximate p(y|g,l,h_t) without attending to the glimpse g, the model synthesizes the features of g using a Partial VAE. The normalizing flow-based encoder S predicts the approximate posterior q(z|h_t). The decoder D uses a sample z~q(z|h_t) to synthesize a feature map f^~ containing the features of all glimpses. The model uses f^~(l) as the features of a glimpse at location l and evaluates p(y|g,l,h_t)=p(y|f^~(l),h_t). In the model diagram, dashed arrows show the path used to compute the lookahead class distribution p(y|f^~(l),h_t).
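The location-selection rule can be illustrated with a small, self-contained sketch. It is not the repository's implementation: the function names, the dictionary layout, and the toy probabilities are assumptions made for illustration, and any averaging over several samples of z is left out. It only shows the argmax over KL[p(y|f^~(l),h_t)||p(y|h_t)] once the lookahead distributions are available.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL[p || q] for two discrete distributions given as lists of probabilities."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def select_next_location(lookahead, current):
    """Pick the candidate location whose lookahead class distribution
    p(y|f^~(l), h_t) diverges most from the current prediction p(y|h_t)."""
    scores = {loc: kl_divergence(p, current) for loc, p in lookahead.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy example: three classes, three candidate locations (made-up numbers).
current = [0.4, 0.3, 0.3]          # p(y | h_t)
lookahead = {
    (0, 0): [0.4, 0.3, 0.3],       # same as the current prediction, so KL is 0
    (0, 4): [0.5, 0.3, 0.2],       # mild change in the prediction
    (4, 4): [0.9, 0.05, 0.05],     # large change in the prediction
}
best, scores = select_next_location(lookahead, current)
print(best)  # (4, 4)
```

In this toy case the location (4, 4) wins because attending there is expected to change the class prediction the most.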

Requirements:

torch==1.8.1, torchvision==0.9.1, tensorboard==2.5.0, fire==0.4.0

Datasets:

SVHN (Let PyTorch download this dataset)
CIFAR-10 (Let PyTorch download this dataset)
CIFAR-100 (Let PyTorch download this dataset)
CINIC-10 (download from: https://datashare.is.ed.ac.uk/bitstream/handle/10283/3192/CINIC-10.tar.gz, visit https://github.com/BayesWatch/cinic-10)
TinyImageNet (download from: http://cs231n.stanford.edu/tiny-imagenet-200.zip)
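
For the three datasets that PyTorch can fetch itself, the download can be triggered through torchvision. The helper below is a hypothetical sketch (the function name and the choice of `ToTensor` as the only transform are assumptions); the repository's own data pipeline in main.py may differ.

```python
def load_auto_download_dataset(name, datapath="./data/", train=True):
    """Build a torchvision dataset for the datasets PyTorch can download itself."""
    if name not in ("svhn", "cifar10", "cifar100"):
        raise ValueError(f"{name!r} must be downloaded manually (see the links above)")

    # Imported lazily so the helper can be inspected without torchvision installed.
    from torchvision import datasets, transforms

    to_tensor = transforms.ToTensor()
    if name == "svhn":
        # SVHN selects its split by name rather than by a boolean flag.
        return datasets.SVHN(datapath, split="train" if train else "test",
                             transform=to_tensor, download=True)
    cls = datasets.CIFAR10 if name == "cifar10" else datasets.CIFAR100
    return cls(datapath, train=train, transform=to_tensor, download=True)
```

Calling `load_auto_download_dataset("cifar10")` downloads CIFAR-10 into `./data/` on first use and reuses the local copy afterwards.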

Training a model

Use main.py to train and evaluate the model.

Arguments

dataset: one of 'svhn', 'cifar10', 'cifar100', 'cinic10', 'tinyimagenet'
datapath: path to the downloaded datasets
lr: learning rate
training_phase: one of 'first', 'second', 'third'
ccebal: coefficient for cross entropy loss
batch: batch-size for training
batchv: batch-size for evaluation
T: maximum time-step
logfolder: path to log directory
epochs: number of training epochs
pretrain_checkpoint: checkpoint for pretrained model from previous training phase

Example commands to train the model on the SVHN dataset are as follows.

Training Stage one

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='first' \
    --ccebal=1 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_first' \
    --epochs=1000 \
    --pretrain_checkpoint=None

Training Stage two

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='second' \
    --ccebal=0 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_second' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_first/weights_f_1000.pth'

Training Stage three

python3 main.py \
    --dataset='svhn' \
    --datapath='./data/' \
    --lr=0.001 \
    --training_phase='third' \
    --ccebal=16 \
    --batch=64 \
    --batchv=64 \
    --T=7 \
    --logfolder='./svhn_log_third' \
    --epochs=100 \
    --pretrain_checkpoint='./svhn_log_second/weights_f_100.pth'
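
Each stage starts from the checkpoint written by the previous one, so the three commands can also be generated programmatically. The script below is a convenience sketch, not part of the repository. It assumes that the final checkpoint of a stage is named `weights_f_<epochs>.pth` inside that stage's log folder, which is inferred from the paths used in the commands above.

```python
import shlex

# (training_phase, ccebal, epochs), copied from the three SVHN commands above.
STAGES = [("first", 1, 1000), ("second", 0, 100), ("third", 16, 100)]


def build_commands(dataset="svhn", datapath="./data/", lr=0.001, batch=64, T=7):
    """Return one argument list per training stage, chained through checkpoints."""
    commands, checkpoint = [], None
    for phase, ccebal, epochs in STAGES:
        logfolder = f"./{dataset}_log_{phase}"
        commands.append([
            "python3", "main.py",
            f"--dataset={dataset}", f"--datapath={datapath}", f"--lr={lr}",
            f"--training_phase={phase}", f"--ccebal={ccebal}",
            f"--batch={batch}", f"--batchv={batch}", f"--T={T}",
            f"--logfolder={logfolder}", f"--epochs={epochs}",
            f"--pretrain_checkpoint={checkpoint}",
        ])
        # Assumed naming scheme for the checkpoint the next stage loads.
        checkpoint = f"{logfolder}/weights_f_{epochs}.pth"
    return commands


for cmd in build_commands():
    print(shlex.join(cmd))
```

To run the stages instead of printing them, pass each list to `subprocess.run(cmd, check=True)` in order.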

Visualization of attention policy for a CIFAR-10 image

The top row shows the entire image and the expected information gain (EIG) maps for t=1 to 6. The bottom row shows the glimpses attended by our model. The model observes the first glimpse at a random location. Each glimpse is 8x8 pixels, and glimpses overlap with a stride of 4, resulting in a 7x7 grid of glimpse locations. The EIG maps are therefore of size 7x7 and are upsampled for display. We display the entire image for reference only; our model never observes the whole image.
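
The 7x7 grid follows from the 32x32 size of CIFAR-10 images: (32 - 8) / 4 + 1 = 7 positions per side. A short sketch of that arithmetic is given below. The function name and the convention of identifying a glimpse by its top-left pixel are assumptions for illustration; the repository may index locations differently.

```python
def glimpse_grid(image_size=32, glimpse_size=8, stride=4):
    """Top-left corners of all glimpses on a square image."""
    per_side = (image_size - glimpse_size) // stride + 1
    offsets = [i * stride for i in range(per_side)]
    return [(row, col) for row in offsets for col in offsets]


locations = glimpse_grid()
print(len(locations))  # 49 glimpses, i.e. a 7x7 grid
print(locations[-1])   # (24, 24): the last glimpse covers pixels 24..31
```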

Acknowledgement

Major parts of the neural spline flows implementation are borrowed from Karpathy's pytorch-normalizing-flows.
