FANet - Real-time Semantic Segmentation with Fast Attention

Ping Hu

Last update: Nov 30, 2022

Overview

Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

[Paper Link] [Project Page]

Accurate semantic segmentation requires rich contextual cues (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in real-time. The proposed architecture relies on our fast attention, which is a simple modification of the popular self-attention mechanism and captures the same rich contextual information at a small fraction of the computational cost, by changing the order of operations. Moreover, to efficiently process high-resolution input, we apply an additional spatial reduction to intermediate feature stages of the network with minimal loss in accuracy thanks to the use of the fast attention module to fuse features. We validate our method with a series of experiments, and show that results on multiple datasets demonstrate superior performance with better accuracy and speed compared to existing approaches for real-time semantic segmentation. On Cityscapes, our network achieves 74.4% mIoU at 72 FPS and 75.5% mIoU at 58 FPS on a single Titan X GPU, which is ~50% faster than the state-of-the-art while retaining the same accuracy.
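
The "changing the order of operations" idea in the abstract can be illustrated with a small NumPy sketch. It assumes that the softmax of standard self-attention is replaced by an affinity built from L2-normalized queries and keys and scaled by 1/n; these details, and all function names, are assumptions made for illustration rather than the repository's code. Once the affinity is linear, matrix multiplication is associative, so `(Q K^T) V` (cost proportional to n²·c) can be computed as `Q (K^T V)` (cost proportional to n·c²), where n is the number of pixels and c the number of channels.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def attention_quadratic(q, k, v):
    """(Q K^T) V: builds an n x n affinity matrix first."""
    n = q.shape[0]
    affinity = l2_normalize(q) @ l2_normalize(k).T  # shape (n, n)
    return (affinity @ v) / n

def attention_fast(q, k, v):
    """Q (K^T V): builds a small c x c context matrix first."""
    n = q.shape[0]
    context = l2_normalize(k).T @ v                 # shape (c, c)
    return (l2_normalize(q) @ context) / n

# Hypothetical sizes: n flattened spatial positions, c channels.
rng = np.random.default_rng(0)
n, c = 1024, 32
q, k, v = (rng.standard_normal((n, c)) for _ in range(3))

out_quadratic = attention_quadratic(q, k, v)
out_fast = attention_fast(q, k, v)
max_abs_diff = np.abs(out_quadratic - out_fast).max()
```

Both functions return the same result up to floating-point error; only the size of the intermediate matrix differs, which is where the saving comes from when n is much larger than c.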

Comments

Features extracted by ResNet are different for BiSeNet and FANet
Thank you for the great work!!

The lines below are from the forward pass of the ResNet model adapted for FANet:

    feat4 = self.layer1(x)
    feat8 = self.layer2(feat4)    # 1/8
    feat16 = self.layer3(feat8)   # 1/16
    feat32 = self.layer4(feat16)  # 1/32
    return feat4, feat8, feat16, feat32

The comments specify that feat8, feat16, and feat32 are feature maps at 1/8, 1/16, and 1/32 of the image size, but the actual sizes are 1/16, 1/32, and 1/64.

I also understand why this is happening. Below is how the Resnet18 model is created for FANet:

    def Resnet18(pretrained=True, norm_layer=None, **kwargs):
        model = ResNet(BasicBlock, [2, 2, 2, 2], [2, 2, 2, 2], norm_layer=norm_layer)
        if pretrained:
            model.init_weight(model_zoo.load_url(model_urls['resnet18']))
        return model

which is different from how it is created in BiSeNet:

    def Resnet18(pretrained=False, **kwargs):
        model = ResNet(BasicBlock, [2, 2, 2, 2], [1, 2, 2, 2])
        if pretrained:
            model.init_weight(model_zoo.load_url(model_urls['resnet18']))
        return model

The stride value for the first BasicBlock is different.
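
The effect of that one stride value can be checked with simple arithmetic. The sketch below assumes the standard ResNet stem (a stride-2 convolution followed by a stride-2 max-pool, so a factor of 4 before the first stage); the helper name `cumulative_strides` is hypothetical and not part of either repository.

```python
def cumulative_strides(layer_strides, stem_stride=4):
    """Return the total downsampling factor after each residual stage."""
    factors = []
    total = stem_stride  # assumed: stem conv (stride 2) + max-pool (stride 2)
    for stride in layer_strides:
        total *= stride
        factors.append(total)
    return factors

# Stage strides as passed to ResNet(...) in the two snippets.
fanet_factors = cumulative_strides([2, 2, 2, 2])
bisenet_factors = cumulative_strides([1, 2, 2, 2])

print("FANet  :", fanet_factors)    # downsampling of feat4, feat8, feat16, feat32
print("BiSeNet:", bisenet_factors)
```

Under this assumption the FANet variant yields factors of 8, 16, 32, and 64, while the BiSeNet variant yields 4, 8, 16, and 32, which matches the sizes reported in this issue.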

Could you please clarify whether the additional down-sampling is intended for FANet?

Would it also be possible for you to share the trained model? I don't have access to good GPUs to train it on the Cityscapes dataset.
opened by Eashwar93 2
Why not consider BN for test time calculation

Hi, for a fair comparison: previous works including ICNet, BiSeNet, and SFNet report speed with the BN layers kept, whereas here you merge BN into the convolutions, which results in a much higher speed.
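
For readers unfamiliar with the merge mentioned above, here is a minimal NumPy sketch of folding inference-time batch-norm statistics into the preceding layer. It treats a 1x1 convolution as a per-pixel linear map for brevity; the function name and tensor sizes are hypothetical, and this is not the repository's code.

```python
import numpy as np

def fold_bn_into_conv(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold BN parameters into a 1x1 conv with weight of shape (c_out, c_in)."""
    scale = gamma / np.sqrt(var + eps)  # one factor per output channel
    return weight * scale[:, None], (bias - mean) * scale + beta

rng = np.random.default_rng(0)
c_in, c_out, pixels = 8, 4, 16
x = rng.standard_normal((pixels, c_in))
weight = rng.standard_normal((c_out, c_in))
bias = rng.standard_normal(c_out)
gamma = rng.standard_normal(c_out)
beta = rng.standard_normal(c_out)
mean = rng.standard_normal(c_out)
var = rng.random(c_out) + 0.5           # strictly positive running variance

# Reference path: convolution followed by a separate BN step.
y = x @ weight.T + bias
y_bn = (y - mean) / np.sqrt(var + 1e-5) * gamma + beta

# Folded path: a single convolution with adjusted parameters.
w_folded, b_folded = fold_bn_into_conv(weight, bias, gamma, beta, mean, var)
y_folded = x @ w_folded.T + b_folded
```

The two paths give the same output, but the folded one skips the normalization step at inference time, which is why timing with and without the merge is not directly comparable.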

opened by lxtGH 2
Torch/Cuda version

Hi Author, thanks for your work : ) Could you please share the torch and CUDA versions used for this repo? Running the command CUDA_VISIBLE_DEVICES=1 python3 speeding.py with torch==1.1.0 and CUDA 10.2 raises the error "RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50". Thanks!

opened by MaureenZOU 0
Training Details
Hello once again,

I tried setting up training for FANet-18 on the Cityscapes dataset. I replaced the InPlaceABN layers with normal BN followed by an activation, as I need to export the trained model to ONNX to deploy it in my application.

These are my training configurations, largely adapted from the paper:

Mini-batch SGD with batch size 4 (I only have 8 GB of GPU memory), weight decay = 5e-4, momentum = 0.9

Initial learning rate (LR) = 1e-2, with the LR multiplied at each update by the factor (1-(iter/max_iter)pow(2))

Data Augmentation - Horizontal Flipping, random scaling (0.75 to 2)

Training iterations 80000
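
The learning-rate rule in this configuration can be sketched as follows. The factor as written is ambiguous; the sketch assumes the common polynomial-decay reading, (1 - iter/max_iter)^power with power = 2, although the notation could also mean 1 - (iter/max_iter)^2. The function name `poly_lr` is hypothetical.

```python
def poly_lr(iteration, base_lr=1e-2, max_iter=80000, power=2.0):
    """Polynomial decay: base_lr * (1 - iteration / max_iter) ** power.

    The exponent and the placement of the power are assumptions about
    what the issue author meant, not confirmed values.
    """
    return base_lr * (1.0 - iteration / max_iter) ** power

# Learning rate at a few points of an 80000-iteration run.
schedule = {it: poly_lr(it) for it in (0, 20000, 40000, 60000, 80000)}
for it, lr in schedule.items():
    print(f"iter {it:>6}: lr = {lr:.6f}")
```

Under this reading the rate starts at the initial value, falls to a quarter of it halfway through training, and reaches zero at the last iteration.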

I ended up with an OHEM cross-entropy loss of 0.3941 at the final iteration.

I have yet to check the mIoU.

As a preliminary comparison, BiSeNet trained in a similar fashion, but with auxiliary losses, ended with an OHEM cross-entropy loss of 0.2947 and an mIoU of 0.63.
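
Since the loss values quoted here depend on how OHEM (online hard example mining) selects pixels, a minimal NumPy sketch of one common formulation may help when comparing numbers. It keeps every pixel whose predicted probability for the true class is below a threshold, and falls back to the `n_min` hardest pixels when too few qualify. The threshold, `n_min`, and the function name are assumptions for illustration; the issue does not state which variant or values were used.

```python
import numpy as np

def ohem_cross_entropy(logits, labels, thresh=0.7, n_min=1):
    """Mean cross-entropy over hard pixels only.

    logits: (num_pixels, num_classes); labels: (num_pixels,) integer ids.
    n_min must be smaller than num_pixels.
    """
    # Numerically stable log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    losses = -log_probs[np.arange(labels.shape[0]), labels]

    hardest_first = np.sort(losses)[::-1]
    loss_thresh = -np.log(thresh)  # loss of a pixel predicted at prob = thresh
    if hardest_first[n_min] > loss_thresh:
        kept = hardest_first[hardest_first > loss_thresh]
    else:
        kept = hardest_first[:n_min]
    return kept.mean()

# Two confidently correct pixels and two undecided ones.
logits = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
labels = np.array([0, 0, 0, 1])

loss_hard = ohem_cross_entropy(logits, labels, thresh=0.7, n_min=1)
loss_top3 = ohem_cross_entropy(logits, labels, thresh=0.7, n_min=3)
```

Because easy pixels are discarded before averaging, an OHEM loss is generally higher than a plain mean cross-entropy on the same predictions, so values are only comparable between runs that use the same threshold and `n_min`.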

Could you please give me more details on the training, especially:

How many iterations did you train the model for?

What was the final cross-entropy loss you ended up with?

Did you also use auxiliary losses for better convergence, resulting in a lower loss?

Did you use a specifically modified version of the cross-entropy loss to achieve better convergence?

Is there anything else that I am missing to achieve better results?
opened by Eashwar93 0
why only 2 frames for spatial-temporal context aggregation

Hi. Your paper is very interesting and inspirational to read. I was wondering why you integrated the features of just ONE neighboring frame to facilitate the inference of the current frame. Have you experimented with more frames? What is the effect?

Thank you.

opened by baibaidj 0
Resnet block feature map size

Hi, thanks for your great work. I have a question regarding the feature map size of the ResNet blocks. In your paper you say that the first res-block produces a feature map of h/4 x w/4 resolution, but in the code the resolution of the feature map is h/8 x w/8.

Is this an error? Or does the implementation not fully reproduce the paper's description?

Thank you

opened by Pelursos 1
class BatchNorm2D in fanet.py
I am confused about this! You define a batch norm, but in the forward function you only use the activation. Can you explain this?

    class BatchNorm2d(nn.BatchNorm2d):
        # (conv => BN => ReLU) * 2
        def __init__(self, num_features, activation='none'):
            super(BatchNorm2d, self).__init__(num_features=num_features)
            if activation == 'leaky_relu':
                self.activation = nn.LeakyReLU()
            elif activation == 'none':
                self.activation = lambda x: x
            else:
                raise Exception("Accepted activation: ['leaky_relu']")

        def forward(self, x):
            return self.activation(x)
opened by TranThanh96 10
Trained model with mIoU of 75.5 for fa2

Hi, Can you please provide the trained model for the last experiment reported in Table VI of the paper, which achieved mIoU of 75.5: "TABLE VI: Video semantic segmentation on Cityscapes. “+Temp” indicates FANet with spatial-temporal attention (t=2)"

Thanks

opened by MahdiehP 0

HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the weights at each decoder block vary spatially. For this purpose, we design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features

182 Dec 14, 2022

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes, which achieves a state-of-the-art trade-off between accuracy and speed on Cityscapes and CamVid, without using inference acceleration or extra data

Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes Introduction This is the unofficial code of Deep Dual-re

113 Dec 23, 2022

DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation

DFFNet Paper DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation. Xiangyan Tang, Wenxuan Tu, Keqiu Li, J

4 Sep 23, 2022

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

32 Sep 21, 2022

Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

109 Dec 28, 2022

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

90 Dec 29, 2022

Code for the paper "A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

PyTorch implementation of UAGAN(U-net Attention Generative Adversarial Networks) This repository contains the source code for the paper "A High-precis

8 Apr 25, 2022

Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

337 Dec 15, 2022

Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time.

BBB Face Recognizer Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time. Instalati

232 Dec 24, 2022

YolactEdge: Real-time Instance Segmentation on the Edge

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images.

1.1k Jan 6, 2023

OrienMask: Real-time Instance Segmentation with Discriminative Orientation Maps

OrienMask This repository implements the framework OrienMask for real-time instance segmentation. It achieves 34.8 mask AP on COCO test-dev at the spe

45 Dec 13, 2022

implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

64 Dec 14, 2022

A keras-based real-time model for medical image segmentation (CFPNet-M)

CFPNet-M: A Light-Weight Encoder-Decoder Based Network for Multimodal Biomedical Image Real-Time Segmentation This repository contains the implementat

268 Nov 27, 2022

A lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look At CoefficienTs)

Real-time Instance Segmentation and Lane Detection This is a lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look

4 Dec 30, 2022

TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

16 Dec 30, 2022

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

134 Dec 19, 2022

TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

147 Dec 3, 2022

Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

97 Dec 17, 2022

Realtime segmentation with ENet, the fast and accurate segmentation net.

Enet This is a realtime segmentation net with almost 22 fps on GTX1080 ti, and the model size is very small with only 28M. This repo contains the infe

14 Aug 30, 2022