FANet - Real-time Semantic Segmentation with Fast Attention

Overview

FANet

Real-time Semantic Segmentation with Fast Attention

Ping Hu, Federico Perazzi, Fabian Caba Heilbron, Oliver Wang, Zhe Lin, Kate Saenko, Stan Sclaroff

[Paper Link] [Project Page]

Accurate semantic segmentation requires rich contextual cues (large receptive fields) and fine spatial details (high resolution), both of which incur high computational costs. In this paper, we propose a novel architecture that addresses both challenges and achieves state-of-the-art performance for semantic segmentation of high-resolution images and videos in real-time. The proposed architecture relies on our fast attention, which is a simple modification of the popular self-attention mechanism and captures the same rich contextual information at a small fraction of the computational cost, by changing the order of operations. Moreover, to efficiently process high-resolution input, we apply an additional spatial reduction to intermediate feature stages of the network with minimal loss in accuracy thanks to the use of the fast attention module to fuse features. We validate our method with a series of experiments, and show that results on multiple datasets demonstrate superior performance with better accuracy and speed compared to existing approaches for real-time semantic segmentation. On Cityscapes, our network achieves 74.4% mIoU at 72 FPS and 75.5% mIoU at 58 FPS on a single Titan X GPU, which is ~50% faster than the state-of-the-art while retaining the same accuracy.
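
The key idea, replacing the softmax affinity of self-attention with a normalized cosine similarity so that the order of matrix multiplications can be swapped, can be sketched in a few lines of PyTorch. The sketch below is illustrative (single head, flattened spatial positions, names and shapes chosen for clarity), not the repository's implementation:

    import torch
    import torch.nn.functional as F

    def fast_attention(q, k, v):
        # q, k, v: (B, N, C) tensors with N = H*W flattened spatial positions.
        # Standard self-attention computes softmax(q @ k^T) @ v in O(N^2 * C).
        # Fast attention L2-normalizes q and k (cosine-similarity affinity) and
        # reorders the matmuls to q @ (k^T @ v) / N, which costs O(N * C^2).
        n = q.shape[1]
        q = F.normalize(q, dim=-1)
        k = F.normalize(k, dim=-1)
        context = k.transpose(1, 2) @ v      # (B, C, C) channel-to-channel context
        return (q @ context) / n             # (B, N, C)

    # toy usage on a flattened low-resolution feature map
    feats = torch.randn(2, 64 * 32, 128)     # B=2, N=2048 positions, C=128
    out = fast_attention(feats, feats, feats)
    print(out.shape)                         # torch.Size([2, 2048, 128])

Because the (k^T v) product is computed first, the cost grows linearly with the number of spatial positions, which is what makes high-resolution inputs affordable.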

Comments
  • Features extracted by ResNet are different for BiseNet and FANet


    Thank you for the great work!!

    The lines below are from the ResNet model's forward return, as adapted for FANet:

    feat4 = self.layer1(x)
    feat8 = self.layer2(feat4) # 1/8
    feat16 = self.layer3(feat8) # 1/16
    feat32 = self.layer4(feat16) # 1/32
    return feat4, feat8, feat16, feat32
    

    The comments specify that feat8, feat16, feat32 are feature maps of 1/8, 1/16, 1/32 of the image size but the actual sizes are 1/16, 1/32, 1/64.

    I also understand why this is happening. Below is the way the Resnet18 model is created for FANet:

    def Resnet18(pretrained=True, norm_layer=None, **kwargs):
        model = ResNet(BasicBlock, [2,2,2,2],[2,2,2,2], norm_layer=norm_layer)
        if pretrained:
            model.init_weight(model_zoo.load_url(model_urls['resnet18']))
        return model
    

    which is different from the way it is created in BiseNet:

    def Resnet18(pretrained=False, **kwargs):
        model = ResNet(BasicBlock, [2, 2, 2, 2],[1, 2, 2, 2])
        if pretrained:
            model.init_weight(model_zoo.load_url(model_urls['resnet18']))
        return model
    

    The stride value for the first BasicBlock is different.
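
    To make the effect visible, a quick shape probe like the one below shows the doubled downsampling (the import path and constructor arguments are just my guess at the repo layout):

    import torch
    # hypothetical import -- adjust to wherever the repo defines Resnet18
    from resnet import Resnet18

    model = Resnet18(pretrained=False).eval()
    x = torch.randn(1, 3, 512, 1024)          # dummy Cityscapes-like input
    with torch.no_grad():
        feats = model(x)
    for name, f in zip(['feat4', 'feat8', 'feat16', 'feat32'], feats):
        print(name, tuple(f.shape), 'downsampling =', x.shape[2] // f.shape[2])
    # with strides [2, 2, 2, 2] this reports 8/16/32/64 instead of 4/8/16/32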

    Could you please clarify whether the additional down-sampling is intended for FANet or not?

    Would it be possible for you to share the trained model as well? I don't have access to good GPUs to train it on the Cityscapes dataset.

    opened by Eashwar93 2
  • Why not consider BN for test time calculation


    Hi, for a fair comparison: previous works including ICNet, BiSeNet, and SFNet report speed with BN, while here you merge BN into the conv layers, which results in much higher speed.
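
    For context, merging ("folding") BN into the preceding conv rewrites the conv weights so that BN becomes a no-op at inference time. A generic sketch of the transformation (not this repo's code; groups and dilation are ignored for brevity):

    import torch
    import torch.nn as nn

    def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
        # y = gamma * (conv(x) - mean) / sqrt(var + eps) + beta
        #   = conv'(x)  with  w' = w * gamma / sqrt(var + eps)
        #                     b' = (b - mean) * gamma / sqrt(var + eps) + beta
        fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                          stride=conv.stride, padding=conv.padding, bias=True)
        scale = (bn.weight / torch.sqrt(bn.running_var + bn.eps)).detach()
        fused.weight.data.copy_(conv.weight.detach() * scale.reshape(-1, 1, 1, 1))
        b = conv.bias.detach() if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.data.copy_((b - bn.running_mean) * scale + bn.bias.detach())
        return fused

    # sanity check: folded conv matches conv followed by BN in eval mode
    conv, bn = nn.Conv2d(16, 32, 3, padding=1), nn.BatchNorm2d(32).eval()
    x = torch.randn(1, 16, 8, 8)
    print(torch.allclose(bn(conv(x)), fold_bn_into_conv(conv, bn)(x), atol=1e-5))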

    opened by lxtGH 2
  • Torch/Cuda version


    Hi, thanks for your work :) Could you please list the torch and CUDA versions for this repo? It raises "RuntimeError: cuda runtime error (100) : no CUDA-capable device is detected at /pytorch/aten/src/THC/THCGeneral.cpp:50" when running CUDA_VISIBLE_DEVICES=1 python3 speeding.py on torch==1.1.0 and CUDA 10.2. Thanks!

    opened by MaureenZOU 0
  • Training Details


    Hello once again,

    I tried putting together a training setup for FANet-18 on the Cityscapes dataset. I replaced the InPlaceABN layers with a normal BN followed by an activation, as I needed to export the trained model to ONNX for deployment in my application.

    These are my training configurations, largely adapted from the paper:

    1. Mini-batch SGD with batch size 4 (I only have 8 GB of GPU memory), weight decay = 5e-4, momentum = 0.9
    2. Initial learning rate (LR) = 1e-2, with the LR multiplied by the factor (1 - iter/max_iter)^2 at every iteration (poly schedule; see the sketch after this list)
    3. Data Augmentation - Horizontal Flipping, random scaling (0.75 to 2)
    4. Training iterations 80000
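
    For reference, the poly schedule from point 2 can be written as a LambdaLR along these lines (a sketch with placeholder parameters, not my exact training code):

    import torch

    max_iter = 80_000
    params = [torch.nn.Parameter(torch.zeros(1))]      # placeholder parameters
    optimizer = torch.optim.SGD(params, lr=1e-2, momentum=0.9, weight_decay=5e-4)
    # poly-style decay: lr = base_lr * (1 - iter / max_iter) ** 2
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda it: (1 - it / max_iter) ** 2)

    for it in range(max_iter):
        # ... forward pass, OHEM cross-entropy loss, loss.backward(), optimizer.step() ...
        scheduler.step()
        if it % 20_000 == 0:
            print(it, scheduler.get_last_lr())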

    I ended up with an OHEM cross-entropy loss of 0.3941 in the final iteration.

    I have yet to check the mIoU.

    As a preliminary comparison, BiseNet trained in a similar fashion (but with auxiliary losses) ended up with an OHEM cross-entropy loss of 0.2947, which gave an mIoU of 0.63.
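
    For clarity, by OHEM cross-entropy I mean roughly the following: sort the per-pixel losses and average only the hardest ones, subject to a probability threshold and a minimum pixel count. This is a generic sketch with illustrative thresholds, not necessarily the exact loss used in either repo:

    import torch
    import torch.nn.functional as F

    def ohem_cross_entropy(logits, target, thresh=0.7, min_kept=100_000,
                           ignore_index=255):
        # logits: (B, C, H, W) raw scores, target: (B, H, W) class indices.
        # Keep only the hardest pixels: those whose predicted probability for
        # the true class is below `thresh`, but never fewer than `min_kept`.
        pixel_loss = F.cross_entropy(logits, target, ignore_index=ignore_index,
                                     reduction='none').flatten()
        sorted_loss, _ = torch.sort(pixel_loss, descending=True)
        loss_thresh = -torch.log(torch.tensor(thresh))   # CE value at prob == thresh
        n_kept = max(min_kept, int((sorted_loss > loss_thresh).sum()))
        n_kept = min(n_kept, sorted_loss.numel())
        return sorted_loss[:n_kept].mean()

    # toy usage
    logits = torch.randn(2, 19, 64, 128, requires_grad=True)
    target = torch.randint(0, 19, (2, 64, 128))
    print(ohem_cross_entropy(logits, target, min_kept=1_000))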

    Could you please give me more details on the training, especially:

    1. How many iterations did you train the model for?
    2. What was the final cross-entropy loss you ended up with?
    3. Did you use auxiliary losses as well for better convergence, resulting in a lower loss?
    4. Did you use a specific modified version of the cross-entropy loss to achieve better convergence?

    Is there anything else I am missing to achieve better results?

    opened by Eashwar93 0
  • why only 2 frames for spatial-temporal context aggregation


    Hi. Your paper is very interesting and inspirational to read. I was wondering why you integrated the features of only ONE neighboring frame to facilitate inference on the current frame. Have you experimented with more frames? What was the effect?

    Thank you.

    opened by baibaidj 0
  • Resnet block feature map size


    Hi, thanks for your great work. I have a question regarding the feature map size of the ResNet blocks. In your paper you say that the first res-block produces a feature map of h/4 x w/4 resolution, but in the code the resolution of the feature map is h/8 x w/8.

    Is this an error? Or does the implementation not fully reproduce the paper's description?

    Thank you

    opened by Pelursos 1
  • class BatchNorm2D in fanet.py


    I am confused about this! You define a BatchNorm, but in the forward function you only apply the activation. Can you explain this?

    class BatchNorm2d(nn.BatchNorm2d):
        # (conv => BN => ReLU) * 2
        def __init__(self, num_features, activation='none'):
            super(BatchNorm2d, self).__init__(num_features=num_features)
            if activation == 'leaky_relu':
                self.activation = nn.LeakyReLU()
            elif activation == 'none':
                self.activation = lambda x: x
            else:
                raise Exception("Accepted activation: ['leaky_relu']")

        def forward(self, x):
            return self.activation(x)

    opened by TranThanh96 10
  • Trained model with mIoU of 75.5 for fa2


    Hi, can you please provide the trained model for the last experiment reported in Table VI of the paper, which achieved an mIoU of 75.5? ("TABLE VI: Video semantic segmentation on Cityscapes. '+Temp' indicates FANet with spatial-temporal attention (t=2)")

    Thanks

    opened by MahdiehP 0
Owner
Ping Hu
HyperSeg: Patch-wise Hypernetwork for Real-time Semantic Segmentation Official PyTorch Implementation

We present a novel, real-time, semantic segmentation network in which the encoder both encodes and generates the parameters (weights) of the decoder. Furthermore, to allow maximal adaptivity, the weights at each decoder block vary spatially. For this purpose, we design a new type of hypernetwork, composed of a nested U-Net for drawing higher level context features

Yuval Nirkin 182 Dec 14, 2022
DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation

DFFNet Paper DFFNet: An IoT-perceptive Dual Feature Fusion Network for General Real-time Semantic Segmentation. Xiangyan Tang, Wenxuan Tu, Keqiu Li, J

null 4 Sep 23, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

null 32 Sep 21, 2022
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022
Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021)

Semantic Segmentation for Real Point Cloud Scenes via Bilateral Augmentation and Adaptive Fusion (CVPR 2021) This repository is for BAAF-Net introduce

null 90 Dec 29, 2022
code for paper"A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

PyTorch implementation of UAGAN(U-net Attention Generative Adversarial Networks) This repository contains the source code for the paper "A High-precis

Tong 8 Apr 25, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time.

BBB Face Recognizer Face recognition system using MTCNN, FACENET, SVM and FAST API to track participants of Big Brother Brasil in real time. Instalati

Rafael Azevedo 232 Dec 24, 2022
YolactEdge: Real-time Instance Segmentation on the Edge

YolactEdge, the first competitive instance segmentation approach that runs on small edge devices at real-time speeds. Specifically, YolactEdge runs at up to 30.8 FPS on a Jetson AGX Xavier (and 172.7 FPS on an RTX 2080 Ti) with a ResNet-101 backbone on 550x550 resolution images.

Haotian Liu 1.1k Jan 6, 2023
OrienMask: Real-time Instance Segmentation with Discriminative Orientation Maps

OrienMask This repository implements the framework OrienMask for real-time instance segmentation. It achieves 34.8 mask AP on COCO test-dev at the spe

null 45 Dec 13, 2022
implement of SwiftNet:Real-time Video Object Segmentation

SwiftNet The official PyTorch implementation of SwiftNet:Real-time Video Object Segmentation, which has been accepted by CVPR2021. Requirements Python

haochen wang 64 Dec 14, 2022
A keras-based real-time model for medical image segmentation (CFPNet-M)

CFPNet-M: A Light-Weight Encoder-Decoder Based Network for Multimodal Biomedical Image Real-Time Segmentation This repository contains the implementat

null 268 Nov 27, 2022
A lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look At CoefficienTs)

Real-time Instance Segmentation and Lane Detection This is a lane detection integrated Real-time Instance Segmentation based on YOLACT (You Only Look

Jin 4 Dec 30, 2022
TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

凌逆战 16 Dec 30, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

yifan liu 147 Dec 3, 2022
Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

null 97 Dec 17, 2022
Realtime segmentation with ENet, the fast and accurate segmentation net.

Enet This is a realtime segmentation net with almost 22 fps on GTX1080 ti, and the model size is very small with only 28M. This repo contains the infe

JinTian 14 Aug 30, 2022