Dual Attention Network for Scene Segmentation (CVPR 2019)

Overview

Dual Attention Network for Scene Segmentation (CVPR 2019)

Jun Fu, Jing Liu, Haijie Tian, Yong Li, Yongjun Bao, Zhiwei Fang, and Hanqing Lu

Introduction

We propose a Dual Attention Network (DANet) that adaptively integrates local features with their global dependencies based on the self-attention mechanism. DANet achieves new state-of-the-art segmentation performance on three challenging scene segmentation datasets: Cityscapes, PASCAL Context, and COCO Stuff-10k.
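
To make the mechanism concrete, below is a minimal sketch of the two attention modules, condensed from encoding/nn/attention.py in this repository (signatures are simplified; this is a sketch, not a drop-in replacement). The max-subtraction line in the channel module is discussed in the comments section below.

    import torch
    import torch.nn as nn

    class PAM_Module(nn.Module):
        # Position attention: every spatial location attends to all others.
        def __init__(self, in_dim):
            super().__init__()
            self.query_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
            self.key_conv = nn.Conv2d(in_dim, in_dim // 8, kernel_size=1)
            self.value_conv = nn.Conv2d(in_dim, in_dim, kernel_size=1)
            self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight
            self.softmax = nn.Softmax(dim=-1)

        def forward(self, x):
            b, c, h, w = x.size()
            q = self.query_conv(x).view(b, -1, h * w).permute(0, 2, 1)  # B x N x C'
            k = self.key_conv(x).view(b, -1, h * w)                     # B x C' x N
            attention = self.softmax(torch.bmm(q, k))                   # B x N x N
            v = self.value_conv(x).view(b, -1, h * w)                   # B x C x N
            out = torch.bmm(v, attention.permute(0, 2, 1)).view(b, c, h, w)
            return self.gamma * out + x  # attention-weighted sum plus input

    class CAM_Module(nn.Module):
        # Channel attention: every channel map attends to all channel maps.
        def __init__(self):
            super().__init__()
            self.gamma = nn.Parameter(torch.zeros(1))
            self.softmax = nn.Softmax(dim=-1)

        def forward(self, x):
            b, c, h, w = x.size()
            q = x.view(b, c, -1)                   # B x C x N
            k = x.view(b, c, -1).permute(0, 2, 1)  # B x N x C
            energy = torch.bmm(q, k)               # B x C x C
            # Max-subtraction before softmax; note the reversed sign relative
            # to the usual stability trick (see the issues quoted below).
            energy = torch.max(energy, -1, keepdim=True)[0].expand_as(energy) - energy
            attention = self.softmax(energy)
            out = torch.bmm(attention, x.view(b, c, -1)).view(b, c, h, w)
            return self.gamma * out + x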


Cityscapes testing set result

We train our DANet-101 with only fine annotated data and submit our test results to the official evaluation server.


Updates

2020/9: Updated the code to support PyTorch 1.4.0 or later!

2020/8: The new TNNLS version, DRANet, achieves 82.9% on the Cityscapes test set (result submitted in August 2019), a new state-of-the-art performance using only the fine annotated dataset and ResNet-101. The code will be released in DRANet.

2020/7: DANet is supported in MMSegmentation, where it achieves 80.47% with single-scale testing and 82.02% with multi-scale testing on the Cityscapes val set.

2018/9: DANet released. The trained model with ResNet-101 achieves 81.5% on the Cityscapes test set.

Usage

  1. Install PyTorch

    • The code is tested with Python 3.6 and PyTorch 1.4.0.
    • The code is modified from PyTorch-Encoding.
  2. Clone the repository

    git clone https://github.com/junfu1115/DANet.git 
    cd DANet 
    python setup.py install
  3. Dataset

    • Download the Cityscapes dataset and convert its labels to the 19 training categories (a conversion sketch is given below, after this list).
    • Put the dataset in the folder ./datasets
  4. Evaluation for DANet

    • Download the trained DANet101 model and put it in the folder ./experiments/segmentation/models/

    • cd ./experiments/segmentation/

    • For single-scale testing, please run:

    • CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset citys --model danet --backbone resnet101 --resume  models/DANet101.pth.tar --eval --base-size 2048 --crop-size 768 --workers 1 --multi-grid --multi-dilation 4 8 16 --os 8 --aux --no-deepstem
    • Evaluation result

      The expected scores are: DANet101 on the Cityscapes val set (mIoU/pAcc): 79.93/95.97 (single scale)

  5. Evaluation for DRANet

    • Download the trained DRANet101 model and put it in the folder ./experiments/segmentation/models/

    • The evaluation code is in the folder ./experiments/segmentation/

    • cd ./experiments/segmentation/

    • For single-scale testing, please run:

    • CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset citys --model dran --backbone resnet101 --resume  models/dran101.pth.tar --eval --base-size 2048 --crop-size 768 --workers 1 --multi-grid --multi-dilation 4 8 16 --os 8 --aux
    • Evaluation result

      The expected scores are: DRANet101 on the Cityscapes val set (mIoU/pAcc): 81.63/96.62 (single scale)
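
As referenced in step 3 above, here is a minimal, hypothetical sketch of the 19-category label conversion, assuming the official cityscapesscripts package (pip install cityscapesscripts). This repository does not ship such a script, so the helper name and paths below are illustrative only:

    import numpy as np
    from PIL import Image
    from cityscapesscripts.helpers.labels import labels  # official label table

    # Map every raw labelId to one of the 19 train IDs; all other
    # classes become the ignore index 255.
    id_to_trainid = np.full(256, 255, dtype=np.uint8)
    for label in labels:
        if 0 <= label.trainId < 19:
            id_to_trainid[label.id] = label.trainId

    def convert_label(src_path, dst_path):
        # Convert one *_labelIds.png ground-truth file to train IDs.
        mask = np.array(Image.open(src_path))
        Image.fromarray(id_to_trainid[mask]).save(dst_path)

Applying convert_label to every *_labelIds.png under ./datasets/cityscapes/gtFine (hypothetical layout) produces the 19-category masks the training and evaluation scripts expect.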

Citation

If you find DANet or DRANet useful in your research, please consider citing:

@article{fu2020scene,
  title={Scene Segmentation With Dual Relation-Aware Attention Network},
  author={Fu, Jun and Liu, Jing and Jiang, Jie and Li, Yong and Bao, Yongjun and Lu, Hanqing},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2020},
  publisher={IEEE}
}
@inproceedings{fu2019dual,
  title={Dual attention network for scene segmentation},
  author={Fu, Jun and Liu, Jing and Tian, Haijie and Li, Yong and Bao, Yongjun and Fang, Zhiwei and Lu, Hanqing},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3146--3154},
  year={2019}
}

Acknowledgement

Thanks to PyTorch-Encoding, especially for the synchronized BN!

Comments
  • Install pytorch commit fd25a2a

    I was running evaluation on Cityscapes and did not find args.test_scale anywhere. It's not mentioned in options.py either.

    Error:

    if len(args.test_scale) == 1:
    AttributeError: 'Namespace' object has no attribute 'test_scale'
    

    Running this from the terminal: CUDA_VISIBLE_DEVICES=0 python3.6 test.py --dataset cityscapes --model danet --resume-dir cityscapes/model --crop-size 768 --workers 1 --backbone resnet101 --multi-grid --multi-dilation 4 8 16 --eval

    opened by sbharadwajj 33
  • Convert the Cityscapes dataset to 19 categories

    Hi, I am new to the segmentation field, and I have run into the problem of how to convert the Cityscapes dataset to 19 categories. Looking forward to your reply. Thank you!

    opened by rinawhale 18
  • About the results on PASCAL Context

    Question 1: DANet performs well on the PASCAL Context dataset, outperforming EncNet by 1%. Does it use any of the improvement strategies mentioned in the subsection 'Ablation Study for Improvement Strategies'? As far as I know, EncNet achieves 51.7 mIoU on the same dataset without using the multi-scale strategy.

    Question 2: Before applying the softmax operation to x, we usually subtract the maximum value of x. But in CAM_Module I see energy_new = torch.max(energy, -1, keepdim=True)[0].expand_as(energy) - energy (https://github.com/junfu1115/DANet/blob/master/encoding/nn/attention.py#L75) rather than energy_new = energy - torch.max(energy, -1, keepdim=True)[0].expand_as(energy) followed by the softmax op. The channel weights are therefore reversed between the two forms. Can you share why the former is used?

    opened by qiulesun 9
  • How to test for custom Images

    I'm trying to run inference by modifying test.py to run on custom images. I got the following error related to the model checkpoint. I have downloaded the pre-trained model and added it to the danet/cityscapes/model path. Here is my modified code; it references the correct path of the checkpoint directory.

    args = Options().parse()
    model = get_segmentation_model(args.model, dataset=args.dataset,backbone=args.backbone, aux=args.aux,se_loss=args.se_loss, norm_layer=BatchNorm2d, base_size=args.base_size, crop_size=args.crop_size,multi_grid=args.multi_grid, multi_dilation=args.multi_dilation)
    
    if args.resume_dir is None or not os.path.isdir(args.resume_dir):
            raise RuntimeError("=> no checkpoint found at '{}'".format(args.resume_dir))
    for resume_file in os.listdir(args.resume_dir):
        if os.path.splitext(resume_file)[1] == '.tar':
            args.resume = os.path.join(args.resume_dir, resume_file)
            assert os.path.exists(args.resume)
            print(args.resume)
    
    checkpoint = torch.load(args.resume) # strict=False, so that it is compatible with old pytorch saved models
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    print(model)
    

    I'm running this from my terminal

    CUDA_VISIBLE_DEVICES=0 python test_on_custom.py --model danet --resume-dir cityscapes/model/ --base-size 2048 --crop-size 768 --workers 1 --backbone resnet101 --multi-grid --multi-dilation 4 8 16 --eval

    This is the error:

    Traceback (most recent call last):
      File "test_on_custom.py", line 34, in <module>
        model.load_state_dict(checkpoint['state_dict'], strict=False)
      File "/datadrive/virtualenvs/torch3.5/lib/python3.5/site-packages/torch/nn/modules/module.py", line 719, in load_state_dict
        self.__class__.__name__, "\n\t".join(error_msgs)))
    RuntimeError: Error(s) in loading state_dict for DANet:
    	size mismatch for head.conv6.1.weight: copying a param of torch.Size([150, 512, 1, 1]) from checkpoint, where the shape is torch.Size([19, 512, 1, 1]) in current model.
    	size mismatch for head.conv6.1.bias: copying a param of torch.Size([150]) from checkpoint, where the shape is torch.Size([19]) in current model.
    	size mismatch for head.conv7.1.weight: copying a param of torch.Size([150, 512, 1, 1]) from checkpoint, where the shape is torch.Size([19, 512, 1, 1]) in current model.
    	size mismatch for head.conv7.1.bias: copying a param of torch.Size([150]) from checkpoint, where the shape is torch.Size([19]) in current model.
    	size mismatch for head.conv8.1.weight: copying a param of torch.Size([150, 512, 1, 1]) from checkpoint, where the shape is torch.Size([19, 512, 1, 1]) in current model.
    	size mismatch for head.conv8.1.bias: copying a param of torch.Size([150]) from checkpoint, where the shape is torch.Size([19]) in current model.
    
    

    Line 34 is model.load_state_dict(checkpoint['state_dict'], strict=False)

    opened by sbharadwajj 8
  • why center crop for testing instead of the whole image?

    Hi @junfu1115, I have a question: in BaseDataset, the function _val_sync_transform uses a center crop on the image. Is it reasonable to test the model on a center crop instead of the whole image? Thanks.

    opened by zimenglan-sysu-512 7
  • I CAN NOT get same result

    I trained the model using the pretrained ResNet-101 on the Cityscapes dataset, but I only got 70.07 mIoU. How can I get the same result as the paper? Please, thanks! I set the batch size to 15 (5 GPUs) and epochs to 120.

    opened by zihao-lu 7
  • Reproducing Cityscapes

    Running the command supplied by the repo,

    CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py --dataset cityscapes --model danet --backbone resnet101 --checkname danet101 --base-size 1024 --crop-size 768 --epochs 240 --batch-size 8 --lr 0.003 --workers 2 --multi-grid --multi-dilation 4 8 16

    returned an mIoU of 0.735 at the end of 240 epochs. There is probably some randomness in running through the dataset, but I'm surprised it was that much lower than anticipated. Any advice on how to get closer to the reported scores?

    opened by erikgaas 7
  • baseline FCN

      The baseline FCN reaches 70.03% mIoU in your paper, and I want to reproduce that result, but my result is not good. I noticed the paper uses LR = 0.001, but the repo uses LR = 0.003. Can you share more details about the training process? Thank you.
    
    opened by yyfyan 6
  • About the `Softmax` in the `CAM_Module`

    https://github.com/junfu1115/DANet/blob/799424e7fd9efa5723038ad0dc27c46682160e57/encoding/nn/attention.py#L75

    Is your code trying to improve numerical stability? Maybe it should be in this form:

           energy_new = torch.max(energy, -1, keepdim=True)
           energy_new = energy_new[0].expand_as(energy)
           energy_new = energy - energy_new
           attention = self.softmax(energy_new)
    
    opened by lartpang 5
  • subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1

    The environment configuration: CUDA 10.0, VS2017, PyTorch 1.0.0, Python 3.7, Win10, Anaconda3, torch_encoding-0.4.5-py3.7.

    When I run: $ CUDA_VISIBLE_DEVICES=0,1 python train.py --dataset cityscapes --model danet --backbone resnet101 --checkname danet101 --base-size 1024 --crop-size 768 --epochs 240 --batch-size 6 --lr 0.003 --workers 1 --multi-grid --multi-dilation 4 8 16

    it produces the errors below (Chinese system messages translated):

    C:\Users\MSIK\Anaconda3\lib\site-packages\torch\utils\cpp_extension.py:184: UserWarning: Error checking compiler version for c++: [WinError 2] The system cannot find the file specified.
      warnings.warn('Error checking compiler version for {}: {}'.format(compiler, error))
    INFO: Could not find files for the given pattern.
    Traceback (most recent call last):
      File "train.py", line 17, in <module>
        import encoding.utils as utils
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\__init__.py", line 13, in <module>
        from . import nn, functions, parallel, utils, models, datasets, transforms
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\nn\__init__.py", line 12, in <module>
        from .encoding import *
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\nn\encoding.py", line 18, in <module>
        from ..functions import scaled_l2, aggregate, pairwise_cosine
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\functions\__init__.py", line 2, in <module>
        from .encoding import *
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\functions\encoding.py", line 14, in <module>
        from .. import lib
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\encoding\lib\__init__.py", line 15, in <module>
        ], build_directory=cpu_path, verbose=False)
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 645, in load
        is_python_module)
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 814, in _jit_compile
        with_cuda=with_cuda)
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 859, in _write_ninja_file_and_build
        with_cuda=with_cuda)
      File "C:\Users\MSIK\Anaconda3\lib\site-packages\torch\utils\cpp_extension.py", line 1064, in _write_ninja_file
        'cl']).decode().split('\r\n')
      File "C:\Users\MSIK\Anaconda3\lib\subprocess.py", line 389, in check_output
        **kwargs).stdout
      File "C:\Users\MSIK\Anaconda3\lib\subprocess.py", line 481, in run
        output=stdout, stderr=stderr)
    subprocess.CalledProcessError: Command '['where', 'cl']' returned non-zero exit status 1.

    How can I solve this problem? Can anyone help me? Thanks a lot.

    opened by xdsonglinliu 4
  • The difference between the paper and the code

    The paper says: "Given a local feature A, we first feed it into a convolution layer with batch normalization and ReLU to generate two new feature maps B and C." But there is no BN or ReLU in the code: self.query_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1) and self.key_conv = Conv2d(in_channels=in_dim, out_channels=in_dim//8, kernel_size=1) in PAM_Module. Which is right?

    opened by Tramac 4
  • CVE-2007-4559 Patch

    Patching CVE-2007-4559

    Hi, we are security researchers from the Advanced Research Center at Trellix. We have begun a campaign to patch a widespread bug named CVE-2007-4559, a 15-year-old bug in the Python tarfile package. By using extract() or extractall() on a tarfile object without sanitizing input, a maliciously crafted .tar file could perform a directory path traversal attack. We found at least one unsanitized extractall() in your codebase and are providing a patch for you via pull request. The patch essentially checks whether all tarfile members will be extracted safely and throws an exception otherwise (a minimal sketch of this kind of check appears after the comments list below). We encourage you to use this patch or your own solution to secure against CVE-2007-4559. Further technical information about the vulnerability can be found in this blog.

    If you have further questions, you may contact us through this project's lead researcher, Kasimir Schulz.

    opened by TrellixVulnTeam 0
  • python setup.py install

    When I run python setup.py install on the setup.py in encoding/lib/cpu, I get the error LINK : fatal error LNK1181: cannot open input file "E:\test\lib\cpu\build\temp.win-amd64-3.6\Release\operator.obj". How can I solve this?

    opened by this-tree 0
  • from option import Options

    Hello! Do you have time to help? I get an error on from option import Options. Even after running pip install option, the error remains:

    Traceback (most recent call last):
      File "/home/pxg/DAN/DANet05/experiments/segmentation/train.py", line 24, in <module>
        from option import Options
    ImportError: cannot import name 'Options'

    opened by wlj567 0
  • ValueError: operands could not be broadcast together with shapes (1,1024,2048) (1024,2048,4)

    I used the command CUDA_VISIBLE_DEVICES=0,1,2,3 python test.py --dataset citys --model danet --backbone resnet101 --resume models/DANet101.pth.tar --eval --base-size 2048 --crop-size 768 --workers 1 --multi-grid --multi-dilation 4 8 16 --os 8 --aux --no-deepstem, but I got a lot of ValueErrors like the title shows, and very low results: pixAcc: 0.1265, mIoU: 0.0180.

    I referred to https://github.com/junfu1115/DANet/issues/12 and https://github.com/junfu1115/DANet/issues/120 to process Cityscapes and generated train_fine.txt, val_fine.txt, and test_fine.txt. What am I doing wrong? Thanks.

    opened by TangToy 0
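
As referenced in the CVE-2007-4559 comment above, a safe-extraction wrapper validates every member's resolved path before calling extractall. Below is a minimal sketch of that kind of check; the actual fix is in the linked pull request, and the archive name and destination path here are hypothetical:

    import os
    import tarfile

    def is_within_directory(directory, target):
        # Resolve both paths; a member is safe only if its absolute
        # path stays inside the destination directory.
        abs_directory = os.path.abspath(directory)
        abs_target = os.path.abspath(target)
        return os.path.commonprefix([abs_directory, abs_target]) == abs_directory

    def safe_extract(tar, path="."):
        for member in tar.getmembers():
            member_path = os.path.join(path, member.name)
            if not is_within_directory(path, member_path):
                raise RuntimeError("Attempted path traversal in tar file")
        # On Python 3.12+, tar.extractall(path, filter="data") performs
        # equivalent sanitization natively.
        tar.extractall(path)

    # Hypothetical usage:
    # with tarfile.open("pretrained_models.tar") as tar:
    #     safe_extract(tar, "experiments/segmentation/models/")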