Official implementation of TMANet.

Overview

Temporal Memory Attention for Video Semantic Segmentation, arxiv

Introduction

We propose a Temporal Memory Attention Network (TMANet) to adaptively integrate the long-range temporal relations over the video sequence based on the self-attention mechanism without exhaustive optical flow prediction. Our method achieves new state-of-the-art performances on two challenging video semantic segmentation datasets, particularly 80.3% mIoU on Cityscapes and 76.5% mIoU on CamVid with ResNet-50. (Accepted by ICIP2021)

If this codebase is helpful to you, please consider giving it a star ⭐ 😊.

Updates

2021/1: TMANet training and evaluation code released.

2021/6: Updated README.md:

  • added Camvid dataset download links;
  • updated the 'camvid_video_process.py' script.

Usage

  • Install mmseg

    • Please refer to mmsegmentation for the installation guide.
    • This repository is based on mmseg-0.7.0 and pytorch 1.6.0.
  • Clone the repository

    git clone https://github.com/wanghao9610/TMANet.git
    cd TMANet
    pip install -e .
  • Prepare the datasets

    • Download Cityscapes dataset and Camvid dataset.

    • For the Camvid dataset, extract frames from the downloaded videos according to the following steps:

      • Download the raw videos from here (a Google Drive link is provided).
      • Put the downloaded raw videos (e.g. 0016E5.MXF, 0006R0.MXF, 0005VD.MXF, 01TP_extract.avi) in ./data/camvid/raw.
      • Download the extracted images and labels from here and the split.txt file from here, then untar the tar.gz file to ./data/camvid. This gives two subdirectories: "./data/camvid/images" (the annotated images) and "./data/camvid/labels" (the ground truth for semantic segmentation). For reference, see the following shell commands:
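        cd TMANet
        cd ./data/camvid
        # The link is a Google Drive share page, so plain wget would save an HTML page, not the archive.
        # One option, assuming the gdown tool is installed (pip install gdown):
        gdown "https://drive.google.com/uc?id=1FcVdteDSx0iJfQYX2bxov0w_j-6J7plz" -O camvid.tar.gz
        # Or download the file on your PC first, then upload it to your server.
        tar -xf camvid.tar.gz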
      • Generate the image_sequence directory frame by frame from the raw videos. For reference, see the following shell commands:
        cd TMANet
        python tools/convert_datasets/camvid_video_process.py
    • For the Cityscapes dataset, request the download link for 'leftImg8bit_sequence_trainvaltest.zip' from the official Cityscapes dataset webpage.

    • The converted/downloaded datasets are stored under ./data/camvid and ./data/cityscapes.

      The file structure of the video semantic segmentation datasets is as follows.

      ├── data                                              ├── data
      │   ├── cityscapes                                    │   ├── camvid
      │   │   ├── gtFine                                    │   │   ├── images
      │   │   │   ├── train                                 │   │   │   ├── xxx{img_suffix}
      │   │   │   │   ├── xxx{img_suffix}                   │   │   │   ├── yyy{img_suffix}
      │   │   │   │   ├── yyy{img_suffix}                   │   │   │   ├── zzz{img_suffix}
      │   │   │   │   ├── zzz{img_suffix}                   │   │   ├── annotations
      │   │   │   ├── val                                   │   │   │   ├── train.txt
      │   │   ├── leftImg8bit                               │   │   │   ├── val.txt
      │   │   │   ├── train                                 │   │   │   ├── test.txt
      │   │   │   │   ├── xxx{seg_map_suffix}               │   │   ├── labels
      │   │   │   │   ├── yyy{seg_map_suffix}               │   │   │   ├── xxx{seg_map_suffix}
      │   │   │   │   ├── zzz{seg_map_suffix}               │   │   │   ├── yyy{seg_map_suffix}
      │   │   │   ├── val                                   │   │   │   ├── zzz{seg_map_suffix}
      │   │   ├── leftImg8bit_sequence                      │   │   ├── image_sequence
      │   │   │   ├── train                                 │   │   │   ├── xxx{sequence_suffix}
      │   │   │   │   ├── xxx{sequence_suffix}              │   │   │   ├── yyy{sequence_suffix}
      │   │   │   │   ├── yyy{sequence_suffix}              │   │   │   ├── zzz{sequence_suffix}
      │   │   │   │   ├── zzz{sequence_suffix}
      │   │   │   ├── val
      
  • Evaluation

    • Download the trained models for Cityscapes and Camvid, and put them in ./work_dirs/{config_file}.
    • Run the following command (on Cityscapes):
    sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py
  • Training

    • Please download the pretrained ResNet-50 model and put it in ./init_models.
    • Run the following command (on Cityscapes):
    sh train.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py

    Note: the evaluation and training commands above run on Cityscapes. To evaluate or train on Camvid, replace the config file in the command with the Camvid config file.
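
Before launching eval.sh or train.sh, it can help to confirm that the datasets are laid out as in the file structure above. The following is a minimal sketch, not part of the repository: the helper name `missing_dirs` and the `EXPECTED` table are illustrative assumptions, and the directory names are copied from the tree shown earlier.

```python
from pathlib import Path

# Sub-directories copied from the file structure in this README.
# The table itself is an illustrative assumption, not repository code.
EXPECTED = {
    "cityscapes": ["gtFine", "leftImg8bit", "leftImg8bit_sequence"],
    "camvid": ["images", "annotations", "labels", "image_sequence"],
}


def missing_dirs(data_root="data"):
    """Return the expected dataset directories that are absent under data_root."""
    root = Path(data_root)
    return [
        (Path(name) / sub).as_posix()
        for name, subs in EXPECTED.items()
        for sub in subs
        if not (root / name / sub).is_dir()
    ]


if __name__ == "__main__":
    # Run from the repository root; prints nothing when the layout is complete.
    for path in missing_dirs():
        print("missing:", path)
```

Run from the repository root, the script prints one line per missing directory and nothing when every directory is present. It only checks that the directories exist, not that they contain the right files.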

Citation

If you find TMANet useful in your research, please consider citing:

@misc{wang2021temporal,
    title={Temporal Memory Attention for Video Semantic Segmentation}, 
    author={Hao Wang and Weining Wang and Jing Liu},
    year={2021},
    eprint={2102.08643},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgement

Thanks to mmsegmentation for its contribution to the community!

Comments
  • Loading the pretrained model

    1. I have a question about loading the Camvid pretrained model. Once it is loaded, I see the weights for the ResNet architecture, and the last two tensors are fc.weight and fc.bias. What are these two tensors related to? Are they part of the segmentation head or some other part of the network? I can't see how they relate to ResNet.

    2. The paper says "we add the auxiliary segmentation loss at the low-level feature of the backbone (e.g. the stage 3 output of ResNet)". Does this mean the loss is added at the output of layer3 of ResNet? This is not clear to me. Could you please clarify?

    Best Regards,

    opened by LeilaMaria 6
  • inferencing with my own video

    I have a video that I want to test with the Cityscapes pretrained model. Is there a script to do this? The evaluation script seems to handle only the Cityscapes validation dataset.

    FC

    opened by fjremnav 5
  • Cityscapes validation with bad results.

    I followed the steps to prepare the Cityscapes dataset and evaluated the .pth file with the script "sh eval.sh configs/video/cityscapes/tmanet_r50-d8_769x769_80k_cityscapes_video.py", but I ran into some problems.

    1. In your original config file, https://github.com/wanghao9610/TMANet/blob/master/configs/base/datasets/cityscapes_video_769x769.py, I found that ann_dir is set to 'gtFine_19/train'. I don't know whether gtFine_19 differs from gtFine, but in order to run the validation code I changed it to gtFine, and the validation code then ran.
    2. After validation, I got a strange result, as shown in the picture below: image.

    Actually, I got very good results on the CamVid dataset by following the same steps as for Cityscapes. Do I need to make some modifications to run the validation code on the Cityscapes dataset? Thanks for your reply.

    opened by wuwenbin970731 5
  • loading model camvid

    Hi,

    If I want to load the trained model for Camvid, should I use the zipped file latest.pth as the checkpoint, or the data.pkl inside it?

    In eval.sh, CHECKPOINT="${WORK_DIR}/latest.pth", but latest.pth is a zipped file. Should I leave it like that, or unzip the file and set CHECKPOINT="${WORK_DIR}/data.pkl", where data.pkl is inside latest.pth?

    If I leave latest.pth in eval.sh, I get this error: RuntimeError: [enforce fail at inline_container.cc:197] . file not found: archive/tensors/94018778419408

    Thank you for considering my question

    opened by LeilaMaria 4
  • bad result on custom dataset

    Hi,

    Thank you for your work, but I have a problem. I tried to train TMANet on my custom dataset, but it produced very bad results. Other image semantic segmentation models such as OCRNet achieved far better results on the same custom dataset than TMANet did. Is this normal for TMANet? Is there any way I can fix this problem? Thank you very much.

    opened by NekonoKoe 3
  • Did you define a new class named CityscapesVideoDataset?

    Did you define a new class named CityscapesVideoDataset? When I run test.py, I get the error KeyError: 'CityscapesVideoDataset is not in the dataset registry'. If you defined a new class, could you please share it?

    opened by Wave2689 2
  • Asking for the matmul_norm variable why it's set as false

    Hi Prof., I noticed that in your memory module you set the 'matmul_norm' parameter to false, which in my view is the opposite of the argument in the paper "Attention Is All You Need". May I ask the reason, and whether the result will be the same if I set this variable to 'true'?

    Have a NICE DAY!

    opened by lawrence-cj 2
  • About the CamVID dataset

    May I ask: the CamVID dataset you provided has grayscale labels. Does this affect the result? Other versions of the Camvid dataset have color labels. Or is your output actually category plus location? Looking forward to your reply!

    opened by Michelexiaoxiao 2
  • the labels of camvid

    @wanghao9610 Hello! The size of the trained model you provided is different from the size of the model I trained. I looked at the difference. What is the difference between a Memory Head and a TMA Head?

    opened by Michelexiaoxiao 2
  • do we need to convert the cityscape dataset with 35 classes to 19 classes?

    Since the downloaded Cityscapes dataset has 35 classes, I can't train with the original code because of the num_classes=19 setting. So I changed the code to num_classes=35, and training starts. But I'm wondering whether I should use a script to convert the 35 classes into 19 classes. I hope to hear your advice, thanks.

    opened by QuLiao1117 2
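
As background for the question above: Cityscapes labelIds annotations use raw ids 0 to 33 (plus -1 for license plates), while the standard 19-class benchmark uses train ids 0 to 18 and ignores every other class as 255. The sketch below shows that mapping as defined in the official cityscapesScripts label table. The function name `to_train_ids` is a hypothetical helper, and the code works on plain nested lists rather than image files. mmsegmentation also ships tools/convert_datasets/cityscapes.py, which is meant to generate the `*_labelTrainIds.png` files; whether this repository expects exactly those files is an assumption to verify against its config.

```python
# Raw Cityscapes label id -> 19-class train id (cityscapesScripts label table).
ID_TO_TRAIN_ID = {
    7: 0,    # road
    8: 1,    # sidewalk
    11: 2,   # building
    12: 3,   # wall
    13: 4,   # fence
    17: 5,   # pole
    19: 6,   # traffic light
    20: 7,   # traffic sign
    21: 8,   # vegetation
    22: 9,   # terrain
    23: 10,  # sky
    24: 11,  # person
    25: 12,  # rider
    26: 13,  # car
    27: 14,  # truck
    28: 15,  # bus
    31: 16,  # train
    32: 17,  # motorcycle
    33: 18,  # bicycle
}
IGNORE_INDEX = 255  # every id not listed above is ignored in training


def to_train_ids(label_rows):
    """Map a 2-D list of raw label ids to train ids (hypothetical helper)."""
    return [[ID_TO_TRAIN_ID.get(v, IGNORE_INDEX) for v in row] for row in label_rows]


if __name__ == "__main__":
    print(to_train_ids([[7, 8, 0], [26, 33, 5]]))  # [[0, 1, 255], [13, 18, 255]]
```

Training with num_classes=35 on raw ids would make the model predict classes that the 19-class benchmark ignores, so the reported mIoU would not be comparable.
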
  • `camvid_video_process.py` convert error Problem.

    shutil package missing.

    Traceback (most recent call last):
      File "tools/convert_datasets/camvid_video_process.py", line 294, in <module>
        shutil.rmtree(save_path)
    NameError: name 'shutil' is not defined
    
    opened by J911 2
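
The NameError in the traceback above is what Python raises when `shutil.rmtree` is called in a module that never imports `shutil`. A minimal sketch of the pattern follows, assuming the script clears an output directory before writing frames; `reset_dir` is a hypothetical helper for illustration, not a function from the repository.

```python
import shutil  # without this import, shutil.rmtree raises NameError
from pathlib import Path


def reset_dir(save_path):
    """Delete save_path if it exists, then recreate it as an empty directory."""
    path = Path(save_path)
    if path.exists():
        shutil.rmtree(path)
    path.mkdir(parents=True)
    return path
```

For an affected copy of the script, adding `import shutil` next to its other imports should be enough to get past this error.
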
Owner
wanghao