LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation

Overview

Table of Contents:

  • Introduction
  • Project-Structure
  • Installation
  • Datasets
  • Training-LEDNet
  • Testing
  • Results
  • Citation
  • Tips
  • Reference

Introduction

This project contains the code for LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation by Yu Wang et al. (Note: the code was tested with Python 3.6, CUDA 9.0, and PyTorch 0.4.1; PyTorch 0.4.1+ is also supported.)

The extensive computational burden limits the use of CNNs on mobile devices for dense estimation tasks such as semantic segmentation. In this paper, we present a lightweight network named **LEDNet**, which employs an asymmetric encoder-decoder architecture for real-time semantic segmentation. More specifically, the encoder adopts a ResNet as the backbone network, where two new operations, channel split and shuffle, are used in each residual block to greatly reduce computation cost while maintaining high segmentation accuracy. In the decoder, an attention pyramid network (APN) further lightens the entire network. Our model has fewer than 1M parameters and runs at over 71 FPS on a single GTX 1080Ti GPU. Comprehensive experiments demonstrate that our approach achieves a state-of-the-art speed-accuracy trade-off on the Cityscapes dataset, making it an effective method for real-time semantic segmentation.
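
The following is a minimal PyTorch sketch of the channel split and shuffle pattern described above. It is an illustration only, simplified relative to the paper's SS-nbt block (which stacks more factorized convolutions per branch and uses dilated convolutions in deeper stages):

import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    # Interleave channels so information mixes across the two split branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SplitShuffleBlock(nn.Module):
    """Illustrative residual block: split channels, run each half through
    factorized 3x1/1x3 convolutions, concatenate, add the residual, shuffle."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        def branch():
            return nn.Sequential(
                nn.Conv2d(half, half, (3, 1), padding=(1, 0)),
                nn.ReLU(inplace=True),
                nn.Conv2d(half, half, (1, 3), padding=(0, 1)),
                nn.BatchNorm2d(half),
                nn.ReLU(inplace=True),
            )
        self.left, self.right = branch(), branch()

    def forward(self, x):
        l, r = x.chunk(2, dim=1)                      # channel split
        out = torch.cat([self.left(l), self.right(r)], dim=1)
        out = F.relu(out + x)                         # residual connection
        return channel_shuffle(out)                   # channel shuffle

# SplitShuffleBlock(64)(torch.randn(1, 64, 32, 32)).shape -> (1, 64, 32, 32)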

Project-Structure

├── datasets                      # all datasets for the project
|  └── cityscapes                 # Cityscapes dataset
|  |  └── gtCoarse                # coarse Cityscapes annotations
|  |  └── gtFine                  # fine Cityscapes annotations
|  |  └── leftImg8bit             # Cityscapes training images
|  └── cityscapesscripts          # Cityscapes label-conversion scripts
├── utils
|  └── dataset.py                 # dataloader for the Cityscapes dataset
|  └── iouEval.py                 # computes mean IoU and per-class IoU
|  └── transform.py               # data preprocessing
|  └── visualize.py               # visualization with visdom
|  └── loss.py                    # loss function
├── checkpoint
|  └── xxx.pth                    # encoder models pre-trained on ImageNet
├── save
|  └── xxx.pth                    # models trained from scratch
├── imagenet-pretrain
|  └── lednet_imagenet.py         # model definition for ImageNet pre-training
|  └── main.py                    # ImageNet pre-training script
├── train
|  └── lednet.py                  # model definition for semantic segmentation
|  └── main.py                    # training script
├── test
|  └── dataset.py                 # dataloader for testing
|  └── lednet.py                  # model definition
|  └── lednet_no_bn.py            # model definition with the BN layers removed
|  └── eval_cityscapes_color.py   # generate RGB result images
|  └── eval_cityscapes_server.py  # generate results for upload to the official server
|  └── eval_forward_time.py       # measure model inference time
|  └── eval_iou.py                # evaluate IoU
|  └── iouEval.py                 # IoU computation helpers
|  └── transform.py               # data preprocessing

Installation

  • Python 3.6.x (Anaconda3 is recommended)
  • Set up the Python environment:
pip3 install -r requirements.txt
  • Environment: PyTorch 0.4.1, CUDA 9.0, cuDNN 7.1, Python 3.6

  • Clone this repository.

git clone https://github.com/xiaoyufenfei/LEDNet.git
cd LEDNet

Datasets

Download the Cityscapes dataset and organize it under ./datasets/cityscapes as follows:

├── leftImg8bit
│   ├── train
│   ├── val
│   └── test
├── gtFine
│   ├── train
│   ├── val
│   └── test
└── gtCoarse
    ├── train
    ├── train_extra
    └── val

Training-LEDNet

  • For help on the optional arguments you can run: python main.py -h

  • By default, we assume you have downloaded the Cityscapes dataset into the ./datasets/cityscapes dir.

  • To train LEDNet with the train/main.py script, pass the parameters listed in main.py as flags, or change their default values manually:

python main.py --savedir logs --model lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx ...

Resuming-training-if-decoder-part-broken

  • For help on the optional arguments you can run: python main.py -h
python main.py --savedir logs --name lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx --decoder --state "../save/logs/model_best_enc.pth.tar"...

Testing

  • The trained models from the training process can be found here. They may not be the best ones; you can train a model from scratch yourself, or fine-tune the decoder with an encoder pre-trained on ImageNet.
For more details, refer to ./test/README.md.

Results

  • Please refer to our article for more details.
Method | Dataset    | Fine | Coarse | IoU_cla | IoU_cat | FPS
LEDNet | Cityscapes | yes  | yes    | 70.6%   | 87.1%   | 70+

Qualitative segmentation result examples:

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@article{wang2019lednet,
  title={LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation},
  author={Wang, Yu and Zhou, Quan and Liu, Jia and Xiong, Jian and Gao, Guangwei and Wu, Xiaofu and Latecki, Longin Jan},
  journal={arXiv preprint arXiv:1905.02423},
  year={2019}
}

Tips

  • Constrained by limited GPU resources, the project's results can likely be improved further...
  • It is recommended to pre-train the encoder on ImageNet and then fine-tune the decoder; the results will be better.

Reference

  1. Deep Residual Learning for Image Recognition
  2. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
  3. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation
  4. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Comments
  • problems about cityscapesscripts

    Can you tell me where the cityscapesscripts are used? I downloaded the whole dataset, including cityscapes and cityscapesscripts, but I can't find where the scripts are used!

    opened by guaiwuguba 7
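
The usual role of the cityscapesscripts package is offline label preparation: converting the polygon/labelIds annotations into 19-class trainId PNGs before training. A sketch of the typical invocation, assuming the standard package layout (not taken from this repo's docs):

    export CITYSCAPES_DATASET=./datasets/cityscapes
    python cityscapesscripts/preparation/createTrainIdLabelImgs.py
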
  • Decoder channels

    I tried to implement LEDNet in Keras and found that I have ~2.5M parameters, which does not match the paper (it claims ~900K). I saw you wrote two versions of LEDNet, one with ~900K parameters and the other, like my implementation, with ~2.5M. In the 900K implementation, you downsample the number of channels to 1 right away in the first conv layer of the downsampling branch of the decoder (why is that?). In the attached figure (from the paper), the number of channels should remain the same through the first two levels of the downsampling branch of the decoder (and they still claim just ~900K parameters). What am I missing here? Is the paper problematic?

    [attached figure from the paper]

    opened by AvivSham 7
  • Question about reproduction

    I want to implement your paper, but I have several questions:

    1. Is there a first convolution missing before downsampling in the paper (from 3 channels to 32 channels)?
    2. Did you pre-train your encoder on ImageNet or train from scratch? (And would you mind telling me how many training epochs you used?)
    3. What weight initialization strategy did you use?
    4. What training image size did you use?

    Here is my implementation: https://github.com/AceCoooool/LEDNet. But there is still some gap to the paper.

    opened by AceCoooool 5
  • Problem during training

    Hi, when I run python main.py --savedir logs --model lednet --datadir D:/bishe/LEDNet-master/datasets/cityscapes, the following error appears:

    ========== ENCODER TRAINING ===========
    D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/train
    D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/val
    <class 'utils.loss.CrossEntropyLoss2d'>
    ----- TRAINING - EPOCH 1 -----
    LEARNING RATE: 0.0005
    Traceback (most recent call last):
      File "main.py", line 510, in <module>
        main(parser.parse_args())
      File "main.py", line 464, in main
        model = train(args, model, True)  # Train encoder
      File "main.py", line 211, in train
        for step, (images, labels) in enumerate(loader):
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 336, in __next__
        return self._process_next_batch(batch)
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 357, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    IndexError: Traceback (most recent call last):
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "D:\bishe\LEDNet-master\utils\dataset.py", line 86, in __getitem__
        filenameGt = self.filenamesGt[index]
    IndexError: list index out of range

    How should I solve this problem?

    opened by Liu6697 4
  • Label size of eval_cityscapes_server.py

    It seems that your eval_cityscapes_server.py produces labels at a resolution of 512x1024 (not 1024x2048), so I am confused about whether the Cityscapes test server can evaluate them correctly.

    opened by Serge-weihao 4
  • data

    THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=134 error=710 : device-side assert triggered
    Traceback (most recent call last):
      File "train/main.py", line 519, in <module>
        main(parser.parse_args())
      File "train/main.py", line 473, in main
        model = train(args, model, True)  # Train encoder
      File "train/main.py", line 237, in train
        loss = criterion(outputs, targets[:, 0])
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/drive/My Drive/LEDNet/utils/loss.py", line 15, in forward
        return self.loss(F.log_softmax(outputs, dim=1), targets)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 211, in forward
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2220, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:134

    opened by Benjiaminh 2
  • Hello, I ran into the following problem during training; please help, thank you

    for step, (images, labels) in enumerate(loader):
        print('images = ', images.shape)  # images = torch.Size([5, 3, 512, 682])
        print('labels = ', labels.shape)  # labels = torch.Size([5, 1, 64, 85])
        inputs = images.cuda()
        targets = labels.cuda()

        start_time = time.time()

        #print('inputs= ', inputs.shape)   # inputs = torch.Size([5, 3, 512, 682])
        #print('targets= ', targets.shape) # targets = torch.Size([5, 1, 64, 85])

        imgs_batch = images.shape[0]
        if imgs_batch != args.batch_size:
            break

        outputs = model(inputs, only_encode=enc)
        print('outputs = ', outputs)
        print('outputs =', outputs.shape)               # torch.Size([5, 2, 64, 86])
        print('targets[:, 0] = ', targets[:, 0].shape)  # torch.Size([5, 64, 85])

    After the input passes through model(), the output size becomes [5, 2, 64, 86], which does not match the label dimensions, so an error occurs. What could cause this result?

    opened by anqin5211314 2
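
A likely cause, offered as a hedged guess: 682 is not a multiple of 8, so the encoder's three stride-2 downsamplings round up (682 → 341 → 171 → 86) while the resized label ends up at 85. One sketch of a workaround is to pad inputs up to a multiple of 8 before the forward pass (the multiple of 8 is an assumption matching LEDNet's 1/8-resolution encoder output):

    import torch.nn.functional as F

    def pad_to_multiple(x, multiple=8):
        # Zero-pad right/bottom so H and W are divisible by `multiple`;
        # crop the prediction back to the label size afterwards.
        h, w = x.shape[-2:]
        return F.pad(x, (0, (-w) % multiple, 0, (-h) % multiple))
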
  • training data

    Could you tell me how to convert the dataset to 19 categories? I just use the *_color.png pictures for training, but after some steps it reports an error:

    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:266
    /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [160,0,0] Assertion t >= 0 && t < n_classes failed.

    opened by YYingcute 2
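
The *_color.png files are RGB visualizations, not class-id maps; training needs the *_labelIds.png images remapped to the 19 trainIds. A minimal sketch of that remapping, assuming the standard mapping table in cityscapesscripts (the file name is just an example):

    import numpy as np
    from PIL import Image
    from cityscapesscripts.helpers.labels import labels  # standard id/trainId table

    # Lookup table: original label id -> trainId, with 255 marking ignored classes.
    lut = np.full(256, 255, dtype=np.uint8)
    for l in labels:
        if 0 <= l.id < 256 and 0 <= l.trainId < 19:
            lut[l.id] = l.trainId

    ids = np.array(Image.open("aachen_000000_000019_gtFine_labelIds.png"))
    train_ids = lut[ids]  # values in {0..18}, plus 255 for ignore
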
  • Error when training on my own dataset

    Hi, I got the following error while training on my own dataset. The dataset is laid out like the cityscapes files, the images have consistent sizes and formats, and I changed the number of classes, but the error below still appears. Could you take a look when you have time? Many thanks.

    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "main.py", line 518, in <module>
        main(parser.parse_args())
      File "main.py", line 472, in main
        model = train(args, model, True)  # Train encoder
      File "main.py", line 236, in train
        loss = criterion(outputs, targets[:, 0])
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/disk/LEDNet/utils/loss.py", line 15, in forward
        return self.loss(F.log_softmax(outputs, dim=0), targets)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 210, in forward
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

    opened by Comedian1926 2
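
Two hedged observations on this traceback: the assert is the classic symptom of a label pixel outside [0, num_classes) that is not the ignore index, and the quoted loss.py line applies log_softmax over dim=0 (the batch dimension), whereas the class dimension of an NCHW output is dim=1. A quick sanity check one can run on a batch (num_classes and the ignore index here are assumptions):

    import torch

    def check_labels(labels, num_classes=20, ignore_index=255):
        # Flag any label value that is neither a valid class id nor the ignore id.
        vals = torch.unique(labels)
        bad = vals[(vals != ignore_index) & ((vals < 0) | (vals >= num_classes))]
        assert bad.numel() == 0, f"out-of-range label values: {bad.tolist()}"
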
  • eval speed

    I ran test/eval_forward_time.py with the default params; the mean time is 28 ms. When I set the image size to (2048, 1024), the mean time is 100 ms.

    When I set the batch size to 2, 4, or 8, the mean time is 32 ms in every case, so the inference time increases linearly with batch size.

    opened by kakaluote 1
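
CUDA kernels launch asynchronously, so per-batch wall-clock means can mislead unless the GPU is synchronized; a hedged sketch of a fairer measurement with warm-up:

    import time
    import torch

    def mean_forward_time(model, x, iters=100, warmup=10):
        model.eval()
        with torch.no_grad():
            for _ in range(warmup):        # warm up cuDNN autotuning and caches
                model(x)
            torch.cuda.synchronize()       # drain queued kernels before timing
            start = time.time()
            for _ in range(iters):
                model(x)
            torch.cuda.synchronize()       # wait for the last batch to finish
        return (time.time() - start) / iters
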
  • training data

    Thank you for your contribution. Can you give more details about training on the VOC dataset? I think the number of classes on the VOC dataset is 21 (background + the other category labels).

    opened by jhch1995 1
  • Update dataset.py

    This is something that I encountered when executing the code on another dataset, so I modified the code so that it can support other datasets as well.

    Problem and solution:

    PIL's 'P' mode, which is used for reading images, can change the spatial pixel information, i.e. it alters the labels encoded as pixel values. The better way is to open the image in RGB mode and then simply take one channel as the label image.

    opened by pranavmicro7 0
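
A sketch of the loading change this PR describes (hedged; it assumes the class id is recoverable from a single channel, and load_label is an illustrative name, not the repo's function):

    import numpy as np
    from PIL import Image

    def load_label(path):
        # Read the label as RGB rather than palette ('P') mode, then keep
        # one channel as the class-id map, as the PR describes.
        rgb = np.array(Image.open(path).convert("RGB"))
        return rgb[:, :, 0]
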
  • Question about input and output sizes

    I don't see the input image being resized to 1024x512 anywhere; only the short side is scaled proportionally to 512, and the long side is not fixed at 1024. Why, then, is the output directly scaled to 1024x512?

    class Decoder(nn.Module):
        def __init__(self, num_classes):
            super().__init__()

            self.apn = APN_Module(in_ch=128, out_ch=20)
            # self.upsample = Interpolate(size=(512, 1024), mode="bilinear")
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=4, stride=2, padding=1, output_padding=0, bias=True)
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=3, stride=2, padding=1, output_padding=1, bias=True)
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2, padding=0, output_padding=0, bias=True)

        def forward(self, input):
            output = self.apn(input)
            out = interpolate(output, size=(512, 1024), mode="bilinear", align_corners=True)
            # out = self.upsample(output)
            # print(out.shape)
            return out
    
    opened by LanWong1 3
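
If the hard-coded (512, 1024) target is the concern, one hedged alternative is to upsample to whatever spatial size the network actually received; the size plumbing below is an illustration, not this repo's code:

    import torch.nn.functional as F

    def decode_to_input_size(apn, features, input_hw):
        # input_hw is the (H, W) of the original network input, e.g. images.shape[2:]
        output = apn(features)
        return F.interpolate(output, size=input_hw, mode="bilinear",
                             align_corners=True)
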
  • Mask annotation extraction in cityscapes

    @xiaoyufenfei Hi, when I try to visualize the cityscapes annotations by extracting semantic maps from the PNGs, I get the following results. Is there a better way to extract the mask and obtain the polygon image?

    opened by abhigoku10 0
Owner

Yu Wang
I am a graduate student in CV; my research areas center around computer vision and deep learning.