LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation

Overview

Table of Contents:

  • Introduction
  • Project-Structure
  • Installation
  • Datasets
  • Training-LEDNet
  • Testing
  • Results
  • Citation
  • Tips
  • Reference

Introduction

This project contains the code for LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation by Yu Wang et al. (Note: the code was tested with Python 3.6, CUDA 9.0, and PyTorch 0.4.1; PyTorch 0.4.1+ is also supported.)

The extensive computational burden limits the use of CNNs on mobile devices for dense estimation tasks such as semantic segmentation. In this paper, we present a lightweight network named **LEDNet**, which employs an asymmetric encoder-decoder architecture for real-time semantic segmentation. More specifically, the encoder adopts a ResNet as the backbone network, where two new operations, channel split and shuffle, are used in each residual block to greatly reduce computation cost while maintaining high segmentation accuracy. In the decoder, an attention pyramid network (APN) further lightens the entire network. Our model has fewer than 1M parameters and runs at over 71 FPS on a single GTX 1080Ti GPU. Comprehensive experiments demonstrate that our approach achieves a state-of-the-art speed-accuracy trade-off on the Cityscapes dataset, making it an effective method for real-time semantic segmentation.
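
The following is a minimal PyTorch sketch of the channel split and shuffle pattern described above. It is an illustration only, simplified relative to the paper's SS-nbt block (which stacks more factorized convolutions per branch and uses dilated convolutions in deeper stages):

import torch
import torch.nn as nn
import torch.nn.functional as F

def channel_shuffle(x, groups=2):
    # Interleave channels so information mixes across the two split branches.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)

class SplitShuffleBlock(nn.Module):
    """Illustrative residual block: split channels, run each half through
    factorized 3x1/1x3 convolutions, concatenate, add the residual, shuffle."""
    def __init__(self, channels):
        super().__init__()
        half = channels // 2
        def branch():
            return nn.Sequential(
                nn.Conv2d(half, half, (3, 1), padding=(1, 0)),
                nn.ReLU(inplace=True),
                nn.Conv2d(half, half, (1, 3), padding=(0, 1)),
                nn.BatchNorm2d(half),
                nn.ReLU(inplace=True),
            )
        self.left, self.right = branch(), branch()

    def forward(self, x):
        l, r = x.chunk(2, dim=1)                      # channel split
        out = torch.cat([self.left(l), self.right(r)], dim=1)
        out = F.relu(out + x)                         # residual connection
        return channel_shuffle(out)                   # channel shuffle

# SplitShuffleBlock(64)(torch.randn(1, 64, 32, 32)).shape -> (1, 64, 32, 32)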

Project-Structure

├── datasets                      # all datasets for the project
|  └── cityscapes                 # Cityscapes dataset
|  |  └── gtCoarse                # coarse Cityscapes annotations
|  |  └── gtFine                  # fine Cityscapes annotations
|  |  └── leftImg8bit             # Cityscapes training images
|  └── cityscapesscripts          # Cityscapes label-conversion scripts
├── utils
|  └── dataset.py                 # dataloader for the Cityscapes dataset
|  └── iouEval.py                 # computes mean IoU and per-class IoU
|  └── transform.py               # data preprocessing
|  └── visualize.py               # visualization with visdom
|  └── loss.py                    # loss function
├── checkpoint
|  └── xxx.pth                    # encoder models pre-trained on ImageNet
├── save
|  └── xxx.pth                    # models trained from scratch
├── imagenet-pretrain
|  └── lednet_imagenet.py         # model definition for ImageNet pre-training
|  └── main.py                    # ImageNet pre-training script
├── train
|  └── lednet.py                  # model definition for semantic segmentation
|  └── main.py                    # training script
├── test
|  └── dataset.py                 # dataloader for testing
|  └── lednet.py                  # model definition
|  └── lednet_no_bn.py            # model definition with the BN layers removed
|  └── eval_cityscapes_color.py   # generate RGB result images
|  └── eval_cityscapes_server.py  # generate results for upload to the official server
|  └── eval_forward_time.py       # measure model inference time
|  └── eval_iou.py                # evaluate IoU
|  └── iouEval.py                 # IoU computation helpers
|  └── transform.py               # data preprocessing

Installation

  • Python 3.6.x (Anaconda3 is recommended)
  • Set up the Python environment:
pip3 install -r requirements.txt
  • Environment: PyTorch 0.4.1, CUDA 9.0, cuDNN 7.1, Python 3.6

  • Clone this repository.

git clone https://github.com/xiaoyufenfei/LEDNet.git
cd LEDNet

Datasets

Download the Cityscapes dataset and organize it under ./datasets/cityscapes as follows:

├── leftImg8bit
│   ├── train
│   ├── val
│   └── test
├── gtFine
│   ├── train
│   ├── val
│   └── test
└── gtCoarse
    ├── train
    ├── train_extra
    └── val

Training-LEDNet

  • For help on the optional arguments you can run: python main.py -h

  • By default, we assume you have downloaded the Cityscapes dataset into the ./datasets/cityscapes dir.

  • To train LEDNet with the train/main.py script, pass the parameters listed in main.py as flags, or change their default values manually:

python main.py --savedir logs --model lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx ...

Resuming-training-if-decoder-part-broken

  • For help on the optional arguments you can run: python main.py -h
python main.py --savedir logs --name lednet --datadir path/root_directory/  --num-epochs xx --batch-size xx --decoder --state "../save/logs/model_best_enc.pth.tar"...

Testing

  • The trained models from the training process can be found here. They may not be the best ones; you can train a model from scratch yourself, or fine-tune the decoder with an encoder pre-trained on ImageNet.
For more details, refer to ./test/README.md.

Results

  • Please refer to our article for more details.
Method | Dataset    | Fine | Coarse | IoU_cla | IoU_cat | FPS
LEDNet | Cityscapes | yes  | yes    | 70.6%   | 87.1%   | 70+

Qualitative segmentation result examples:

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@article{wang2019lednet,
  title={LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation},
  author={Wang, Yu and Zhou, Quan and Liu, Jia and Xiong, Jian and Gao, Guangwei and Wu, Xiaofu and Latecki, Longin Jan},
  journal={arXiv preprint arXiv:1905.02423},
  year={2019}
}

Tips

  • Constrained by limited GPU resources, the project's results can likely be improved further...
  • It is recommended to pre-train the encoder on ImageNet and then fine-tune the decoder; the results will be better.

Reference

  1. Deep Residual Learning for Image Recognition
  2. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation
  3. ERFNet: Efficient Residual Factorized ConvNet for Real-Time Semantic Segmentation
  4. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices
Comments
  • problems about cityscapesscripts

    Can you tell me where the cityscapesscripts are used? I downloaded the whole dataset, including cityscapes and cityscapesscripts, but I can't find where the scripts are used!

    opened by guaiwuguba 7
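
The usual role of the cityscapesscripts package is offline label preparation: converting the polygon/labelIds annotations into 19-class trainId PNGs before training. A sketch of the typical invocation, assuming the standard package layout (not taken from this repo's docs):

    export CITYSCAPES_DATASET=./datasets/cityscapes
    python cityscapesscripts/preparation/createTrainIdLabelImgs.py
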
  • Decoder channels

    I tried to implement LEDNet in Keras and found that I have ~2.5M parameters, which does not match the paper (it claims ~900K). I saw you wrote two versions of LEDNet, one with ~900K parameters and the other, like my implementation, with ~2.5M. In the 900K implementation, you downsample the number of channels to 1 right away in the first conv layer of the downsampling branch of the decoder (why is that?). In the attached figure (from the paper), the number of channels should remain the same through the first two levels of the downsampling branch of the decoder (and they still claim just ~900K parameters). What am I missing here? Is the paper problematic?

    [attached figure from the paper]

    opened by AvivSham 7
  • Question about reproduction

    I want to implement your paper, but I have several questions:

    1. Is there a first convolution missing before downsampling in the paper (from 3 channels to 32 channels)?
    2. Did you pre-train your encoder on ImageNet or train from scratch? (And would you mind telling me how many training epochs you used?)
    3. What weight initialization strategy did you use?
    4. What training image size did you use?

    Here is my implementation: https://github.com/AceCoooool/LEDNet. But there is still some gap to the paper.

    opened by AceCoooool 5
  • Problem during training

    Hi, when I run python main.py --savedir logs --model lednet --datadir D:/bishe/LEDNet-master/datasets/cityscapes, the following error appears:

    ========== ENCODER TRAINING ===========
    D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/train
    D:/bishe/LEDNet-master/datasets/cityscapes\leftImg8bit/val
    <class 'utils.loss.CrossEntropyLoss2d'>
    ----- TRAINING - EPOCH 1 -----
    LEARNING RATE: 0.0005
    Traceback (most recent call last):
      File "main.py", line 510, in <module>
        main(parser.parse_args())
      File "main.py", line 464, in main
        model = train(args, model, True)  # Train encoder
      File "main.py", line 211, in train
        for step, (images, labels) in enumerate(loader):
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 336, in __next__
        return self._process_next_batch(batch)
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 357, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    IndexError: Traceback (most recent call last):
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "C:\Users\admin\anaconda3\envs\DFANet\lib\site-packages\torch\utils\data\dataloader.py", line 106, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "D:\bishe\LEDNet-master\utils\dataset.py", line 86, in __getitem__
        filenameGt = self.filenamesGt[index]
    IndexError: list index out of range

    How should I solve this problem?

    opened by Liu6697 4
  • Label size of eval_cityscapes_server.py

    It seems that your eval_cityscapes_server.py produces labels at a resolution of 512x1024 (not 1024x2048), so I am confused about whether the Cityscapes test server can evaluate them correctly.

    opened by Serge-weihao 4
  • data

    THCudaCheck FAIL file=/pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=134 error=710 : device-side assert triggered
    Traceback (most recent call last):
      File "train/main.py", line 519, in <module>
        main(parser.parse_args())
      File "train/main.py", line 473, in main
        model = train(args, model, True)  # Train encoder
      File "train/main.py", line 237, in train
        loss = criterion(outputs, targets[:, 0])
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/content/drive/My Drive/LEDNet/utils/loss.py", line 15, in forward
        return self.loss(F.log_softmax(outputs, dim=1), targets)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/loss.py", line 211, in forward
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/functional.py", line 2220, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: cuda runtime error (710) : device-side assert triggered at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:134

    opened by Benjiaminh 2
  • Hello, I ran into the following problem during training; please help, thank you

    for step, (images, labels) in enumerate(loader):
        print('images = ', images.shape)  # images = torch.Size([5, 3, 512, 682])
        print('labels = ', labels.shape)  # labels = torch.Size([5, 1, 64, 85])
        inputs = images.cuda()
        targets = labels.cuda()

        start_time = time.time()

        #print('inputs= ', inputs.shape)   # inputs = torch.Size([5, 3, 512, 682])
        #print('targets= ', targets.shape) # targets = torch.Size([5, 1, 64, 85])

        imgs_batch = images.shape[0]
        if imgs_batch != args.batch_size:
            break

        outputs = model(inputs, only_encode=enc)
        print('outputs = ', outputs)
        print('outputs =', outputs.shape)               # torch.Size([5, 2, 64, 86])
        print('targets[:, 0] = ', targets[:, 0].shape)  # torch.Size([5, 64, 85])

    After the input passes through model(), the output size becomes [5, 2, 64, 86], which does not match the label dimensions, so an error occurs. What could cause this result?

    opened by anqin5211314 2
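
A likely cause, offered as a hedged guess: 682 is not a multiple of 8, so the encoder's three stride-2 downsamplings round up (682 → 341 → 171 → 86) while the resized label ends up at 85. One sketch of a workaround is to pad inputs up to a multiple of 8 before the forward pass (the multiple of 8 is an assumption matching LEDNet's 1/8-resolution encoder output):

    import torch.nn.functional as F

    def pad_to_multiple(x, multiple=8):
        # Zero-pad right/bottom so H and W are divisible by `multiple`;
        # crop the prediction back to the label size afterwards.
        h, w = x.shape[-2:]
        return F.pad(x, (0, (-w) % multiple, 0, (-h) % multiple))
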
  • training data

    Could you tell me how to convert the dataset to 19 categories? I just use the *_color.png pictures for training, but after some steps it reports an error:

    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THC/generated/../generic/THCTensorMathPointwise.cu:266
    /opt/conda/conda-bld/pytorch_1535490206202/work/aten/src/THCUNN/SpatialClassNLLCriterion.cu:99: void cunn_SpatialClassNLLCriterion_updateOutput_kernel(T *, T *, T *, long *, T *, int, int, int, int, int, long) [with T = float, AccumT = float]: block: [1,0,0], thread: [160,0,0] Assertion t >= 0 && t < n_classes failed.

    opened by YYingcute 2
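
The *_color.png files are RGB visualizations, not class-id maps; training needs the *_labelIds.png images remapped to the 19 trainIds. A minimal sketch of that remapping, assuming the standard mapping table in cityscapesscripts (the file name is just an example):

    import numpy as np
    from PIL import Image
    from cityscapesscripts.helpers.labels import labels  # standard id/trainId table

    # Lookup table: original label id -> trainId, with 255 marking ignored classes.
    lut = np.full(256, 255, dtype=np.uint8)
    for l in labels:
        if 0 <= l.id < 256 and 0 <= l.trainId < 19:
            lut[l.id] = l.trainId

    ids = np.array(Image.open("aachen_000000_000019_gtFine_labelIds.png"))
    train_ids = lut[ids]  # values in {0..18}, plus 255 for ignore
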
  • Error when training on my own dataset

    Hi, I got the following error while training on my own dataset. The dataset is laid out like the cityscapes files, the images have consistent sizes and formats, and I changed the number of classes, but the error below still appears. Could you take a look when you have time? Many thanks.

    THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu line=128 error=59 : device-side assert triggered
    Traceback (most recent call last):
      File "main.py", line 518, in <module>
        main(parser.parse_args())
      File "main.py", line 472, in main
        model = train(args, model, True)  # Train encoder
      File "main.py", line 236, in train
        loss = criterion(outputs, targets[:, 0])
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/disk/LEDNet/utils/loss.py", line 15, in forward
        return self.loss(F.log_softmax(outputs, dim=0), targets)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
        result = self.forward(*input, **kwargs)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 210, in forward
        return F.nll_loss(input, target, weight=self.weight, ignore_index=self.ignore_index, reduction=self.reduction)
      File "/home/disk/software/anaconda3/envs/pytorch_env/lib/python3.6/site-packages/torch/nn/functional.py", line 1792, in nll_loss
        ret = torch._C._nn.nll_loss2d(input, target, weight, _Reduction.get_enum(reduction), ignore_index)
    RuntimeError: cuda runtime error (59) : device-side assert triggered at /opt/conda/conda-bld/pytorch_1550780889552/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:128

    opened by Comedian1926 2
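
Two hedged observations on this traceback: the assert is the classic symptom of a label pixel outside [0, num_classes) that is not the ignore index, and the quoted loss.py line applies log_softmax over dim=0 (the batch dimension), whereas the class dimension of an NCHW output is dim=1. A quick sanity check one can run on a batch (num_classes and the ignore index here are assumptions):

    import torch

    def check_labels(labels, num_classes=20, ignore_index=255):
        # Flag any label value that is neither a valid class id nor the ignore id.
        vals = torch.unique(labels)
        bad = vals[(vals != ignore_index) & ((vals < 0) | (vals >= num_classes))]
        assert bad.numel() == 0, f"out-of-range label values: {bad.tolist()}"
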
  • eval speed

    I ran test/eval_forward_time.py with the default params; the mean time is 28 ms. When I set the image size to (2048, 1024), the mean time is 100 ms.

    When I set the batch size to 2, 4, or 8, the mean time is 32 ms in every case, so the inference time increases linearly with batch size.

    opened by kakaluote 1
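
CUDA kernels launch asynchronously, so per-batch wall-clock means can mislead unless the GPU is synchronized; a hedged sketch of a fairer measurement with warm-up:

    import time
    import torch

    def mean_forward_time(model, x, iters=100, warmup=10):
        model.eval()
        with torch.no_grad():
            for _ in range(warmup):        # warm up cuDNN autotuning and caches
                model(x)
            torch.cuda.synchronize()       # drain queued kernels before timing
            start = time.time()
            for _ in range(iters):
                model(x)
            torch.cuda.synchronize()       # wait for the last batch to finish
        return (time.time() - start) / iters
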
  • training data

    Thank you for your contribution. Can you give more details about training on the VOC dataset? I think the number of classes on the VOC dataset is 21 (background + the other category labels).

    opened by jhch1995 1
  • Update dataset.py

    This is something that I encountered when executing the code on another dataset, so I modified the code so that it can support other datasets as well.

    Problem and solution:

    PIL's 'P' mode, which is used for reading images, can change the spatial pixel information, i.e. it alters the labels encoded as pixel values. The better way is to open the image in RGB mode and then simply take one channel as the label image.

    opened by pranavmicro7 0
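
A sketch of the loading change this PR describes (hedged; it assumes the class id is recoverable from a single channel, and load_label is an illustrative name, not the repo's function):

    import numpy as np
    from PIL import Image

    def load_label(path):
        # Read the label as RGB rather than palette ('P') mode, then keep
        # one channel as the class-id map, as the PR describes.
        rgb = np.array(Image.open(path).convert("RGB"))
        return rgb[:, :, 0]
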
  • Question about input and output sizes

    I don't see the input image being resized to 1024x512 anywhere; only the short side is scaled proportionally to 512, and the long side is not fixed at 1024. Why, then, is the output directly scaled to 1024x512?

    class Decoder(nn.Module):
        def __init__(self, num_classes):
            super().__init__()

            self.apn = APN_Module(in_ch=128, out_ch=20)
            # self.upsample = Interpolate(size=(512, 1024), mode="bilinear")
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=4, stride=2, padding=1, output_padding=0, bias=True)
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=3, stride=2, padding=1, output_padding=1, bias=True)
            # self.output_conv = nn.ConvTranspose2d(16, num_classes, kernel_size=2, stride=2, padding=0, output_padding=0, bias=True)

        def forward(self, input):
            output = self.apn(input)
            out = interpolate(output, size=(512, 1024), mode="bilinear", align_corners=True)
            # out = self.upsample(output)
            # print(out.shape)
            return out
    
    opened by LanWong1 3
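
If the hard-coded (512, 1024) target is the concern, one hedged alternative is to upsample to whatever spatial size the network actually received; the size plumbing below is an illustration, not this repo's code:

    import torch.nn.functional as F

    def decode_to_input_size(apn, features, input_hw):
        # input_hw is the (H, W) of the original network input, e.g. images.shape[2:]
        output = apn(features)
        return F.interpolate(output, size=input_hw, mode="bilinear",
                             align_corners=True)
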
  • Mask annotation extraction in cityscapes

    @xiaoyufenfei Hi, when I try to visualize the cityscapes annotations by extracting semantic maps from the PNGs, I get the following results. Is there a better way to extract the mask and obtain the polygon image?

    opened by abhigoku10 0
Owner

Yu Wang
I am a graduate student in CV; my research areas center around computer vision and deep learning.