This repository contains the source code of our work on designing efficient CNNs for computer vision

Overview

Efficient networks for Computer Vision

This repo contains the source code of our work on designing efficient networks for different computer vision tasks: (1) image classification, (2) object detection, and (3) semantic segmentation.

Real-time semantic segmentation using ESPNetv2 on an iPhone 7. See here for the iOS application source code, which uses CoreML.
Segmentation demo on iPhone 7
Real-time object detection using ESPNetv2

Table of contents

  1. Key highlights
  2. Supported networks
  3. Relevant papers
  4. Blogs
  5. Performance comparison
  6. Training recipe
  7. Instructions for segmentation and detection demos
  8. Citation
  9. License
  10. Acknowledgements
  11. Contributions
  12. Notes

Key highlights

  • Object classification on the ImageNet and MS-COCO (multi-label) datasets
  • Semantic segmentation on the PASCAL VOC and Cityscapes datasets
  • Object detection on the PASCAL VOC and MS-COCO datasets
  • Supports PyTorch 1.0
  • Integrated with TensorBoard for easy visualization of training logs.
  • Scripts for downloading different datasets.
  • Semantic segmentation application using ESPNetv2 on iPhone can be found here.

Supported networks

This repo supports the following networks:

  • ESPNetv2 (Classification, Segmentation, Detection)
  • DiCENet (Classification, Segmentation, Detection)
  • ShuffleNetv2 (Classification)

Relevant papers

Blogs

Performance comparison

ImageNet

The figure below compares the performance of DiCENet with other efficient networks on the ImageNet dataset. DiCENet outperforms all existing efficient networks, including MobileNetv2 and ShuffleNetv2. More details are available here.

DiCENet performance on the ImageNet
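As a quick, generic sanity check when comparing efficient models, parameter counts can be read directly off any PyTorch module. A minimal sketch, not tied to this repository's model-building code (FLOPs require a separate profiler and are not shown):

    import torch.nn as nn

    def count_parameters(model: nn.Module) -> int:
        # Number of trainable parameters in a module.
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Toy stand-in; substitute any classification model here.
    toy = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))
    print('{:.3f} M parameters'.format(count_parameters(toy) / 1e6))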

Object detection

The table below compares the performance of our architecture with other detection networks on the MS-COCO dataset. Our network is both fast and accurate. More details are available here.

MS-COCO                 Image Size    FLOPs     mAP      FPS
SSD-VGG                 512x512       100 B     26.8     19
YOLOv2                  544x544       17.5 B    21.6     40
ESPNetv2-SSD (Ours)     512x512       3.2 B     24.54    35

Semantic Segmentation

The comparison below shows the performance of ESPNet and ESPNetv2 on two different datasets. Note that ESPNets are among the first efficient networks to deliver performance competitive with existing networks on the PASCAL VOC dataset, even with low-resolution images (e.g., 256x256). See here for more details.

                Cityscapes                            PASCAL VOC 2012
                Image Size    FLOPs    mIOU           Image Size    FLOPs     mIOU
ESPNet          1024x512      4.5 B    60.3           512x512       2.2 B     63
ESPNetv2        1024x512      2.7 B    66.2           384x384       0.76 B    68

Training Recipe

Image Classification

Details about training and testing are provided here.

Details about performance of different models are provided here.

Semantic segmentation

Details about training and testing are provided here.

Details about performance of different models are provided here.

Object Detection

Details about training and testing are provided here.

Details about performance of different models are provided here.

Instructions for segmentation and detection demos

To run the segmentation demo, just type:

python segmentation_demo.py

To run the detection demo, run the following command:

python detection_demo.py

OR 

python detection_demo.py --live

For other supported arguments, please see the corresponding files.
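For example, to run the detection demo on your own images, an image directory can be passed via the --im-dir flag (referenced in the issues below; the directory name here is only illustrative):

python detection_demo.py --im-dir ./my_images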

Citation

If you find this repository helpful, please feel free to cite our work:

@article{mehta2019dicenet,
  author  = {Sachin Mehta and Hannaneh Hajishirzi and Mohammad Rastegari},
  title   = {DiCENet: Dimension-wise Convolutions for Efficient Networks},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2020},
}

@inproceedings{mehta2018espnetv2,
  title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
  author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2019}
}

@inproceedings{mehta2018espnet,
  title={ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={552--568},
  year={2018}
}

License

By downloading this software, you acknowledge that you agree to the terms and conditions given here.

Acknowledgements

Most of our object detection code is adapted from SSD in PyTorch. We thank the authors for their amazing work.

Want to help out?

Thanks for your interest in our work :).

Open tasks that are interesting:

  • TensorFlow implementation. I would like to do this myself but don't have enough time. If you are interested, drop a message and we can talk about it.
  • Optimizing the EESP and the DiCENet blocks at the CUDA level.
  • Optimize and port pretrained models across multiple mobile platforms, including Android.
  • Other thoughts are also welcome :).

Notes

Notes about DiCENet paper

This repository contains DiCENet's source code in PyTorch only, and you should be able to reproduce the results of v1/v2 of our arXiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate the MobileNet tricks described in Section 5.3, which are currently not part of this repository.

Comments
  • question about cityscapes performance of ESPNet v2

    Hi, I'd like to ask whether the Cityscapes performance reported in model_zoo/README.md was obtained by training on both the fine and the coarse labeled data. I tried to run your code using only the fine labeled data at s=1.5, with 100 epochs for stage 1 and 100 epochs for stage 2, but the best mIoU on val is 61.1%, 2.7% lower than your result. I wonder whether I didn't train long enough or whether it is a data problem.

    opened by Wangzhuoying0716 19
  • convert pytorch model into onnx version [Detection part]

    Thank you for sharing this great code. Right now, I want to deploy your model on the TVM platform, which requires converting from PyTorch to ONNX; the code I used is below.

    weights = 'model/detection/model_zoo/espnetv2/espnetv2_s_2.0_pascal_300x300.pth'
    model = ssd(args, cfg)
    pretrained_dict = torch.load(weights, map_location=torch.device('cpu'))
    model.load_state_dict(pretrained_dict)
    PATH_ONNX = 'deploy.onnx'
    dummy_input = torch.randn(1, 3, 300, 300, device='cpu')
    torch.onnx.export(model, dummy_input, PATH_ONNX, input_names=['image'],
                      output_names=['output'], verbose=True, opset_version=11)

    but during the conversion, an error occurs; the info is below:

    ~/software/EdgeNets/nn_layers/eesp.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
        if w2 == w1:
    ~/software/EdgeNets/nn_layers/eesp.py:89: TracerWarning: (same message)
        if expanded.size() == input.size():
    ~/software/EdgeNets/nn_layers/efficient_pyramid_pool.py:44: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. (same message)
        h_s = int(math.ceil(height * self.scales[i]))
    ~/software/EdgeNets/nn_layers/efficient_pyramid_pool.py:45: TracerWarning: (same message)
        w_s = int(math.ceil(width * self.scales[i]))

    RuntimeError: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible

    Please give me some tips so that I can figure out the problem. Thank you for your help!
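    For context, the TracerWarnings above reflect how tracing-based export works in general: torch.onnx.export traces the model, and Python values or branches computed from tensor shapes are baked into the trace as constants. A minimal, self-contained sketch (unrelated to this repository's modules) showing the effect:

        import torch
        import torch.nn.functional as F

        class PadToEven(torch.nn.Module):
            # The branch depends on the input's spatial size; tracing records only
            # the branch taken for the example input, so the trace is size-specific.
            def forward(self, x):
                if x.size(2) % 2 == 1:
                    x = F.pad(x, (0, 0, 0, 1))
                return x

        traced = torch.jit.trace(PadToEven(), torch.randn(1, 3, 7, 7))
        print(traced(torch.randn(1, 3, 8, 8)).shape)  # padding still applied: the branch was frozen at trace time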

    opened by zechendev 17
  • Question about the Experiment of Image Multi-label Classification

    In the paper "DiCENet: Dimension-wise Convolutions for Efficient Networks", the network width scaling parameter s can be selected, but in the image multi-label classification experiment, s=0.1 gives an error (when I run the corresponding program with s=0.2, my machine can't run it, even though the network with s=0.1 has less than half the parameters) ... can you provide a program for s=0.1?

    opened by mymuli 10
  • Segmentation Fault Espnetv2 on RPI

    Hello guys, after reading your article on ESPNetv2, I was very excited about your optimizations of convolutional operations and would very much like to run the network on a Raspberry Pi.

    After much effort, I was able to compile all the necessary dependencies on the Raspberry Pi and successfully execute the detection_demo.py script, which by default runs inference on the images in your repository. However, when I use another set of images (passed as a parameter in --im-dir), the run ends with the following message: Segmentation Fault.

    Do you have any idea what might be causing this? I tried several other image sets, and the only one the network could run without problems was the default image set from your repository.

    I am using a Raspberry Pi 3 Model B+ with the Raspbian Buster Lite operating system.

    opened by marcusvlc 9
  • loss calculation of multi-label classification.

    1. For multi-label classification, while calculating the loss for the COCO dataset, why is it multiplied by the number of classes (80.0)? Is it a weighting parameter for class imbalance? loss = criteria(output, target.float()) * 80.0 (see the sketch after this list)

    2. For calculating precision and recall, should we use cumulative TP & FP as mentioned here: https://github.com/rafaelpadilla/Object-Detection-Metrics ?
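    For reference, assuming a BCE-style criterion (the repository's exact criterion may differ), multiplying the mean-reduced loss by the number of classes is equivalent to summing over classes and averaging over the batch; it rescales the loss rather than re-weighting individual classes. A minimal sketch:

        import torch
        import torch.nn as nn

        num_classes = 80
        criteria = nn.BCEWithLogitsLoss()              # default reduction: mean over batch and classes
        output = torch.randn(4, num_classes)           # logits for a batch of 4 images
        target = torch.randint(0, 2, (4, num_classes)).float()

        loss_scaled = criteria(output, target) * num_classes
        # Equivalent view: average over the batch of the per-image sum over classes.
        per_element = nn.BCEWithLogitsLoss(reduction='none')(output, target)
        loss_sum = per_element.sum(dim=1).mean()
        print(torch.allclose(loss_scaled, loss_sum))   # True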

    opened by aiwithshekhar 8
  • to onnx

    I have tried to convert the PyTorch model to ONNX with the torch.onnx.export API, but I hit a problem with the following message: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible. How can I fix it?

    opened by kingvision 6
  • Zeroing out the gradient before calling optimizer.step() in train_eval_seg.py

    Hi, thanks for publishing the code.

    I noticed that optimizer.zero_grad() is called after optimizer.step() in train_eval_seg.py. Is this intentional? I was having issues until I moved zero_grad() above loss.backward().
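    For reference, the conventional PyTorch ordering clears gradients before the backward pass so that each step uses only the current iteration's gradients. A generic, self-contained sketch (not the repository's training code):

        import torch
        import torch.nn as nn

        # The important part is the ordering zero_grad() -> backward() -> step().
        model = nn.Linear(10, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        criterion = nn.CrossEntropyLoss()
        data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(5)]

        for inputs, targets in data:
            optimizer.zero_grad()          # clear gradients from the previous iteration
            loss = criterion(model(inputs), targets)
            loss.backward()                # accumulate gradients for this iteration only
            optimizer.step()               # apply the update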

    Also, when resuming training, there was a runtime error about type compatibility (CPU vs CUDA) during loss.backward(); adding model = model.cuda() right after loading the state dictionary in train_segmentation.py solved it.

    opened by mitalbert 4
  • Had an error about multiplying Bool with ByteTensor

    The training script would stop, throwing an error about multiplying a Bool tensor with a ByteTensor in segmentation_miou.py at

            pred = pred * (target>0)
            inter = pred * (pred == target)
    

    I assume the authors were expecting torch to cast Bool values into 0 and 1, which didn't happen. I changed it to

        pred = pred * (target > 0).type(torch.ByteTensor)
        inter = pred * (pred == target).type(torch.ByteTensor)
    

    to explicitly cast the values.

    I'm not sure why this happened in my case, perhaps due to version incompatibility.

    opened by mitalbert 4
  • onnx model error

    Hi, I got the following error when converting the ESPNetv2 segmentation model to ONNX: RuntimeError: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible

    How can I fix it?

    opened by kingvision 3
  • Is it possible to use DiCENet train a regression model

    Thanks for the great work. I would like to use the idea of DiCENet on a custom dataset. I'm new to DL and PyTorch; can you give me some ideas on how to do it?

    Thank you:)

    opened by w11m 3
  • Runtime Error

    Sorry for disturbing you. When I run test_detection.py, I get an error: RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float. How can I solve it? Thank you.

    opened by Ideal-Bai 3
  • Walls and surfaces detection

    Hi, I tried EdgeNets for object detection. It works fine with objects. My question is about walls and surfaces (floors and ceilings): are there more properties to be set to make the model detect walls and surfaces?

    thanks, Dina

    opened by d-sharafeldeen 0
  • Allow input channel count to be changed

    Update to ensure that different numbers of input channels can be handled without error. (The shortcut-connection configuration assumed the input channel count was always equal to config_inp_reinf.)

    opened by pnorridge 0
  • args.resume has the wrong type. Should be str and should take the path of a model checkpoint

    The following are the corrected lines in train_detection.py to enable resuming from a checkpoint.

        checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
        parser.add_argument('--resume', type=str, help='resume from checkpoint')  # previously: action='store_true'
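    A slightly fuller sketch of the usual resume pattern built around those two lines (variable names and checkpoint keys are illustrative assumptions, not the repository's exact ones):

        import argparse
        import torch
        import torch.nn as nn

        parser = argparse.ArgumentParser()
        parser.add_argument('--resume', type=str, default=None,
                            help='path to a checkpoint to resume from')  # a string path, not a store_true flag
        args = parser.parse_args()

        model = nn.Linear(10, 2)  # stand-in for the detection model built elsewhere
        start_epoch = 0
        if args.resume:
            checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
            # Some checkpoints are plain state_dicts, others wrap them; the key names are assumptions.
            if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
                model.load_state_dict(checkpoint['state_dict'])
                start_epoch = checkpoint.get('epoch', 0)
            else:
                model.load_state_dict(checkpoint)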
    opened by saiabinesh 0
  • Asking about the optimized implementation for DimConv

    @sacmehta Really impressive work!

    It is great to see that you have open-sourced the implementation of DimConv😁

    We are wondering whether you have any plan to release the optimized implementation illustrated in the following Figures:

    [figure screenshots omitted]

    opened by PkuRainBow 0
  • Espnetv2 redundant avg pooling in input reinforcement

    In the input-reinforcement branches, it seems that there are redundant average-pooling operations on the original input in the deeper stages. Do you see any problem with passing the strided, average-pooled input image from each DownSampler block to the next one? In the DownSampler forward method, return input2:

    def forward(self, input, input2=None):
            '''
            :param input: input feature map
            :return: feature map down-sampled by a factor of 2
            '''
            avg_out = self.avg(input)
            eesp_out = self.eesp(input)
            output = torch.cat([avg_out, eesp_out], 1)
    
            if input2 is not None:
                #assuming the input is a square image
                # Shortcut connection with the input image
                w1 = avg_out.size(2)
                while True:
                    input2 = F.avg_pool2d(input2, kernel_size=3, padding=1, stride=2)
                    w2 = input2.size(2)
                    if w2 == w1:
                        break
                output = output + self.inp_reinf(input2)
    
            return self.act(output), input2
    

    In the ESPNet class, overwrite the input object:

    out_l3_0, input = self.level3_0(out_l2, input)  # down-sample
    


    opened by mayaboker 0