This repository contains the source code of our work on designing efficient CNNs for computer vision

Overview

Efficient networks for Computer Vision

This repo contains the source code of our work on designing efficient networks for different computer vision tasks: (1) image classification, (2) object detection, and (3) semantic segmentation.

Real-time semantic segmentation using ESPNetv2 on an iPhone 7. See here for the iOS application source code, which uses CoreML.
Segmentation demo on iPhone 7
Real-time object detection using ESPNetv2

Table of contents

  1. Key highlights
  2. Supported networks
  3. Relevant papers
  4. Blogs
  5. Performance comparison
  6. Training recipe
  7. Instructions for segmentation and detection demos
  8. Citation
  9. License
  10. Acknowledgements
  11. Contributions
  12. Notes

Key highlights

  • Object classification on the ImageNet and MS-COCO (multi-label) datasets
  • Semantic segmentation on the PASCAL VOC and Cityscapes datasets
  • Object detection on the PASCAL VOC and MS-COCO datasets
  • Supports PyTorch 1.0
  • Integrated with TensorBoard for easy visualization of training logs.
  • Scripts for downloading different datasets.
  • Semantic segmentation application using ESPNetv2 on iPhone can be found here.

Supported networks

This repo supports the following networks:

  • ESPNetv2 (Classification, Segmentation, Detection)
  • DiCENet (Classification, Segmentation, Detection)
  • ShuffleNetv2 (Classification)

Relevant papers

Blogs

Performance comparison

ImageNet

The figure below compares the performance of DiCENet with other efficient networks on the ImageNet dataset. DiCENet outperforms all existing efficient networks, including MobileNetv2 and ShuffleNetv2. More details are available here.

DiCENet performance on the ImageNet
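As a quick, generic sanity check when comparing efficient models, parameter counts can be read directly off any PyTorch module. A minimal sketch, not tied to this repository's model-building code (FLOPs require a separate profiler and are not shown):

    import torch.nn as nn

    def count_parameters(model: nn.Module) -> int:
        # Number of trainable parameters in a module.
        return sum(p.numel() for p in model.parameters() if p.requires_grad)

    # Toy stand-in; substitute any classification model here.
    toy = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Conv2d(16, 32, 3, padding=1))
    print('{:.3f} M parameters'.format(count_parameters(toy) / 1e6))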

Object detection

The table below compares the performance of our architecture with other detection networks on the MS-COCO dataset. Our network is both fast and accurate. More details are available here.

MS-COCO                 Image Size    FLOPs     mAP      FPS
SSD-VGG                 512x512       100 B     26.8     19
YOLOv2                  544x544       17.5 B    21.6     40
ESPNetv2-SSD (Ours)     512x512       3.2 B     24.54    35

Semantic Segmentation

The comparison below shows the performance of ESPNet and ESPNetv2 on two different datasets. Note that ESPNets are among the first efficient networks to deliver performance competitive with existing networks on the PASCAL VOC dataset, even with low-resolution images (e.g., 256x256). See here for more details.

                Cityscapes                            PASCAL VOC 2012
                Image Size    FLOPs    mIOU           Image Size    FLOPs     mIOU
ESPNet          1024x512      4.5 B    60.3           512x512       2.2 B     63
ESPNetv2        1024x512      2.7 B    66.2           384x384       0.76 B    68

Training Recipe

Image Classification

Details about training and testing are provided here.

Details about performance of different models are provided here.

Semantic segmentation

Details about training and testing are provided here.

Details about performance of different models are provided here.

Object Detection

Details about training and testing are provided here.

Details about performance of different models are provided here.

Instructions for segmentation and detection demos

To run the segmentation demo, just type:

python segmentation_demo.py

To run the detection demo, run the following command:

python detection_demo.py

OR 

python detection_demo.py --live

For other supported arguments, please see the corresponding files.
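For example, to run the detection demo on your own images, an image directory can be passed via the --im-dir flag (referenced in the issues below; the directory name here is only illustrative):

python detection_demo.py --im-dir ./my_images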

Citation

If you find this repository helpful, please feel free to cite our work:

@article{mehta2019dicenet,
  author  = {Sachin Mehta and Hannaneh Hajishirzi and Mohammad Rastegari},
  title   = {DiCENet: Dimension-wise Convolutions for Efficient Networks},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2020},
}

@inproceedings{mehta2018espnetv2,
  title={ESPNetv2: A Light-weight, Power Efficient, and General Purpose Convolutional Neural Network},
  author={Mehta, Sachin and Rastegari, Mohammad and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
  year={2019}
}

@inproceedings{mehta2018espnet,
  title={ESPNet: Efficient Spatial Pyramid of Dilated Convolutions for Semantic Segmentation},
  author={Mehta, Sachin and Rastegari, Mohammad and Caspi, Anat and Shapiro, Linda and Hajishirzi, Hannaneh},
  booktitle={Proceedings of the European Conference on Computer Vision (ECCV)},
  pages={552--568},
  year={2018}
}

License

By downloading this software, you acknowledge that you agree to the terms and conditions given here.

Acknowledgements

Most of our object detection code is adapted from SSD in PyTorch. We thank the authors for their amazing work.

Want to help out?

Thanks for your interest in our work :).

Open tasks that are interesting:

  • TensorFlow implementation. I would like to do this myself but don't have enough time. If you are interested, drop a message and we can talk about it.
  • Optimizing the EESP and the DiCENet blocks at the CUDA level.
  • Optimize and port pretrained models across multiple mobile platforms, including Android.
  • Other thoughts are also welcome :).

Notes

Notes about DiCENet paper

This repository contains DiCENet's source code in PyTorch only, and you should be able to reproduce the results of v1/v2 of our arXiv paper. To reproduce the results of our T-PAMI paper, you need to incorporate the MobileNet tricks described in Section 5.3, which are currently not part of this repository.

Comments
  • question about cityscapes performance of ESPNet v2

    Hi, I'd like to ask whether the Cityscapes performance reported in model_zoo/README.md was obtained by training on both the fine and the coarse labeled data. I tried to run your code using only the fine labeled data at s=1.5, with 100 epochs for stage 1 and 100 epochs for stage 2, but the best mIoU on val is 61.1%, 2.7% lower than your result. I wonder whether I didn't train long enough or whether it is a data problem.

    opened by Wangzhuoying0716 19
  • convert pytorch model into onnx version [Detection part]

    Thank you for sharing this great code. Right now, I want to deploy your model on the TVM platform, which requires converting from PyTorch to ONNX; the code I used is below.

    weights = 'model/detection/model_zoo/espnetv2/espnetv2_s_2.0_pascal_300x300.pth'
    model = ssd(args, cfg)
    pretrained_dict = torch.load(weights, map_location=torch.device('cpu'))
    model.load_state_dict(pretrained_dict)
    PATH_ONNX = 'deploy.onnx'
    dummy_input = torch.randn(1, 3, 300, 300, device='cpu')
    torch.onnx.export(model, dummy_input, PATH_ONNX, input_names=['image'],
                      output_names=['output'], verbose=True, opset_version=11)

    but during the conversion, an error occurs; the info is below:

    ~/software/EdgeNets/nn_layers/eesp.py:139: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
        if w2 == w1:
    ~/software/EdgeNets/nn_layers/eesp.py:89: TracerWarning: (same message)
        if expanded.size() == input.size():
    ~/software/EdgeNets/nn_layers/efficient_pyramid_pool.py:44: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. (same message)
        h_s = int(math.ceil(height * self.scales[i]))
    ~/software/EdgeNets/nn_layers/efficient_pyramid_pool.py:45: TracerWarning: (same message)
        w_s = int(math.ceil(width * self.scales[i]))

    RuntimeError: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible

    Please give me some tips so that I can figure out the problem. Thank you for your help!
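    For context, the TracerWarnings above reflect how tracing-based export works in general: torch.onnx.export traces the model, and Python values or branches computed from tensor shapes are baked into the trace as constants. A minimal, self-contained sketch (unrelated to this repository's modules) showing the effect:

        import torch
        import torch.nn.functional as F

        class PadToEven(torch.nn.Module):
            # The branch depends on the input's spatial size; tracing records only
            # the branch taken for the example input, so the trace is size-specific.
            def forward(self, x):
                if x.size(2) % 2 == 1:
                    x = F.pad(x, (0, 0, 0, 1))
                return x

        traced = torch.jit.trace(PadToEven(), torch.randn(1, 3, 7, 7))
        print(traced(torch.randn(1, 3, 8, 8)).shape)  # padding still applied: the branch was frozen at trace time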

    opened by zechendev 17
  • Question about the Experiment of Image Multi-label Classification

    In the paper "DiCENet: Dimension-wise Convolutions for Efficient Networks", the network width scaling parameter s can be selected, but in the image multi-label classification experiment, s=0.1 gives an error (when I run the corresponding program with s=0.2, my machine can't run it, even though the network with s=0.1 has less than half the parameters) ... can you provide a program for s=0.1?

    opened by mymuli 10
  • Segmentation Fault Espnetv2 on RPI

    Hello guys, after reading your article on ESPNetv2, I was very excited about your optimizations of convolutional operations and would very much like to run the network on a Raspberry Pi.

    After much effort, I was able to compile all the necessary dependencies on the Raspberry Pi and successfully execute the detection_demo.py script, which by default runs inference on the images in your repository. However, when I use another set of images (passed as a parameter in --im-dir), the run ends with the following message: Segmentation Fault.

    Do you have any idea what might be causing this? I tried several other image sets, and the only one the network could run without problems was the default image set from your repository.

    I am using a Raspberry Pi 3 Model B+ with the Raspbian Buster Lite operating system.

    opened by marcusvlc 9
  • loss calculation of multi-label classification.

    1. For multi-label classification, while calculating the loss for the COCO dataset, why is it multiplied by the number of classes (80.0)? Is it a weighting parameter for class imbalance? loss = criteria(output, target.float()) * 80.0 (see the sketch after this list)

    2. For calculating precision and recall, should we use cumulative TP & FP as mentioned here: https://github.com/rafaelpadilla/Object-Detection-Metrics ?
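    For reference, assuming a BCE-style criterion (the repository's exact criterion may differ), multiplying the mean-reduced loss by the number of classes is equivalent to summing over classes and averaging over the batch; it rescales the loss rather than re-weighting individual classes. A minimal sketch:

        import torch
        import torch.nn as nn

        num_classes = 80
        criteria = nn.BCEWithLogitsLoss()              # default reduction: mean over batch and classes
        output = torch.randn(4, num_classes)           # logits for a batch of 4 images
        target = torch.randint(0, 2, (4, num_classes)).float()

        loss_scaled = criteria(output, target) * num_classes
        # Equivalent view: average over the batch of the per-image sum over classes.
        per_element = nn.BCEWithLogitsLoss(reduction='none')(output, target)
        loss_sum = per_element.sum(dim=1).mean()
        print(torch.allclose(loss_scaled, loss_sum))   # True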

    opened by aiwithshekhar 8
  • to onnx

    I have tried to convert the PyTorch model to ONNX with the torch.onnx.export API, but I hit a problem with the following message: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible. How can I fix it?

    opened by kingvision 6
  • Zeroing out the gradient before calling optimizer.step() in train_eval_seg.py

    Hi, thanks for publishing the code.

    I noticed that optimizer.zero_grad() is called after optimizer.step() in train_eval_seg.py. Is this intentional? I was having issues until I moved zero_grad() above loss.backward().
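    For reference, the conventional PyTorch ordering clears gradients before the backward pass so that each step uses only the current iteration's gradients. A generic, self-contained sketch (not the repository's training code):

        import torch
        import torch.nn as nn

        # The important part is the ordering zero_grad() -> backward() -> step().
        model = nn.Linear(10, 2)
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
        criterion = nn.CrossEntropyLoss()
        data = [(torch.randn(8, 10), torch.randint(0, 2, (8,))) for _ in range(5)]

        for inputs, targets in data:
            optimizer.zero_grad()          # clear gradients from the previous iteration
            loss = criterion(model(inputs), targets)
            loss.backward()                # accumulate gradients for this iteration only
            optimizer.step()               # apply the update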

    Also, when resuming training, there was a runtime error about type compatibility (CPU vs CUDA) during loss.backward(); adding model = model.cuda() right after loading the state dictionary in train_segmentation.py solved it.

    opened by mitalbert 4
  • Had an error about multiplying Bool with ByteTensor

    The training script would stop, throwing an error about multiplying a Bool tensor with a ByteTensor in segmentation_miou.py at

            pred = pred * (target>0)
            inter = pred * (pred == target)
    

    I assume the authors were expecting torch to cast Bool values into 0 and 1, which didn't happen. I changed it to

        pred = pred * (target > 0).type(torch.ByteTensor)
        inter = pred * (pred == target).type(torch.ByteTensor)
    

    to explicitly cast the values.

    I'm not sure why this happened in my case, perhaps due to version incompatibility.

    opened by mitalbert 4
  • onnx model error

    Hi, I got the following error when converting the ESPNetv2 segmentation model to ONNX: RuntimeError: Failed to export an ONNX attribute, since it's not constant, please try to make things (e.g., kernel size) static if possible

    How can I fix it?

    opened by kingvision 3
  • Is it possible to use DiCENet train a regression model

    Thanks for the great work. I would like to use the idea of DiCENet on a custom dataset. I'm new to DL and PyTorch; can you give me some ideas on how to do it?

    Thank you:)

    opened by w11m 3
  • Runtime Error

    Sorry for disturbing you. When I run test_detection.py, I get an error: RuntimeError: expected backend CUDA and dtype Float but got backend CPU and dtype Float. How can I solve it? Thank you.

    opened by Ideal-Bai 3
  • Walls and surfaces detection

    Hi, I tried EdgeNets for object detection. It works fine with objects. My question is about walls and surfaces (floors and ceilings): are there more properties to be set to make the model detect walls and surfaces?

    thanks, Dina

    opened by d-sharafeldeen 0
  • Allow input channel count to be changed

    Update to ensure that different numbers of input channels can be handled without error. (The shortcut-connection configuration assumed the input channel count was always equal to config_inp_reinf.)

    opened by pnorridge 0
  • args.resume has the wrong type. Should be str and should take the path of a model checkpoint

    The following are the corrected lines in train_detection.py to enable resuming from a checkpoint.

        checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
        parser.add_argument('--resume', type=str, help='resume from checkpoint')  # previously: action='store_true'
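    A slightly fuller sketch of the usual resume pattern built around those two lines (variable names and checkpoint keys are illustrative assumptions, not the repository's exact ones):

        import argparse
        import torch
        import torch.nn as nn

        parser = argparse.ArgumentParser()
        parser.add_argument('--resume', type=str, default=None,
                            help='path to a checkpoint to resume from')  # a string path, not a store_true flag
        args = parser.parse_args()

        model = nn.Linear(10, 2)  # stand-in for the detection model built elsewhere
        start_epoch = 0
        if args.resume:
            checkpoint = torch.load(args.resume, map_location=torch.device('cpu'))
            # Some checkpoints are plain state_dicts, others wrap them; the key names are assumptions.
            if isinstance(checkpoint, dict) and 'state_dict' in checkpoint:
                model.load_state_dict(checkpoint['state_dict'])
                start_epoch = checkpoint.get('epoch', 0)
            else:
                model.load_state_dict(checkpoint)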
    opened by saiabinesh 0
  • Asking about the optimized implementation for DimConv

    @sacmehta Really impressive work!

    It is great to see that you have open-sourced the implementation of DimConv😁

    We are wondering whether you have any plan to release the optimized implementation illustrated in the following Figures:

    [figure screenshots omitted]

    opened by PkuRainBow 0
  • Espnetv2 redundant avg pooling in input reinforcement

    In the input-reinforcement branches, it seems that there are redundant average-pooling operations on the original input in the deeper stages. Do you see any problem with passing the strided, average-pooled input image from each DownSampler block to the next one? In the DownSampler forward method, return input2:

    def forward(self, input, input2=None):
            '''
            :param input: input feature map
            :return: feature map down-sampled by a factor of 2
            '''
            avg_out = self.avg(input)
            eesp_out = self.eesp(input)
            output = torch.cat([avg_out, eesp_out], 1)
    
            if input2 is not None:
                #assuming the input is a square image
                # Shortcut connection with the input image
                w1 = avg_out.size(2)
                while True:
                    input2 = F.avg_pool2d(input2, kernel_size=3, padding=1, stride=2)
                    w2 = input2.size(2)
                    if w2 == w1:
                        break
                output = output + self.inp_reinf(input2)
    
            return self.act(output), input2
    

    In the ESPNet class, overwrite the input object:

    out_l3_0, input = self.level3_0(out_l2, input)  # down-sample
    


    opened by mayaboker 0