CVNets: A library for training computer vision networks

Overview

This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:

  • Image classification on the ImageNet dataset
  • Object detection using SSD
  • Semantic segmentation using DeepLabv3

Note: Any image classification backbone can be used with the object detection and semantic segmentation models.

Training can be done with two samplers:

  • Standard distributed sampler
  • Multi-scale distributed sampler

We recommend using the multi-scale sampler, as it improves generalization and leads to better performance. See the MobileViT paper for details.
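
A minimal sketch of the multi-scale idea, assuming a simple pixel-budget rule: each batch uses a single resolution, and the batch size is scaled inversely with the number of pixels so that every batch costs roughly the same memory. The function names, resolutions, and exact scaling rule below are illustrative assumptions, not CVNets' implementation.

```python
import random
from typing import Iterator, List, Sequence, Tuple


def variable_batch_sizes(
    resolutions: Sequence[Tuple[int, int]],
    base_res: Tuple[int, int],
    base_batch: int,
) -> List[Tuple[int, int, int]]:
    """Pair each (height, width) with a batch size that keeps the pixel budget
    height * width * batch_size close to base_res[0] * base_res[1] * base_batch."""
    budget = base_res[0] * base_res[1] * base_batch
    # Smaller images get larger batches; every batch holds at least one sample.
    return [(h, w, max(1, budget // (h * w))) for h, w in resolutions]


def multiscale_batches(
    num_samples: int, configs: Sequence[Tuple[int, int, int]], seed: int = 0
) -> Iterator[Tuple[int, int, List[int]]]:
    """Yield (height, width, sample_indices), drawing one resolution per batch."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    start = 0
    while start < num_samples:
        h, w, batch_size = rng.choice(configs)
        yield h, w, indices[start:start + batch_size]
        start += batch_size


configs = variable_batch_sizes([(160, 160), (256, 256), (320, 320)], (256, 256), 128)
print(configs)  # [(160, 160, 327), (256, 256, 128), (320, 320, 81)]
```

Every sample is still visited exactly once per epoch; only the grouping into batches and the resolution of each batch change.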

Installation

CVNets can be installed in a local Python environment using the commands below:

    git clone git@github.com:apple/ml-cvnets.git
    cd ml-cvnets
    pip install -r requirements.txt
    pip install --editable .

We recommend using Python 3.6+ and PyTorch (version >= 1.8.0) in a conda environment. For setting up a Python environment with conda, see here.

Getting Started

Citation

If you find our work useful, please cite the following paper:

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}
Comments
  • AMP settings

    When I was training a MobileNetV3 model with mixed_precision = true, the program raised an error like this:

    2022-08-16 03:13:22 - DEBUG - Training epoch 0 with 66072 samples
    2022-08-16 03:14:03 - LOGS - Epoch: 0 [ 1/10000000], loss: 5.1851, LR: [0.1, 0.1], Avg. batch load time: 38.484, Elapsed time: 40.62
    2022-08-16 03:14:06 - LOGS - Exception occurred that interrupted the training. CUDA error: an illegal memory access was encountered

    Do you have any suggestion?

    opened by sdeven95 12
  • Checkpoint cannot be loaded

    Greetings! I have problems loading pretrained weights for the detection task. I took the weights and config from the Model Zoo. The chosen detection model is SSD MobileViTv2-0.75.

    I tried to build the model with pretrained weights using the code below:

    import sys

    from cvnets import get_model
    from options.opts import get_training_arguments
    from options.utils import load_config_file

    # Clear command-line arguments so argparse only sees its defaults
    sys.argv = ['']
    opts = get_training_arguments()
    setattr(opts, 'common.config_file', '<path-to-config-file>')
    setattr(opts, 'model.detection.pretrained', '<path-to-checkpoint>')

    # Merge the YAML config into opts, then build the model
    opts = load_config_file(opts)
    model = get_model(opts)
    

    However, the following error occurs:

    Unable to load pretrained weights from /content/ml-cvnets/coco-ssd-mobilevitv2-0.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
    	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 512, 1, 1]).
    	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([504]).
    	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 256, 1, 1]). ...
    

    As I understand it, there is a difference between the layer names in the model and in the checkpoint.

    Is there anything I can do to fix this issue? Thanks!

    opened by MaEvGoR 9
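
The channel counts in the error above can be decoded with simple arithmetic, assuming (this is an inference from the shapes, not documented behavior) that each loc_cls layer predicts 4 box offsets plus one score per class for each of 6 anchors per location. Under that assumption, 510 = 6 × (4 + 81) and 504 = 6 × (4 + 80): the parameter names line up, and only the class count differs (81 in the checkpoint vs. 80 in the built model). The helper names are hypothetical.

```python
def loc_cls_channels(num_anchors: int, num_classes: int, box_coords: int = 4) -> int:
    """Output channels of a fused box-and-class prediction layer (assumed layout)."""
    return num_anchors * (box_coords + num_classes)


def classes_from_channels(out_channels: int, num_anchors: int, box_coords: int = 4) -> int:
    """Invert loc_cls_channels: recover the class count from a weight's first dimension."""
    per_anchor, remainder = divmod(out_channels, num_anchors)
    if remainder:
        raise ValueError(f"{out_channels} channels are not divisible by {num_anchors} anchors")
    return per_anchor - box_coords


# First dimension of ssd_heads.0 weights: checkpoint vs. freshly built model.
print(classes_from_channels(510, num_anchors=6))  # 81
print(classes_from_channels(504, num_anchors=6))  # 80
```

If this reading is right, building the model with the same class count the checkpoint was trained with should let the weights load.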
  • Code Unfriendly for Downstream Tasks

    Hi @sacmehta,

    First, I'd like to thank you for your contribution of MobileViTs. However, I find the code in this repo very unfriendly for downstream tasks. I cannot find which snippets relate to MobileViTs, nor their detailed configurations. Although some other GitHub repos provide unofficial code for MobileViT-V1, I couldn't load the pre-trained weights from this repo into them. Could you please re-organise your code to make it more useful for the community?

    Many thanks, Yiming

    opened by YimingMa 7
  • The accuracy of the two figures does not match

    Hi, thanks for such great work, but I have a question: the accuracy of MobileViT-S with the standard training method seems to differ between panels (b) and (c) of Figure 9 in the paper. The top-1 accuracy in (b) appears to be about 77%, while in (c) it appears to be around 78%.

    Can anyone answer my question? Thanks!

    opened by cuicheng01 6
  • Does the batch sampler make the model learn the same weights in every training repetition?

    Hi! Thanks for sharing great work!

    I have a question about the sampler.

    I am running some training experiments with variable_batch_sampler and batch sampler. I'm trying to get the average ACC over 5 training runs (repeating training 5 times with the same settings). I expected the best validation ACCs of the repeated runs to be similar (but not identical) with both samplers. However, when I used the batch sampler, the best val ACC of every repeated run was exactly the same. Is that right?

    I'm working with this YAML file:

    0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.docx

    with this shell script:

    for iter in '1' '2' '3' '4' '5'
    do
        CUDA_VISIBLE_DEVICES=3 cvnets-train \
            --common.config-file ./config/classification/CBIS-DDSM_2c_womulti/0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.yaml \
            --common.results-loc ./results/2class_iters_wo_multi/iter$iter \
            --model.classification.finetune-pretrained-model \
            --model.classification.n-pretrained-classes 1000 \
            --model.classification.pretrained ./weights/mobilevit_s.pt
    done

    When I repeated training 5 times, the best val ACC (the same value, 72.5467) appeared at epoch 268 in every run: comparing_iterations.xlsx

    Is this result correct?

    I also modified some code to track training information.

    The modified code is here:

    code.zip

    opened by YHYeooooong 5
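
One factor worth checking in repeated runs is how the random seed is set: if every repeat uses the same seed, the data order is reproducible, which can make repeated runs behave identically. The sketch below is hypothetical and does not reflect how CVNets seeds its samplers; it derives a per-epoch shuffle from a per-run seed so that runs differ while each run stays reproducible.

```python
import random
from typing import List


def epoch_order(num_samples: int, run_seed: int, epoch: int) -> List[int]:
    """Return a shuffled sample order for one epoch of one training run.

    Combining the run seed with the epoch index gives a different order every
    epoch, and a different sequence of orders for every run_seed.
    """
    rng = random.Random((run_seed << 32) | epoch)  # assumes 0 <= epoch < 2**32
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


# Repeats 1..5 could use run_seed=1..5 to get independent, reproducible data orders.
for run_seed in range(1, 6):
    print(run_seed, epoch_order(6, run_seed, epoch=0))
```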
  • Question about the dataloader for Kinetics-400

    Hi, thanks for such great work! I downloaded Kinetics-400 from this GitHub repo: https://github.com/cvdfoundation/kinetics-dataset. I then tried to modify the parameters in config/video_classification/kinetics/mobilevit_st.yaml. After changing the root paths of the training and validation sets, the data does not seem to load correctly. I am using a single 1080 Ti.

    Do I need to do anything else to load the dataset? Thank you!

    opened by andrewwang0612 5
  • Segmentation Head for Custom Dataset is not automatically connected!

    I like your library. Thank you for this.

    I tried to train on my custom dataset. I created a CustomDataset class; my dataset has 10 classes, and I used pretrained MobileNetV2 weights. When I run it, I get: size mismatch for seg_head.classifier.block.conv.bias: copying a param with shape torch.Size([150]) from checkpoint, the shape in current model is torch.Size([10])

    This is because the ADE20K dataset has 150 classes, while mine has 10. I expected the segmentation head to be replaced automatically, as most libraries do. How can I fix this? Are you going to post a command for fine-tuning?

    opened by umitkacar 5
  • Recommendations for configuring heads/training on custom datasets?

    Thanks for developing MobileViT! Are there any specific tips or examples for fine-tuning the pre-trained MobileViT classification and detection models on custom datasets? I see the n_classes setting in both the classifier (1000) and the detector (80), but could you provide a quick example of adapting them to a custom dataset, and do you have a recommended learning rate for fine-tuning? Thanks very much!

    opened by lessw2020 5
  • Get different accuracy on different GPU

    Hi, I trained the MobileViT-XXS model on 2 different machines and got different results: the accuracy on the Titan RTX is consistently 0.5% lower than on the RTX 2080 Ti.

    Below are the specs of the 2 machines:

    • Machine 1: Titan RTX, PyTorch 1.10.0
    • Machine 2: RTX 2080 Ti, PyTorch 1.11.0

    After checking the code, the only potential cause I can think of is AMP, but both GPUs use the TU102 chip, so they should support the same floating-point precision.

    Do you have any idea what might cause the problem?

    Thank you

    opened by jimmylin0979 4
  • If I use MobileViTv2's pretrained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1 (i.e., divided by 255.0)? And should the input image be in BGR or RGB format?

    Were MobileViTv2's pre-trained weights trained with input tensors in the range 0 to 255, or normalized to between 0 and 1 (i.e., divided by 255.0)?

    In other words, if I use MobileViTv2 pre-trained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1 (i.e., divided by 255.0)? Thanks.

    opened by chenying99 4
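
Which value range and channel order the released MobileViTv2 checkpoints expect is not stated in this thread, so the sketch below leaves both choices as flags; check the dataset transforms in the CVNets source before picking them. It shows a minimal NumPy conversion from an HxWx3 uint8 image (OpenCV's cv2.imread returns channels in BGR order) to a 3xHxW float32 array. The function name is hypothetical.

```python
import numpy as np


def to_chw_float(image: np.ndarray, bgr_to_rgb: bool = True, scale_to_unit: bool = True) -> np.ndarray:
    """Convert an HxWx3 uint8 image into a 3xHxW float32 array.

    bgr_to_rgb:    reverse the channel order (useful for images read with cv2.imread).
    scale_to_unit: divide by 255.0 so values lie in [0, 1].
    """
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError(f"expected an HxWx3 image, got shape {image.shape}")
    out = image[:, :, ::-1] if bgr_to_rgb else image
    out = out.astype(np.float32)  # copies, so the caller's array is untouched
    if scale_to_unit:
        out /= 255.0
    return np.ascontiguousarray(out.transpose(2, 0, 1))  # HWC -> CHW


blue_bgr = np.zeros((2, 2, 3), dtype=np.uint8)
blue_bgr[..., 0] = 255  # pure blue in BGR order
print(to_chw_float(blue_bgr)[2])  # blue ends up in channel 2 (RGB order), scaled to 1.0
```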
  • Training Time on ImageNet

    Hi, how long does it take you to train MobileViT-S with 8 GPUs? I trained MobileViT-S with a batch size of 1024 (128 × 8) for 1 epoch on 8 V100 GPUs, and training is very slow: about 40 minutes per epoch. For 300 epochs, that means more than 8 days. Is this normal?

    Thank you

    opened by StephenEkaputra 3
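
The figures in the question can be checked with simple arithmetic, assuming the standard ImageNet-1k training split of 1,281,167 images: 40 minutes per epoch corresponds to about 534 images/s across the 8 GPUs (roughly 67 per GPU), and 300 epochs to about 8.3 days. The helper below is a hypothetical sketch of that calculation.

```python
def training_time_estimate(minutes_per_epoch: float, epochs: int, num_images: int):
    """Return (total_days, images_per_second) for a fixed time per epoch."""
    total_days = minutes_per_epoch * epochs / (60 * 24)
    images_per_second = num_images / (minutes_per_epoch * 60)
    return total_days, images_per_second


days, ips = training_time_estimate(40, 300, 1_281_167)
print(f"{days:.2f} days, {ips:.0f} images/s")  # 8.33 days, 534 images/s
```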
  • Size mismatch for ssd_heads when using the pretrained model

    Hi, I ran into an issue when training on my dataset, which has 102 classes. It appears to be a size mismatch when using the pretrained model.

    Here is my command:

    PYTHONWARNINGS="ignore" cvnets-train \
        --common.config-file config/detection/ssd_coco/coco-ssd-mobilevitv2-1.75.yaml \
        --common.results-loc exp/exp1 \
        --common.override-kwargs model.detection.pretrained="ckpt/coco-ssd-mobilevitv2-1.75.pt" model.detection.n-classes=103

    and this is the error:

    2022-12-12 21:09:24 - ERROR - Unable to load pretrained weights from ckpt/coco-ssd-mobilevitv2-1.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
    	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 512, 1, 1]).
    	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
    	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    	size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
    	size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    	size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
    	size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    	size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
    	size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
    	size mismatch for ssd_heads.5.loc_cls_layer.block.conv.weight: copying a param with shape torch.Size([340, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([428, 64, 1, 1]).
    	size mismatch for ssd_heads.5.loc_cls_layer.block.conv.bias: copying a param with shape torch.Size([340]) from checkpoint, the shape in current model is torch.Size([428]).

    How can I use the pretrained model to train on my own dataset, which has a different number of classes than MS COCO?

    opened by QiqLiang 6
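
A common way to reuse a checkpoint whose class count differs from the target dataset is to load every weight whose name and shape match the new model and leave the mismatched, class-dependent head layers freshly initialized. The helper below is a generic, hypothetical sketch (not a CVNets function). With PyTorch, `checkpoint` would be the loaded state dict (assuming the .pt file holds a plain state dict) and `model_state` would be `model.state_dict()`; the kept entries could then be loaded with `model.load_state_dict(kept, strict=False)`, which tolerates missing keys but not shape mismatches.

```python
from typing import Any, Callable, Dict, List, Mapping, Tuple


def filter_compatible_weights(
    checkpoint: Mapping[str, Any],
    model_state: Mapping[str, Any],
    shape_of: Callable[[Any], Tuple[int, ...]] = lambda v: tuple(v.shape),
) -> Tuple[Dict[str, Any], List[str]]:
    """Split checkpoint entries into ones the model can accept and ones it cannot.

    An entry is kept only if the model has a parameter with the same name and
    the same shape; everything else (e.g. a class-dependent head) is skipped.
    """
    kept: Dict[str, Any] = {}
    skipped: List[str] = []
    for name, value in checkpoint.items():
        target = model_state.get(name)
        if target is not None and shape_of(target) == shape_of(value):
            kept[name] = value
        else:
            skipped.append(name)
    return kept, skipped


# Hypothetical usage with PyTorch:
#   state = torch.load("ckpt/coco-ssd-mobilevitv2-1.75.pt", map_location="cpu")
#   kept, skipped = filter_compatible_weights(state, model.state_dict())
#   model.load_state_dict(kept, strict=False)
```

The `skipped` list is worth printing: it should contain only the prediction-head parameters, and anything else there suggests a config mismatch beyond the class count.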
  • About Dataset collate fn

    Hello! Thanks for sharing this great work!

    I have some questions about the dataset.py code. I rewrote imagenet.py in dataset/classification to create my own dataset.

    I changed only the function names and the 'register_collated_fn' and 'register_dataset' names. I then found that my collated_fn function name and the name passed to 'register_collated_fn' do not match, and the collate name in dataset.py (lines 45-47) also differs from the one passed to 'register_collate_fn'. Does this mean a different collate_fn (not the collated_fn in my code) is called during training and validation? If so, is the collate function that gets called exactly the same as the imagenet.py collated_fn, or does it at least behave the same way? Could using the default collate_fn instead of imagenet_collate_fn make any performance difference?

    here is my dataset.py CBIS-DDSM_4class_sampled.zip

    and my yaml file is here 0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.zip

    opened by YHYeooooong 2
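
The question hinges on how name-based registries resolve lookups. In a typical registry, the key is the string passed to the registration decorator, not the Python function name, so a mismatch between the name requested by the config and the registered name means the custom function is never used. The sketch below illustrates this pattern with hypothetical names; it is not CVNets' actual code. In this sketch an unknown name falls back to a default; whether CVNets falls back silently or raises an error depends on its real registry implementation.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry: illustrates the pattern, not CVNets' actual code.
COLLATE_FN_REGISTRY: Dict[str, Callable[[List[Any]], Any]] = {}


def register_collate_fn(name: str):
    """Decorator that stores a collate function under an explicit registry name."""
    def decorator(fn):
        if name in COLLATE_FN_REGISTRY:
            raise ValueError(f"Collate function '{name}' is already registered")
        COLLATE_FN_REGISTRY[name] = fn
        return fn
    return decorator


def default_collate_fn(batch: List[Any]) -> List[Any]:
    """Fallback used when no registered name matches (an assumption of this sketch)."""
    return list(batch)


@register_collate_fn(name="my_dataset_collate_fn")
def collate_whatever_you_like(batch: List[Any]) -> List[Any]:
    # The Python function name is irrelevant; only the registered name is looked up.
    return sorted(batch)


def get_collate_fn(name: str) -> Callable[[List[Any]], Any]:
    return COLLATE_FN_REGISTRY.get(name, default_collate_fn)


print(get_collate_fn("my_dataset_collate_fn")([3, 1, 2]))      # uses the custom function
print(get_collate_fn("collate_whatever_you_like")([3, 1, 2]))  # falls back to the default
```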
  • ERROR   - Nan encountered in the loss.

    Hi, I ran into problems when training on my dataset: the model's predictions became NaN. Do you know how to solve this kind of problem?

    opened by blossom-lv 1
  • Model stability problem

    Hi! Thank you for the great work. I used MobileViT blocks in my model for a low-level task. At first it performed well, but I get different results when I run it again. My model is stable if I remove the MobileViT blocks. Do you know what could make the model unstable? I use the following basic parameters:

        max_lr: 1e-4
        min_lr: 1e-6
        optim name: adamw
        scheduler name: "cosine"
        in_channels: 96
        transformer_dim: 144
        ffn_dim: 288
        n_transformer_blocks: 2

    opened by Xinjie-Wei 3
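
For reference, scheduler settings like those in the issue above (max_lr 1e-4, min_lr 1e-6, cosine) correspond to the standard cosine-annealing formula lr(t) = min_lr + ½(max_lr − min_lr)(1 + cos(πt/T)), where t is the current step and T the total number of steps. The sketch below implements that formula without warmup; CVNets' cosine scheduler may add a warmup phase or update per epoch rather than per iteration, so treat it as an approximation of the schedule, not the library's code.

```python
import math


def cosine_lr(step: int, total_steps: int, max_lr: float = 1e-4, min_lr: float = 1e-6) -> float:
    """Cosine annealing from max_lr (step 0) to min_lr (step total_steps), no warmup."""
    if total_steps <= 0 or not 0 <= step <= total_steps:
        raise ValueError("need total_steps > 0 and 0 <= step <= total_steps")
    progress = step / total_steps
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for step in (0, 25, 50, 75, 100):
    print(step, f"{cosine_lr(step, 100):.3e}")
```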
  • LayerNorm2d != GroupNorm w/ groups=1

    Re your MobileViTv2: these two norms are not equivalent, so it is misleading to call GroupNorm with groups=1 'LayerNorm2d'. 'LayerNorm2d' is already used elsewhere in other nets. It might be worth retraining MobileViTv2 with an actual LayerNorm or renaming the norm to just GroupNorm.

    https://github.com/apple/ml-cvnets/blob/84d992f413e52c0468f86d23196efd9dad885e6f/cvnets/layers/normalization/layer_norm.py#L56

    opened by rwightman 9
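
The difference raised in this issue is which axes the statistics are computed over. GroupNorm with groups=1 normalizes each sample over (C, H, W) jointly, whereas the LayerNorm2d found in other codebases (e.g. timm) normalizes over C separately at every spatial location. The NumPy sketch below omits the learnable affine parameters and shows that the two give different outputs on the same NCHW input; the function names are illustrative.

```python
import numpy as np


def group_norm_single_group(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """GroupNorm with groups=1 (no affine): statistics over (C, H, W) of each sample."""
    mean = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm_2d(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Channel-wise LayerNorm (no affine): statistics over C at each (h, w) location."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


x = np.arange(72.0).reshape(2, 4, 3, 3)  # NCHW input
print(np.allclose(group_norm_single_group(x), layer_norm_2d(x)))  # False
```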
  • Adam optimizer error: If capturable=False, state_steps should not be CUDA tensors.

    Hi, congratulations on your excellent work! I would really appreciate it if you could help me with this. I ran:

    PYTHONWARNINGS="ignore" cvnets-train --common.config-file config/classification/imagenet/mobilevit_v2.yaml --common.results-loc mobilevitv2_results/width_1_0_0 --common.override-kwargs scheduler.cosine.max_lr=0.0075 scheduler.cosine.min_lr=0.00075 optim.weight_decay=0.013 model.classification.mitv2.width_multiplier=1.00 --common.tensorboard-logging --common.accum-freq 4 --common.auto-resume 
    

    to trigger auto-resume and continue my last training run, and this error occurred:

    2022-07-03 06:06:18 - LOGS    - Exception occurred that interrupted the training. If capturable=False, state_steps should not be CUDA tensors.
    If capturable=False, state_steps should not be CUDA tensors.

    Traceback (most recent call last):
      File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 682, in run
        train_loss, train_ckpt_metric = self.train_epoch(epoch)
      File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 353, in train_epoch
        self.gradient_scalar.step(optimizer=self.optimizer)
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 338, in step
        retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 285, in _maybe_opt_step
        retval = optimizer.step(*args, **kwargs)
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
        return func(*args, **kwargs)
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 161, in step
        adamw(params_with_grad,
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 218, in adamw
        func(params,
      File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 259, in _single_tensor_adamw
        assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."
    

    I am 100% sure that cuDNN is enabled and all GPUs are available; nothing went wrong when I first trained this model.

    Here's another problem: do you have any idea why the training process is slow? Thanks so much!

    opened by yqi19 3
Owner
Apple