CVNets: A library for training computer vision networks

Apple

Last update: Jan 3, 2023

Overview

This repository contains the source code for training computer vision models. Specifically, it contains the source code of the MobileViT paper for the following tasks:

Image classification on the ImageNet dataset
Object detection using SSD
Semantic segmentation using Deeplabv3

Note: Any image classification backbone can be used with object detection and semantic segmentation models

Training can be done with two samplers:

Standard distributed sampler
Multi-scale distributed sampler

We recommend using the multi-scale sampler, as it improves generalization and leads to better performance. See the MobileViT paper for details.
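
The idea behind multi-scale sampling can be sketched as follows. This is a minimal illustration, not the CVNets implementation: the function name, the resolution list, and the base batch size are assumptions chosen for the example. Each batch draws one spatial resolution and scales the batch size so that the number of pixels per batch stays roughly constant.

```python
import random


def multi_scale_batches(
    num_samples,
    base_size=(256, 256),   # assumed reference resolution
    base_batch=32,          # assumed batch size at the reference resolution
    scales=((160, 160), (192, 192), (256, 256), (288, 288), (320, 320)),
    seed=0,
):
    """Yield (height, width, indices) with a resolution-dependent batch size."""
    rng = random.Random(seed)
    indices = list(range(num_samples))
    rng.shuffle(indices)
    base_pixels = base_size[0] * base_size[1]
    start = 0
    while start < num_samples:
        height, width = rng.choice(scales)
        # Smaller images get a larger batch, larger images a smaller one.
        batch_size = max(1, base_pixels * base_batch // (height * width))
        yield height, width, indices[start:start + batch_size]
        start += batch_size


for height, width, batch in multi_scale_batches(num_samples=200):
    print(f"{height}x{width}: {len(batch)} samples")
```

A standard sampler corresponds to the special case where `scales` contains only the reference resolution, so every batch has the same size.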

Installation

CVNets can be installed in a local Python environment using the commands below:

    git clone git@github.com:apple/ml-cvnets.git
    cd ml-cvnets
    pip install -r requirements.txt
    pip install --editable .

We recommend using Python 3.6+ and PyTorch (version >= v1.8.0) in a conda environment. For setting up a Python environment with conda, see here.

Getting Started

General instructions for training and evaluating different models are given here.
Examples for training and evaluating a specific model are provided in the examples folder. Right now, we support the following models.
For converting PyTorch models to CoreML, see README-pytorch-to-coreml.md.

Citation

If you find our work useful, please cite the following paper:

@article{mehta2021mobilevit,
  title={MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer},
  author={Mehta, Sachin and Rastegari, Mohammad},
  journal={arXiv preprint arXiv:2110.02178},
  year={2021}
}

Comments

AMP settings

When I was training a MobileNetV3 model with mixed_precision = true, the program raised an error like this:

2022-08-16 03:13:22 - DEBUG - Training epoch 0 with 66072 samples
2022-08-16 03:14:03 - LOGS - Epoch: 0 [ 1/10000000], loss: 5.1851, LR: [0.1, 0.1], Avg. batch load time: 38.484, Elapsed time: 40.62
2022-08-16 03:14:06 - LOGS - Exception occurred that interrupted the training. CUDA error: an illegal memory access was encountered

Do you have any suggestions?

opened by sdeven95 12

Checkpoint cannot be loaded

Greetings! I have a problem loading pretrained weights for the detection task. I took the weights and config from the Model Zoo. The chosen detection model is SSD MobileViTv2-0.75.

I tried to get the model with pretrained weights using the code below:

import sys

from cvnets import get_model
from options.opts import get_training_arguments
from options.utils import load_config_file

# Clear the CLI arguments so the argument parser falls back to its defaults.
sys.argv = ['']
opts = get_training_arguments()
# Replace the placeholder strings with real paths.
setattr(opts, 'common.config_file', '<path-to-config-file>')
setattr(opts, 'model.detection.pretrained', '<path-to-checkpoint>')

opts = load_config_file(opts)
model = get_model(opts)

However, an error occurs:

Unable to load pretrained weights from /content/ml-cvnets/coco-ssd-mobilevitv2-0.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 512, 1, 1]).
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([504]).
	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([504, 256, 1, 1]). ...

As I understand it, there is a difference between the layer names in the model and in the checkpoint.

Is there anything I can do to fix this issue? Thanks.

opened by MaEvGoR 9

Code Unfriendly for Downstream Tasks

Hi @sacmehta,

First, I'd like to thank you for your contribution of MobileViTs. However, I find the code in this repo very unfriendly for downstream tasks. I cannot find which snippets are related to MobileViTs, or their detailed configurations. There are some other GitHub repos providing unofficial code for MobileViT-V1, but I couldn't load the pre-trained weights from this repo. Could you please reorganise your code here to make it more useful for the community?

Many thanks, Yiming

opened by YimingMa 7

The accuracy of the two figures does not match

Hi, thanks for such a great job, but I have a question: the accuracy of the standard training method of MobileViT-S in (b) and (c) of Figure 9 in the paper seems to be different. The top-1 accuracy in (b) should be about 77%, and the top-1 accuracy in (c) should be around 78%.

Can anyone answer my question? Thanks!

opened by cuicheng01 6

Does the batch sampler make the model learn the same weights in every training repetition?

Hi! Thanks for sharing great work!

I have a question about the sampler

I am working on some training examples with variable_batch_sampler and batch sampler. I'm trying to get the average accuracy over 5 training runs (repeating training 5 times with the same settings). I expected the best validation accuracies to be similar, but not identical, across the repeated runs with both samplers. But when I used the batch sampler, the best validation accuracy of every repeated run was the same. Is that right?

I'm working with this yaml file

0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.docx

with this shell script command

for iter in '1' '2' '3' '4' '5'
do
    CUDA_VISIBLE_DEVICES=3 cvnets-train \
        --common.config-file ./config/classification/CBIS-DDSM_2c_womulti/0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.yaml \
        --common.results-loc ./results/2class_iters_wo_multi/iter$iter \
        --model.classification.finetune-pretrained-model \
        --model.classification.n-pretrained-classes 1000 \
        --model.classification.pretrained ./weights/mobilevit_s.pt
done

When I repeated training 5 times, the best validation accuracy (the same value, 72.5467) appeared at epoch 268. comparing_iterations.xlsx

Is this result right?

I also modified some code to track the training information.

The modified code is attached here:

code.zip
opened by YHYeooooong 5
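
One possible explanation for identical results across repeated runs is that every run starts from the same random seed, so the data order repeats exactly. This is an assumption, not something confirmed in this thread. The sketch below uses only Python's standard library (the function name is hypothetical) to show how a fixed seed reproduces the same shuffle:

```python
import random


def shuffled_order(num_samples, seed):
    """Return a shuffled list of sample indices for the given seed."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    return order


run_a = shuffled_order(100, seed=0)
run_b = shuffled_order(100, seed=0)

# Same seed, same order: any computation driven only by this order repeats.
print(run_a == run_b)
```

If this is the cause, giving each repetition a different seed would be one way to obtain independent runs.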

Question about the dataloader on kinetics 400

Hi, thanks for such a great job! I downloaded Kinetics 400 from this GitHub repo: https://github.com/cvdfoundation/kinetics-dataset. I then tried to modify the parameters in config/video_classification/kinetics/mobilevit_st.yaml. After I changed the root of the training and validation sets, the data does not seem to load correctly. I am using one 1080 Ti.

Do I need to do anything else to load the dataset? Thank you!

opened by andrewwang0612 5

Segmentation Head for Custom Dataset is not automatically connected!

I like your library. Thank you for this.

I tried to train on my custom dataset. I created a CustomDataset class. My dataset has 10 classes. I used pretrained mobilenetv2 weights. When I run it, I get: size mismatch for seg_head.classifier.block.conv.bias: copying a param with shape torch.Size([150]) from checkpoint, the shape in current model is torch.Size([10])

This is because the ADE20K dataset has 150 classes, while mine has 10. I thought the segmentation head would be connected automatically; most libraries do this automatically. How can I fix this? Are you going to post a command for fine-tuning?

opened by umitkacar 5
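
A common general workaround for this kind of mismatch is to load only the checkpoint entries whose name and shape match the new model and to leave the rest (here, the classifier head) at their fresh initialization. The sketch below shows just the filtering step. It is not CVNets code: `split_by_shape`, the `FakeTensor` stand-in, and the `encoder.conv.weight` key are hypothetical names used for illustration.

```python
from collections import namedtuple

# Stand-in for a tensor: only the .shape attribute matters in this sketch.
FakeTensor = namedtuple("FakeTensor", ["shape"])


def split_by_shape(checkpoint_state, model_state):
    """Split checkpoint entries into those that fit the model and those that do not."""
    compatible, skipped = {}, []
    for name, value in checkpoint_state.items():
        target = model_state.get(name)
        if target is not None and tuple(value.shape) == tuple(target.shape):
            compatible[name] = value
        else:
            skipped.append(name)
    return compatible, skipped


checkpoint = {
    "encoder.conv.weight": FakeTensor((16, 3, 3, 3)),              # hypothetical key
    "seg_head.classifier.block.conv.bias": FakeTensor((150,)),     # 150 classes
}
model = {
    "encoder.conv.weight": FakeTensor((16, 3, 3, 3)),
    "seg_head.classifier.block.conv.bias": FakeTensor((10,)),      # 10 classes
}

compatible, skipped = split_by_shape(checkpoint, model)
print(sorted(compatible))  # ['encoder.conv.weight']
print(skipped)             # ['seg_head.classifier.block.conv.bias']
```

With PyTorch, the same function could be applied to a loaded checkpoint dictionary and `model.state_dict()`, followed by `model.load_state_dict(compatible, strict=False)`, so that the skipped head parameters keep their initial values and are learned during fine-tuning.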

Recommendations for configuring heads/training on custom datasets?

Thanks for developing MobileViT! I'm wondering if there are any specific tips or examples for fine-tuning the pre-trained classification and detection models using MobileViT on custom datasets. I see the n_classes reference in both classifier (1000) and detection (80), but can you provide a quick example of modifying them for custom datasets, and do you have a recommended lr for fine-tuning? Thanks very much!

opened by lessw2020 5

Get different accuracy on different GPU

Hi, I trained the mobilevit-xxs model on 2 different machines and got different results: the accuracy on the Titan RTX is always lower than the one on the RTX 2080 Ti by 0.5%.

Below are the specs of the 2 machines:

Machine 1: Titan RTX, pytorch 1.10.0

Machine 2: RTX 2080 Ti, pytorch 1.11.0

After checking the code, I can only think of AMP as a potential problem, but both GPUs use the TU102 chip, so they should support the same floating-point precision.

Do you have any idea what might cause the problem?

Thank you
opened by jimmylin0979 4

If I use mobilevitv2's pretrained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1 (i.e. divided by 255.0)? And should the input image be in BGR or RGB format?

Is the input image tensor for mobilevitv2's pre-trained weights in the range 0 to 255, or is it normalized to between 0 and 1 (i.e. divided by 255.0)?

In other words, if I use mobilevitv2 pre-trained weights for transfer learning, does the custom dataset need to be normalized to between 0 and 1 (i.e. divided by 255.0)? Thanks.

opened by chenying99 4

Training Time on ImageNet

Hi, I wonder how long it takes you to train MobileViT-S with 8 GPUs? I trained your MobileViT-S model with a batch size of 1024 (128*8) for 1 epoch on 8 V100 GPUs, but training is very slow. It takes about 40 minutes/epoch. For 300 epochs, that means more than 8 days. Is this normal?

Thank you

opened by StephenEkaputra 3
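
The figures in this question can be turned into per-iteration and throughput numbers, which are easier to compare across setups. The sketch below assumes the standard ImageNet-1k training set of 1,281,167 images; that dataset size is an assumption, since the post does not state it.

```python
IMAGENET_TRAIN_IMAGES = 1_281_167  # assumed: standard ImageNet-1k training set
batch_size = 128 * 8               # 128 per GPU on 8 GPUs, as stated in the post
minutes_per_epoch = 40
epochs = 300

iters_per_epoch = IMAGENET_TRAIN_IMAGES // batch_size
seconds_per_iter = minutes_per_epoch * 60 / iters_per_epoch
images_per_second = IMAGENET_TRAIN_IMAGES / (minutes_per_epoch * 60)
total_days = minutes_per_epoch * epochs / (60 * 24)

print(f"iterations per epoch: {iters_per_epoch}")
print(f"seconds per iteration: {seconds_per_iter:.2f}")
print(f"images per second: {images_per_second:.0f}")
print(f"total training time: {total_days:.1f} days")
```

Under these assumptions, 40 minutes per epoch corresponds to roughly 1.9 seconds per iteration and about 8.3 days for 300 epochs, which matches the "more than 8 days" estimate in the post.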

Size mismatch for ssd_heads when using the pretrained model

Hi, I ran into an issue when training on my dataset, which has 102 classes. It seems to be a size mismatch problem when using the pretrained model.

Here is my command:

PYTHONWARNINGS="ignore" cvnets-train \
    --common.config-file config/detection/ssd_coco/coco-ssd-mobilevitv2-1.75.yaml \
    --common.results-loc exp/exp1 \
    --common.override-kwargs model.detection.pretrained="ckpt/coco-ssd-mobilevitv2-1.75.pt" model.detection.n-classes=103

and this is the error:

2022-12-12 21:09:24 - ERROR - Unable to load pretrained weights from ckpt/coco-ssd-mobilevitv2-1.75.pt. Error: Error(s) in loading state_dict for SingleShotMaskDetector:
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 512, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 512, 1, 1]).
	size mismatch for ssd_heads.0.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
	size mismatch for ssd_heads.1.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
	size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 256, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 256, 1, 1]).
	size mismatch for ssd_heads.2.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
	size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
	size mismatch for ssd_heads.3.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
	size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.weight: copying a param with shape torch.Size([510, 128, 1, 1]) from checkpoint, the shape in current model is torch.Size([642, 128, 1, 1]).
	size mismatch for ssd_heads.4.loc_cls_layer.pw_conv.block.conv.bias: copying a param with shape torch.Size([510]) from checkpoint, the shape in current model is torch.Size([642]).
	size mismatch for ssd_heads.5.loc_cls_layer.block.conv.weight: copying a param with shape torch.Size([340, 64, 1, 1]) from checkpoint, the shape in current model is torch.Size([428, 64, 1, 1]).
	size mismatch for ssd_heads.5.loc_cls_layer.block.conv.bias: copying a param with shape torch.Size([340]) from checkpoint, the shape in current model is torch.Size([428]).

How can I use the pretrained model when training on my own dataset, which has a different num_classes from MS COCO?

opened by QiqLiang 6
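
The channel counts in these error messages are consistent with each SSD head predicting, per anchor, 4 box coordinates plus one score per class. This is an inference from the reported shapes, not something confirmed from the CVNets source, and the anchor counts of 6 and 4 are assumptions chosen because they fit the numbers. The helper name below is hypothetical.

```python
def ssd_head_channels(n_anchors, n_classes, n_box_coords=4):
    """Output channels of a combined localization + classification layer."""
    return n_anchors * (n_box_coords + n_classes)


# Checkpoint shapes: consistent with 81 classes.
print(ssd_head_channels(6, 81))   # 510
print(ssd_head_channels(4, 81))   # 340

# Shapes of a model built with n-classes=103.
print(ssd_head_channels(6, 103))  # 642
print(ssd_head_channels(4, 103))  # 428

# Shape reported in the earlier "Checkpoint cannot be loaded" report.
print(ssd_head_channels(6, 80))   # 504
```

Read this way, the checkpoint appears to have been trained with 81 classes, so a model configured with 80 or 103 classes ends up with prediction layers of a different size than the checkpoint provides.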

About Dataset collate fn

Hello! Thanks for sharing the great work!

I have some questions about the dataset.py code. I rewrote imagenet.py in dataset/classification to make my own dataset.

I changed only the function names and the names passed to 'register_collated_fn' and 'register_dataset'. I found that the name of my collated_fn function and the name passed to 'register_collated_fn' are not the same, and that the collated name in dataset.py lines 45~47 and the 'register_collate_fn' name are different as well. I wonder whether a different collate_fn (not the collated_fn in my code) is called in the training and validation phases? If so, is the function that gets called exactly the same as the collated_fn in imagenet.py, or does it at least behave the same? Could using the default collate_fn instead of imagenet_collate_fn make any difference in performance?

here is my dataset.py CBIS-DDSM_4class_sampled.zip

and my yaml file is here 0707_mobilevits_real_defualt_lr0.0001_cosine_advanced_multiscale.zip

opened by YHYeooooong 2

ERROR - Nan encountered in the loss.

Hi, I ran into a problem when training on my dataset: the model's output became NaN. Do you know how to solve this kind of problem?

opened by blossom-lv 1

Model stability problem

Hi! Thank you for the great work. I used the MobileViT blocks in my model for a low-level task. At first it had good performance, but I get different performance when I run it again. My model is stable if I remove the MobileViT blocks. Do you know what might make the model unstable? I use the following basic parameters:

max_lr: 1e-4
min_lr: 1e-6
optim name: adamw
scheduler name: "cosine"
in_channels: 96
transformer_dim: 144
ffn_dim: 288
n_transformer_blocks: 2

opened by Xinjie-Wei 3

LayerNorm2d != GroupNorm w/ groups=1

Re your MobileVit2, these two norms are not equivalent and it would be misleading to call it LayerNorm2d as the group norm w/ groups=1 is not equivalent. 'LayerNorm2d' is already used elsewhere in other nets. Might be worth retraining MobileVit2 with an actual LayerNorm or renaming the norm to just GroupNorm.

https://github.com/apple/ml-cvnets/blob/84d992f413e52c0468f86d23196efd9dad885e6f/cvnets/layers/normalization/layer_norm.py#L56

opened by rwightman 9
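
The difference between the two norms can be seen on a tiny example. The sketch below is plain Python without learnable scale or bias, and it assumes that "LayerNorm2d" means normalizing each spatial position across its channels, as such layers do in some other networks; that reading of the comment is an assumption. GroupNorm with a single group instead computes one mean and variance over all channels and all positions of a sample.

```python
from statistics import fmean


def group_norm_one_group(x, eps=1e-5):
    """Normalize a C x H x W sample with statistics over channels AND positions."""
    flat = [v for channel in x for row in channel for v in row]
    mean = fmean(flat)
    var = fmean([(v - mean) ** 2 for v in flat])
    return [[[(v - mean) / (var + eps) ** 0.5 for v in row] for row in channel]
            for channel in x]


def channel_layer_norm(x, eps=1e-5):
    """Normalize each spatial position separately, over the channel axis only."""
    n_ch, height, width = len(x), len(x[0]), len(x[0][0])
    out = [[[0.0] * width for _ in range(height)] for _ in range(n_ch)]
    for i in range(height):
        for j in range(width):
            values = [x[c][i][j] for c in range(n_ch)]
            mean = fmean(values)
            var = fmean([(v - mean) ** 2 for v in values])
            for c in range(n_ch):
                out[c][i][j] = (x[c][i][j] - mean) / (var + eps) ** 0.5
    return out


# Two channels, one row, two positions with very different magnitudes.
x = [[[1.0, 10.0]], [[3.0, 30.0]]]
gn = group_norm_one_group(x)
ln = channel_layer_norm(x)
print("GroupNorm(1):", [[[round(v, 3) for v in row] for row in ch] for ch in gn])
print("channel LN:  ", [[[round(v, 3) for v in row] for row in ch] for ch in ln])
```

Because the per-position statistics differ from the per-sample statistics whenever positions have different magnitudes, the two outputs differ, which is the non-equivalence the comment points out.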

adam optim ERROR:If capturable=False, state_steps should not be CUDA tensors.

Hi, congratulations on your excellent work! I would really appreciate it if you could help me with this. I ran

PYTHONWARNINGS="ignore" cvnets-train --common.config-file config/classification/imagenet/mobilevit_v2.yaml --common.results-loc mobilevitv2_results/width_1_0_0 --common.override-kwargs scheduler.cosine.max_lr=0.0075 scheduler.cosine.min_lr=0.00075 optim.weight_decay=0.013 model.classification.mitv2.width_multiplier=1.00 --common.tensorboard-logging --common.accum-freq 4 --common.auto-resume

and triggered the auto-resume mode to continue my last training run, and this error occurred:

2022-07-03 06:06:18 - LOGS    - Exception occurred that interrupted the training. If capturable=False, state_steps should not be CUDA tensors.
If capturable=False, state_steps should not be CUDA tensors.

Traceback (most recent call last):
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 682, in run
    train_loss, train_ckpt_metric = self.train_epoch(epoch)
  File "/home/yu/projects/mobilevit/ml-cvnets/engine/training_engine.py", line 353, in train_epoch
    self.gradient_scalar.step(optimizer=self.optimizer)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 338, in step
    retval = self._maybe_opt_step(optimizer, optimizer_state, *args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/cuda/amp/grad_scaler.py", line 285, in _maybe_opt_step
    retval = optimizer.step(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/optimizer.py", line 109, in wrapper
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 161, in step
    adamw(params_with_grad,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 218, in adamw
    func(params,
  File "/home/yu/anaconda3/envs/mobilevit/lib/python3.8/site-packages/torch/optim/adamw.py", line 259, in _single_tensor_adamw
    assert not step_t.is_cuda, "If capturable=False, state_steps should not be CUDA tensors."

I am 100% sure that CUDNN is enabled and all GPUs are available; nothing went wrong when I first trained this.

Here's another problem: do you have any idea why the training process is slow? Thanks so much!

opened by yqi19 3
