Sandbox for training deep learning networks

Oleg Sémery

Last update: Jan 1, 2023

Overview

Deep learning networks

This repo is used to research convolutional networks primarily for computer vision tasks. For this purpose, the repo contains (re)implementations of various classification, segmentation, detection, and pose estimation models and scripts for training/evaluating/converting.

The following frameworks are used:

MXNet/Gluon (info),
PyTorch (info),
Chainer (info),
Keras (info),
TensorFlow 1.x/2.x (info).

For each supported framework, there is a PIP-package containing pure models without auxiliary scripts. List of packages:

gluoncv2 for Gluon,
pytorchcv for PyTorch,
chainercv2 for Chainer,
kerascv for Keras,
tensorflowcv for TensorFlow 1.x,
tf2cv for TensorFlow 2.x.

Currently, models are mostly implemented in Gluon and then ported to the other frameworks. Some models are pretrained on the ImageNet-1K, CIFAR-10/100, SVHN, CUB-200-2011, Pascal VOC2012, ADE20K, Cityscapes, and COCO datasets. All pretrained weights are loaded automatically during use. Examples of such automatic loading are given in the documentation section dedicated to each package.

Installation

To use training/evaluating scripts as well as all models, you need to clone the repository and install dependencies:

git clone git@github.com:osmr/imgclsmob.git
pip install -r requirements.txt

Table of implemented classification models

Some remarks:

Repo is the author's repository, if one exists.
a, b, c, d, and e mean that the model is implemented for ImageNet-1K, CIFAR-10, CIFAR-100, SVHN, and CUB-200-2011, respectively.
A, B, C, D, and E mean that a pretrained model is available for the corresponding dataset.
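
The letter codes used in the classification table can be decoded mechanically. The following is a minimal sketch based only on the legend above; the function name `decode_classification_cell` and the dictionary layout are illustrative and not part of the repository.

```python
# Dataset legend for the classification table (taken from the remarks above).
DATASETS = {
    "a": "ImageNet-1K",
    "b": "CIFAR-10",
    "c": "CIFAR-100",
    "d": "SVHN",
    "e": "CUB-200-2011",
}


def decode_classification_cell(cell):
    """Map a table cell such as "aBCD" to {dataset: has_pretrained_weights}.

    A lowercase letter means "implemented only"; an uppercase letter means
    "implemented and pretrained". A dash means the model is not available.
    """
    cell = cell.strip()
    if cell == "-":
        return {}
    result = {}
    for letter in cell:
        dataset = DATASETS[letter.lower()]  # KeyError on an unknown letter
        result[dataset] = letter.isupper()
    return result


# DIA-ResNet row, Gluon column: implemented for ImageNet-1K without weights,
# pretrained for CIFAR-10, CIFAR-100, and SVHN.
print(decode_classification_cell("aBCD"))
```

The segmentation, detection, and pose estimation tables use the same upper/lower case convention but their own dataset legends, so the dictionary would need to be swapped for those.
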

Model	Gluon	PyTorch	Chainer	Keras	TF	TF2	Paper	Repo	Year
AlexNet	A	A	A	A	A	A	link	link	2012
ZFNet	A	A	A	A	A	A	link	-	2013
VGG	A	A	A	A	A	A	link	-	2014
BN-VGG	A	A	A	A	A	A	link	-	2015
BN-Inception	A	A	A	-	-	A	link	-	2015
ResNet	ABCDE	ABCDE	ABCDE	A	A	ABCDE	link	link	2015
PreResNet	ABCD	ABCD	ABCD	A	A	ABCD	link	link	2016
ResNeXt	ABCD	ABCD	ABCD	A	A	ABCD	link	link	2016
SENet	A	A	A	A	A	A	link	link	2017
SE-ResNet	ABCDE	ABCDE	ABCDE	A	A	ABCDE	link	link	2017
SE-PreResNet	ABCD	ABCD	ABCD	A	A	ABCD	link	link	2017
SE-ResNeXt	A	A	A	A	A	A	link	link	2017
ResNeSt(A)	A	A	A	-	-	A	link	link	2020
IBN-ResNet	A	A	-	-	-	A	link	link	2018
IBN-ResNeXt	A	A	-	-	-	A	link	link	2018
IBN-DenseNet	A	A	-	-	-	A	link	link	2018
AirNet	A	A	A	-	-	A	link	link	2018
AirNeXt	A	A	A	-	-	A	link	link	2018
BAM-ResNet	A	A	A	-	-	A	link	link	2018
CBAM-ResNet	A	A	A	-	-	A	link	link	2018
ResAttNet	a	a	a	-	-	-	link	link	2017
SKNet	a	a	a	-	-	-	link	link	2019
SCNet	A	A	A	-	-	A	link	link	2020
RegNet	A	A	A	-	-	A	link	link	2020
DIA-ResNet	aBCD	aBCD	aBCD	-	-	-	link	link	2019
DIA-PreResNet	aBCD	aBCD	aBCD	-	-	-	link	link	2019
PyramidNet	ABCD	ABCD	ABCD	-	-	ABCD	link	link	2016
DiracNetV2	A	A	A	-	-	A	link	link	2017
ShaResNet	a	a	a	-	-	-	link	link	2017
CRU-Net	A	-	-	-	-	-	link	link	2018
DenseNet	ABCD	ABCD	ABCD	A	A	ABCD	link	link	2016
CondenseNet	A	A	A	-	-	-	link	link	2017
SparseNet	a	a	a	-	-	-	link	link	2018
PeleeNet	A	A	A	-	-	A	link	link	2018
Oct-ResNet	abcd	a	a	-	-	-	link	-	2019
Res2Net	a	-	-	-	-	-	link	-	2019
WRN	ABCD	ABCD	ABCD	-	-	a	link	link	2016
WRN-1bit	BCD	BCD	BCD	-	-	-	link	link	2018
DRN-C	A	A	A	-	-	A	link	link	2017
DRN-D	A	A	A	-	-	A	link	link	2017
DPN	A	A	A	-	-	A	link	link	2017
DarkNet Ref	A	A	A	A	A	A	link	link	-
DarkNet Tiny	A	A	A	A	A	A	link	link	-
DarkNet-19	a	a	a	a	a	a	link	link	-
DarkNet-53	A	A	A	A	A	A	link	link	2018
ChannelNet	a	a	a	-	a	-	link	link	2018
iSQRT-COV-ResNet	a	a	-	-	-	-	link	link	2017
RevNet	-	a	-	-	-	-	link	link	2017
i-RevNet	A	A	A	-	-	-	link	link	2018
BagNet	A	A	A	-	-	A	link	link	2019
DLA	A	A	A	-	-	A	link	link	2017
MSDNet	a	ab	-	-	-	-	link	link	2017
FishNet	A	A	A	-	-	-	link	link	2018
ESPNetv2	A	A	A	-	-	-	link	link	2018
DiCENet	A	A	A	-	-	A	link	link	2019
HRNet	A	A	A	-	-	A	link	link	2019
VoVNet	A	A	A	-	-	A	link	link	2019
SelecSLS	A	A	A	-	-	A	link	link	2019
HarDNet	A	A	A	-	-	A	link	link	2019
X-DenseNet	aBCD	aBCD	aBCD	-	-	-	link	link	2017
SqueezeNet	A	A	A	A	A	A	link	link	2016
SqueezeResNet	A	A	A	A	A	A	link	-	2016
SqueezeNext	A	A	A	A	A	A	link	link	2018
ShuffleNet	A	A	A	A	A	A	link	-	2017
ShuffleNetV2	A	A	A	A	A	A	link	-	2018
MENet	A	A	A	A	A	A	link	link	2018
MobileNet	AE	AE	AE	A	A	AE	link	link	2017
FD-MobileNet	A	A	A	A	A	A	link	link	2018
MobileNetV2	A	A	A	A	A	A	link	link	2018
MobileNetV3	A	A	A	A	-	A	link	link	2019
IGCV3	A	A	A	A	A	A	link	link	2018
GhostNet	a	a	a	-	-	a	link	link	2019
MnasNet	A	A	A	A	A	A	link	-	2018
DARTS	A	A	A	-	-	-	link	link	2018
ProxylessNAS	AE	AE	AE	-	-	AE	link	link	2018
FBNet-C	A	A	A	-	-	A	link	-	2018
Xception	A	A	A	-	-	A	link	link	2016
InceptionV3	A	A	A	-	-	A	link	link	2015
InceptionV4	A	A	A	-	-	A	link	link	2016
InceptionResNetV2	A	A	A	-	-	A	link	link	2016
PolyNet	A	A	A	-	-	A	link	link	2016
NASNet-Large	A	A	A	-	-	A	link	link	2017
NASNet-Mobile	A	A	A	-	-	A	link	link	2017
PNASNet-Large	A	A	A	-	-	A	link	link	2017
SPNASNet	A	A	A	-	-	A	link	link	2019
EfficientNet	A	A	A	A	-	A	link	link	2019
MixNet	A	A	A	-	-	A	link	link	2019
NIN	BCD	BCD	BCD	-	-	-	link	link	2013
RoR-3	BCD	BCD	BCD	-	-	-	link	-	2016
RiR	BCD	BCD	BCD	-	-	-	link	-	2016
ResDrop-ResNet	bcd	bcd	bcd	-	-	-	link	link	2016
Shake-Shake-ResNet	BCD	BCD	BCD	-	-	-	link	link	2017
ShakeDrop-ResNet	bcd	bcd	bcd	-	-	-	link	-	2018
FractalNet	bc	bc	-	-	-	-	link	link	2016
NTS-Net	E	E	E	-	-	-	link	link	2018

Table of implemented segmentation models

Some remarks:

a/A corresponds to Pascal VOC2012.
b/B corresponds to ADE20K.
c/C corresponds to Cityscapes.
d/D corresponds to COCO.
e/E corresponds to CelebAMask-HQ.

Model	Gluon	PyTorch	Chainer	Keras	TF	TF2	Paper	Repo	Year
PSPNet	ABCD	ABCD	ABCD	-	-	ABCD	link	-	2016
DeepLabv3	ABcD	ABcD	ABcD	-	-	ABcD	link	-	2017
FCN-8s(d)	ABcD	ABcD	ABcD	-	-	ABcD	link	-	2014
ICNet	C	C	C	-	-	C	link	link	2017
SINet	C	C	C	-	-	c	link	link	2019
BiSeNet	e	e	e	-	-	e	link	-	2018
DANet	C	C	C	-	-	C	link	link	2018
Fast-SCNN	C	C	C	-	-	C	link	-	2019
CGNet	c	c	c	-	-	c	link	link	2018
DABNet	c	c	c	-	-	c	link	link	2019
FPENet	c	c	c	-	-	c	link	-	2019
ContextNet	-	c	-	-	-	-	link	-	2018
LEDNet	c	c	c	-	-	c	link	-	2019
ESNet	-	c	-	-	-	-	link	-	2019
EDANet	-	c	-	-	-	-	link	link	2018
ENet	-	c	-	-	-	-	link	-	2016
ERFNet	-	c	-	-	-	-	link	-	2017
LinkNet	-	c	-	-	-	-	link	-	2017
SegNet	-	c	-	-	-	-	link	-	2015
U-Net	-	c	-	-	-	-	link	-	2015
SQNet	-	c	-	-	-	-	link	-	2016

Table of implemented object detection models

Some remarks:

a/A corresponds to COCO.

Model	Gluon	PyTorch	Chainer	Keras	TF	TF2	Paper	Repo	Year
CenterNet	a	a	a	-	-	a	link	link	2019

Table of implemented human pose estimation models

Some remarks:

a/A corresponds to COCO.

Model	Gluon	PyTorch	Chainer	Keras	TF	TF2	Paper	Repo	Year
AlphaPose	A	A	A	-	-	A	link	link	2016
SimplePose	A	A	A	-	-	A	link	link	2018
SimplePose(Mobile)	A	A	A	-	-	A	link	-	2018
Lightweight OpenPose	A	A	A	-	-	A	link	link	2018
IBPPose	A	A	A	-	-	A	link	link	2019

Comments

(tf)Resnesta model problems

TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 128, 128, 2, 64). Consider casting elements to a supported type. There is no problem with the other models I use.
bug

opened by Mrmdzz 12

Inconsistent behavior between SimplePose and SimplePoseMobile models

I'm using SimplePose models and train them with my custom dataset generator with this snippet:

print(f'Tensror Flow version: {tf.__version__}')
tf.keras.backend.clear_session()

BATCH_SIZE=64
NUM_KEYPOINTS=14
IMAGE_RES=128
HEATMAP_RES=32

net = tf2cv_get_model("simplepose_mobile_mobilenetv3_large_w1_coco", 
                      pretrained_backbone=True,
                      keypoints=NUM_KEYPOINTS, 
                      return_heatmap=True)

net.build(input_shape=(BATCH_SIZE, IMAGE_RES, IMAGE_RES, 3))
net.heatmap_max_det.build((BATCH_SIZE, HEATMAP_RES, HEATMAP_RES, NUM_KEYPOINTS))
net.summary()
net.compile(optimizer=tf.keras.optimizers.Adam(lr=5e-4), 
            loss=tf.keras.losses.mean_squared_error)

history = net.fit_generator(
  generator=train_data,
  validation_data=valid_data,
  epochs=15
)

Whether this works depends on the model type I choose. With any non-mobile model (simplepose_resnet18_coco, for example) everything works: the network trains and predicts accurate results. With any mobile model, such as simplepose_mobile_mobilenetv3_large_w1_coco or simplepose_mobile_resnet18_coco, the above code breaks with the following error:

Tensror Flow version: 2.1.0
Downloading /root/.tensorflow/models/mobilenetv3_large_w1-0769-f66596ae.tf2.h5.zip from https://github.com/osmr/imgclsmob/releases/download/v0.0.422/mobilenetv3_large_w1-0769-f66596ae.tf2.h5.zip...
Model: "simple_pose_mobile"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
backbone (Sequential)        multiple                  2996352   
_________________________________________________________________
decoder (Sequential)         multiple                  1798080   
_________________________________________________________________
heatmap_max_det (HeatmapMaxD multiple                  0         
=================================================================
Total params: 4,794,432
Trainable params: 4,768,240
Non-trainable params: 26,192
_________________________________________________________________

TypeError: in converted code:

    /usr/local/lib/python3.6/dist-packages/tf2cv/models/simpleposemobile_coco.py:94 call  *
        heatmap = self.decoder(x, training=training)
    /tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:773 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
    /usr/local/lib/python3.6/dist-packages/tf2cv/models/common.py:2016 call  *
        x = self.pix_shuffle(x)
    /tensorflow-2.1.0/python3.6/tensorflow_core/python/keras/engine/base_layer.py:773 __call__
        outputs = call_fn(cast_inputs, *args, **kwargs)
# .... 

    TypeError: Failed to convert object of type <class 'tuple'> to Tensor. Contents: (None, 4, 4, 128, 4). Consider casting elements to a supported type.

It looks to me like the net output is in some unexpected format. See this gist for the full output: https://gist.github.com/grin/d1a9836aca5ca462dbb03527246ba941

What could be causing this error? Am I missing some crucial configuration for mobile models? I would expect the two APIs to work similarly.

Thank you

bug

opened by grin 9
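
The tuple in the error message, `(None, 4, 4, 128, 4)`, looks like a reshape target whose batch dimension is statically unknown. A possible explanation (an assumption, not confirmed in this thread) is that a static shape containing `None` was passed where a tensor of integers is required. Reshape operations in TensorFlow accept `-1` for a single unknown dimension, so one common workaround is to substitute it. The helper name `to_reshape_target` is hypothetical, and the sketch uses plain Python so that it runs without TensorFlow.

```python
def to_reshape_target(static_shape):
    """Replace a single unknown (None) dimension with -1.

    A reshape target may contain at most one inferred dimension, so more
    than one None is rejected instead of being silently converted.
    """
    unknown = [i for i, dim in enumerate(static_shape) if dim is None]
    if len(unknown) > 1:
        raise ValueError("at most one dimension may be unknown: %r" % (static_shape,))
    return [-1 if dim is None else int(dim) for dim in static_shape]


# Shape copied from the error message above.
print(to_reshape_target((None, 4, 4, 128, 4)))
```
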

Input size segmentation!
I am trying to use pspnet_resnetd101b_voc as follows:

sample_batch_size = 1
channel = 3
height, width = 224, 224
dummy_input = torch.randn(sample_batch_size, channel, height, width)
out = pspnet_resnetd101b_voc(dummy_input)

But I get the following error:

Expected more than 1 value per channel when training, got input size torch.Size([1, 512, 1, 1])

What is the right input format for this model, or for similar ones such as pspnet_resnetd101b_coco?

Note: With Imagenet pretrained model resnetd101b works fine.
question
opened by MarioProjects 8
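
The error message reports how many values batch normalization sees per channel. In training mode that count is the batch size times the spatial size, and a single value has no variance to normalize by. With a batch of one and a feature map pooled down to 1x1, as in the reported shape `[1, 512, 1, 1]`, the count is exactly one. The sketch below only does this arithmetic; the function name is illustrative.

```python
from math import prod


def values_per_channel(shape):
    """Number of values batch norm averages over for each channel.

    `shape` is (batch, channels, *spatial), e.g. (N, C, H, W).
    """
    batch, _channels, *spatial = shape
    return batch * prod(spatial)


print(values_per_channel((1, 512, 1, 1)))  # shape from the error message
print(values_per_channel((2, 512, 1, 1)))  # same layer with a batch of two
```

Two usual remedies in PyTorch are calling `net.eval()` before inference, or using a batch of at least two samples when training. Whether either fits this use case is an assumption.
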
Squeezenext sqnxt23v5_w2 callbacks error

checkpoint = ModelCheckpoint('./gdrive/My Drive/MLAI_files/project-images/Full_Dataset/squeezenext_weights.{epoch:02d}-{val_loss:.2f}.hdf5', monitor='val_loss', save_best_only=True, mode='min')
callbacks_list = [checkpoint]
hist = new_model.fit_generator(generator = my_gen(train_generator),
                               steps_per_epoch = STEP_SIZE_TRAIN,
                               validation_data = my_gen(valid_generator),
                               validation_steps = STEP_SIZE_VALID,
                               epochs = 3,
                               callbacks = callbacks_list)

Epoch 1/3
 38/377 [==>...........................] - ETA: 3:54 - loss: 2.0859 - acc: 0.2188/usr/local/lib/python3.6/dist-packages/PIL/Image.py:914: UserWarning: Palette images with Transparency expressed in bytes should be converted to RGBA images
  'to RGBA images')
117/377 [========>.....................] - ETA: 2:35 - loss: 2.0482 - acc: 0.2356/usr/local/lib/python3.6/dist-packages/PIL/TiffImagePlugin.py:742: UserWarning: Corrupt EXIF data. Expecting to read 4 bytes but only got 0.
  warnings.warn(str(msg))
377/377 [==============================] - 221s 586ms/step - loss: 2.0704 - acc: 0.2301 - val_loss: 5.2917 - val_acc: 0.2820

TypeError                                 Traceback (most recent call last)
in ()
      4 validation_steps = STEP_SIZE_VALID,
      5 epochs = 3,
----> 6 callbacks = callbacks_list)

12 frames
/usr/local/lib/python3.6/dist-packages/keras/engine/saving.py in get_json_type(obj)
     89         return obj.name
     90
---> 91     raise TypeError('Not JSON Serializable: %s' % (obj,))
     92
     93 from .. import __version__ as keras_version

TypeError: Not JSON Serializable: <module 'tensorflow' from '/usr/local/lib/python3.6/dist-packages/tensorflow/__init__.py'>
question

opened by wanx4910 8
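
The final `TypeError` comes from serializing something to JSON that holds a reference to a Python module. The standard-library sketch below reproduces that class of failure; it is not the Keras code path itself, and `math` merely stands in for the `tensorflow` module named in the traceback.

```python
import json
import math  # stands in for the `tensorflow` module from the traceback


def try_serialize(config):
    """Return the JSON text, or the error message if serialization fails."""
    try:
        return json.dumps(config)
    except TypeError as err:
        return "TypeError: %s" % err


# Plain values serialize without trouble.
print(try_serialize({"name": "lambda_1", "units": 64}))

# A module object inside the config cannot be encoded.
print(try_serialize({"name": "lambda_1", "module": math}))
```

A commonly used workaround, assuming only the weights are needed, is to pass `save_weights_only=True` to `ModelCheckpoint`, so that the model configuration is not written with each checkpoint.
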
Multiple outputs using PSPNet pre-trained model on Cityscapes

Hello, when the PSPNet pre-trained model is used, the output is a tuple of 2 tensors. Can someone clarify what they are and how to obtain only one of them so that computation time can be saved?

net = get_model('pspnet_resnetd101b_cityscapes', pretrained_backbone=True, pretrained=True)
yp = net(X)
yp.size()

This throws an error:

AttributeError                            Traceback (most recent call last)
in
      1 yp = net(X) # model outputs 2 different learning rates
----> 2 yp.size()

AttributeError: 'tuple' object has no attribute 'size'
question

opened by Swaraj-72 7
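
The `AttributeError` shows that the network returns a tuple rather than a single tensor. A plausible reading, not confirmed in this thread, is that the first element is the main segmentation output and the second comes from an auxiliary branch. Under that assumption a small helper can pick the first element. The helper name `primary_output` is hypothetical, and `FakeTensor` is a stand-in so that the sketch runs without PyTorch.

```python
class FakeTensor:
    """Minimal stand-in for a tensor; it only records its shape."""

    def __init__(self, *shape):
        self.shape = shape

    def size(self):
        return self.shape


def primary_output(output):
    """Return the first element when a model returns a tuple of outputs."""
    return output[0] if isinstance(output, tuple) else output


# Hypothetical shapes: 19 classes on a 480x480 input.
yp = (FakeTensor(1, 19, 480, 480), FakeTensor(1, 19, 480, 480))
print(primary_output(yp).size())
```

Taking the first element does not skip the computation of the second. Saving that time would require the extra branch to be disabled when the model is built, if the constructor allows it.
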

No pre-trained weights for the DPN models?

Hello,

I am using fastai to implement some computer vision project. I recently stumbled across this repo and found it a great add-on to my arsenal of models. However, I am trying to create a pretrained DPN98. I found someone in the fastai repo that is having a similar problem with Alexnet (though they didn't specify where they got the model from) here. I also tried DPN 68 with same result.

The traceback I get looks like this:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-21-096da45f530c> in <module>
      1 # Setup the model and
----> 2 learn = cnn_learner(data, dpn98, metrics=[precision, accuracy], callback_fns=ShowGraph)
      3 learn.model_dir='/kaggle/working/'
      4 learn.freeze()

/opt/conda/lib/python3.6/site-packages/fastai/vision/learner.py in cnn_learner(data, base_arch, cut, pretrained, lin_ftrs, ps, custom_head, split_on, bn_final, init, concat_pool, **kwargs)
     96     meta = cnn_config(base_arch)
     97     model = create_cnn_model(base_arch, data.c, cut, pretrained, lin_ftrs, ps=ps, custom_head=custom_head,
---> 98         bn_final=bn_final, concat_pool=concat_pool)
     99     learn = Learner(data, model, **kwargs)
    100     learn.split(split_on or meta['split'])

/opt/conda/lib/python3.6/site-packages/fastai/vision/learner.py in create_cnn_model(base_arch, nc, cut, pretrained, lin_ftrs, ps, custom_head, bn_final, concat_pool)
     84     body = create_body(base_arch, pretrained, cut)
     85     if custom_head is None:
---> 86         nf = num_features_model(nn.Sequential(*body.children())) * (2 if concat_pool else 1)
     87         head = create_head(nf, nc, lin_ftrs, ps=ps, concat_pool=concat_pool, bn_final=bn_final)
     88     else: head = custom_head

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in num_features_model(m)
    118     sz = 64
    119     while True:
--> 120         try: return model_sizes(m, size=(sz,sz))[-1][1]
    121         except Exception as e:
    122             sz *= 2

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in model_sizes(m, size)
    111     "Pass a dummy input through the model `m` to get the various sizes of activations."
    112     with hook_outputs(m) as hooks:
--> 113         x = dummy_eval(m, size)
    114         return [o.stored.shape for o in hooks]
    115 

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in dummy_eval(m, size)
    106 def dummy_eval(m:nn.Module, size:tuple=(64,64)):
    107     "Pass a `dummy_batch` in evaluation mode in `m` with `size`."
--> 108     return m.eval()(dummy_batch(m, size))
    109 
    110 def model_sizes(m:nn.Module, size:tuple=(64,64))->Tuple[Sizes,Tensor,Hooks]:

/opt/conda/lib/python3.6/site-packages/fastai/callbacks/hooks.py in dummy_batch(m, size)
    101 def dummy_batch(m: nn.Module, size:tuple=(64,64))->Tensor:
    102     "Create a dummy batch to go through `m` with `size`."
--> 103     ch_in = in_channels(m)
    104     return one_param(m).new(1, ch_in, *size).requires_grad_(False).uniform_(-1.,1.)
    105 

/opt/conda/lib/python3.6/site-packages/fastai/torch_core.py in in_channels(m)
    261     for l in flatten_model(m):
    262         if hasattr(l, 'weight'): return l.weight.shape[1]
--> 263     raise Exception('No weight layer')
    264 
    265 class ModelOnCPU():

Exception: No weight layer

I'll include the preceding code I have up to this point to give you an idea of the environment I am running. Note that this is in Kaggle.

from fastai.vision import *
# For more models
!pip install pytorchcv
from pytorchcv.model_provider import get_model as ptcv_get_model

# Data augs
tfms = get_transforms()

# Get databunch to feed to the network
data = ImageDataBunch.from_folder(path, valid_pct=0.2, size = 1028, bs = 2, ds_tfms = tfms, padding_mode='zeros').normalize(imagenet_stats)

# Custom arch
def dpn98(pretrained=False):
    return ptcv_get_model("dpn98", pretrained=False).features

# Get custom precision metric
precision = Precision(pos_label = 0)

# Setup the model and 
learn = cnn_learner(data, dpn98, metrics=[precision, accuracy], callback_fns=ShowGraph)

question

opened by djpecot 7

How to modify pre-trained models?

Is there a good way to go about modifying the pre-trained models? I want to tweak the forward() function to return activations at a few layers. I'm going to be comparing between several CIFAR models so training them all myself isn't really viable.

Thanks!
question

opened by DWhettam 7
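
One way to read intermediate activations without editing `forward()` is to use PyTorch forward hooks, which run right after a submodule produces its output. The sketch below assumes a PyTorch model; the toy network stands in for a pretrained one, and `capture_activations` is a hypothetical helper, not part of pytorchcv.

```python
import torch
from torch import nn


def capture_activations(model, layer_names):
    """Register forward hooks that store the output of the named submodules.

    Returns the dict that is filled on the next forward pass, and the hook
    handles, which should be removed when they are no longer needed.
    """
    modules = dict(model.named_modules())
    activations = {}
    handles = []
    for name in layer_names:
        # Bind `name` as a default argument so each hook keeps its own key.
        def hook(_module, _inputs, output, name=name):
            activations[name] = output.detach()
        handles.append(modules[name].register_forward_hook(hook))
    return activations, handles


# Toy network standing in for a pretrained model (assumption for illustration).
net = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)
net.eval()

activations, handles = capture_activations(net, ["0", "2"])
with torch.no_grad():
    logits = net(torch.randn(2, 3, 32, 32))
for handle in handles:
    handle.remove()

print({name: tuple(t.shape) for name, t in activations.items()})
```

For a real model, the available submodule names can be listed with `dict(net.named_modules()).keys()`.
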
incomplete example

Hi, thank you for the great repository. My problem is that I cannot find the list of classes on which the TF2 ImageNet-1K models were trained. The ImageNet website is not very helpful in that regard either.

For example, how could I load an image, classify it and print the class label? An example for how to do this would be very helpful to add.
question

opened by thunderbug1 6

Inference results in weird values

Hi,

I tried to use ImageNet classifier with a pretrained model SE-ResNext50. I added just a Softmax activation. It results in values with mean 0.01 and range about 0.005-0.02, which don't seem like proper softmax predictions. Actual results are incorrect.

If I change 1 line to use torchvision.models.resnet50, inference works like a charm: it detects dogs, cars, etc. Is there anything I'm missing? I'm using pytorchcv==0.0.45.

Btw, maybe you should add __version__ into the package.

Cheers!
question

opened by artyompal 6
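
One quick check on such numbers: softmax outputs always sum to 1, so their mean is exactly 1 divided by the number of outputs, whatever the model predicts. The mean therefore says nothing about quality, while the largest probability does. Values that all sit close to the mean indicate nearly equal logits, which can happen, for instance, when the input is not preprocessed the way the weights expect or when the model is left in training mode. Those causes are assumptions, not findings from this thread. The sketch uses made-up logits and the standard library only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


num_classes = 1000
flat = softmax([0.0] * num_classes)                   # no class stands out
peaked = softmax([10.0] + [0.0] * (num_classes - 1))  # one confident class

for name, probs in (("flat", flat), ("peaked", peaked)):
    mean = sum(probs) / len(probs)
    print(name, "mean=%.4f" % mean, "max=%.4f" % max(probs))
```
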
Questions about an experimental training run

Hi, thanks for such brilliant work.

I was going to train a GhostNet model on TF2 with train_tf2.py. At first I used a tiny dataset, with 78 images in the training set and 40 for validation, and set the number of epochs to 1 and the batch size to 4. But this single epoch never seems to finish. Let me show you in the code. I put some prints and a counter into the loop, like this:

print(train_img_count)
num_epochs = args.num_epochs
print(num_epochs)
for epoch in range(num_epochs):
    print(20)  # meaningless thing
    counter = 0
    for images, labels in train_data:
        print(counter)
        counter += 1
        train_step(images, labels)
    print(16)  # meaningless thing

In this case, the terminal keeps printing the counter. It does not throw any errors or exceptions; it just keeps running. The counter gets much bigger than the number of training images (300 vs 78). The terminal output looks like this:

Found 78 images belonging to 2 classes.
Found 40 images belonging to 2 classes.
78 # train_img_count
1 # num_epochs
20 # that meaningless number
0 # counter started from here
1 2 3 4 5 6 ....... 300

Any idea about this?
question
opened by JayFu 5
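
The message "Found 78 images belonging to 2 classes." is what Keras directory iterators print, and such iterators yield batches indefinitely. If `train_data` is one of them (an assumption, since the script is not shown in full), the inner `for` loop never ends on its own and the number of steps per epoch has to be bounded explicitly. The sketch below imitates an endless batch source with the standard library; `endless_batches` is a hypothetical stand-in for the real data iterator.

```python
import itertools
import math


def endless_batches(num_samples, batch_size):
    """Return an iterator that repeats the per-batch sizes of one epoch forever."""
    sizes = []
    remaining = num_samples
    while remaining > 0:
        sizes.append(min(batch_size, remaining))
        remaining -= batch_size
    return itertools.cycle(sizes)


num_samples, batch_size = 78, 4  # numbers from the report above
steps_per_epoch = math.ceil(num_samples / batch_size)

seen = 0
# islice stops after one pass over the data instead of looping forever.
for batch in itertools.islice(endless_batches(num_samples, batch_size), steps_per_epoch):
    seen += batch

print(steps_per_epoch, seen)
```

In a training loop, the same bound can be applied by breaking out of the inner loop once the step counter reaches `steps_per_epoch`.
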

How to change default output shape?

The ImageNet-pretrained model has 1000 classes, but when I try to replace only the last dense layer, I get an error. It looks like classes=1000 cannot be changed. I would suggest adding an argument such as include_top=False, as in tf.keras.applications.ResNet50, so that the last dense layer can be customized.

net = kecv_get_model("resnet50", pretrained=True, classes=100)

https://github.com/osmr/imgclsmob/blob/4b01a0e635e54d08929d9b340e8d369f5add0275/keras_/kerascv/models/resnet.py#L223

AssertionError                            Traceback (most recent call last)
<ipython-input-26-c5cb45b6fa44> in <module>
----> 1 net = kecv_get_model("resnet50", pretrained=True, classes=100)

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/model_provider.py in get_model(name, **kwargs)
    246     if name not in _models:
    247         raise ValueError("Unsupported model: {}".format(name))
--> 248     net = _models[name](**kwargs)
    249     return net

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/resnet.py in resnet50(**kwargs)
    585         Location for keeping the model parameters.
    586     """
--> 587     return get_resnet(blocks=50, model_name="resnet50", **kwargs)
    588 
    589 

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/resnet.py in get_resnet(blocks, bottleneck, conv1_stride, width_scale, model_name, pretrained, root, **kwargs)
    376             net=net,
    377             model_name=model_name,
--> 378             local_model_store_dir_path=root)
    379 
    380     return net

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/model_store.py in download_model(net, model_name, local_model_store_dir_path)
    511         file_path=get_model_file(
    512             model_name=model_name,
--> 513             local_model_store_dir_path=local_model_store_dir_path))

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/model_store.py in load_model(net, file_path, skip_mismatch)
    489             _load_weights_from_hdf5_group(
    490                 f=f,
--> 491                 layers=net.layers)
    492 
    493 

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/model_store.py in _load_weights_from_hdf5_group(f, layers)
    391         weight_values = _preprocess_weights_for_loading(
    392             layer=layer,
--> 393             weights=weight_values)
    394         if len(weight_values) != len(symbolic_weights):
    395             raise ValueError('Layer #' + str(k) +

~/miniconda3/envs/tf114/lib/python3.6/site-packages/kerascv/models/model_store.py in _preprocess_weights_for_loading(layer, weights)
    346             weights[0] = np.transpose(weights[0], (2, 3, 0, 1))
    347     for i in range(len(weights)):
--> 348         assert (K.int_shape(layer.weights[i]) == weights[i].shape)
    349     return weights
    350 

AssertionError:

question

opened by zihaozhihao 5

[PyTorch] simplepose_resnet18_coco model weights loading error

The simplepose_resnet18_coco pretrained weights cannot be loaded using pytorchcv.

How to reproduce

Create and activate a new environment

conda create -n avl_simplepose python=3.9
conda activate avl_simplepose

Install pytorchcv and torch packages (pip). Note: the exact torch install command may vary. I used the one from the official site for CUDA 11.6

pip install pytorchcv
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116

Make sure no .torch folder exists in the home directory (may not be needed to reproduce the issue).
Run the following commands (+log):

Python 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:58:50) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> from pytorchcv.model_provider import get_model as ptcv_get_model
>>> 
>>> ptcv_get_model("simplepose_resnet18_coco", pretrained=True)
Downloading /home/lorenzo/.torch/models/simplepose_resnet18_coco-6631-7c3656b3.pth.zip from https://github.com/osmr/imgclsmob/releases/download/v0.0.455/simplepose_resnet18_coco-6631-7c3656b3.pth.zip...

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/pytorchcv/model_provider.py", line 1233, in get_model
    net = _models[name](**kwargs)
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/pytorchcv/models/simplepose_coco.py", line 155, in simplepose_resnet18_coco
    return get_simplepose(backbone=backbone, backbone_out_channels=512, keypoints=keypoints,
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/pytorchcv/models/simplepose_coco.py", line 129, in get_simplepose
    download_model(
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/pytorchcv/models/model_store.py", line 827, in download_model
    load_model(
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/pytorchcv/models/model_store.py", line 804, in load_model
    net.load_state_dict(pretrained_state)
  File "/home/lorenzo/miniconda3/envs/prova_avl_simplepose/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1667, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for SimplePose:
        Missing key(s) in state_dict: "backbone.0.conv.conv.weight", "backbone.0.conv.bn.weight", "backbone.0.conv.bn.bias", "backbone.0.conv.bn.running_mean", "backbone.0.conv.bn.running_var", "backbone.1.unit1.body.conv1.conv.weight", "backbone.1.unit1.body.conv1.bn.weight", "backbone.1.unit1.body.conv1.bn.bias", "backbone.1.unit1.body.conv1.bn.running_mean", "backbone.1.unit1.body.conv1.bn.running_var", "backbone.1.unit1.body.conv2.conv.weight", "backbone.1.unit1.body.conv2.bn.weight", "backbone.1.unit1.body.conv2.bn.bias", "backbone.1.unit1.body.conv2.bn.running_mean", "backbone.1.unit1.body.conv2.bn.running_var", "backbone.1.unit2.body.conv1.conv.weight", "backbone.1.unit2.body.conv1.bn.weight", "backbone.1.unit2.body.conv1.bn.bias", "backbone.1.unit2.body.conv1.bn.running_mean", "backbone.1.unit2.body.conv1.bn.running_var", "backbone.1.unit2.body.conv2.conv.weight", "backbone.1.unit2.body.conv2.bn.weight", "backbone.1.unit2.body.conv2.bn.bias", "backbone.1.unit2.body.conv2.bn.running_mean", "backbone.1.unit2.body.conv2.bn.running_var", "backbone.2.unit1.body.conv1.conv.weight", "backbone.2.unit1.body.conv1.bn.weight", "backbone.2.unit1.body.conv1.bn.bias", "backbone.2.unit1.body.conv1.bn.running_mean", "backbone.2.unit1.body.conv1.bn.running_var", "backbone.2.unit1.body.conv2.conv.weight", "backbone.2.unit1.body.conv2.bn.weight", "backbone.2.unit1.body.conv2.bn.bias", "backbone.2.unit1.body.conv2.bn.running_mean", "backbone.2.unit1.body.conv2.bn.running_var", "backbone.2.unit1.identity_conv.conv.weight", "backbone.2.unit1.identity_conv.bn.weight", "backbone.2.unit1.identity_conv.bn.bias", "backbone.2.unit1.identity_conv.bn.running_mean", "backbone.2.unit1.identity_conv.bn.running_var", "backbone.2.unit2.body.conv1.conv.weight", "backbone.2.unit2.body.conv1.bn.weight", "backbone.2.unit2.body.conv1.bn.bias", "backbone.2.unit2.body.conv1.bn.running_mean", "backbone.2.unit2.body.conv1.bn.running_var", "backbone.2.unit2.body.conv2.conv.weight", 
"backbone.2.unit2.body.conv2.bn.weight", "backbone.2.unit2.body.conv2.bn.bias", "backbone.2.unit2.body.conv2.bn.running_mean", "backbone.2.unit2.body.conv2.bn.running_var", "backbone.3.unit1.body.conv1.conv.weight", "backbone.3.unit1.body.conv1.bn.weight", "backbone.3.unit1.body.conv1.bn.bias", "backbone.3.unit1.body.conv1.bn.running_mean", "backbone.3.unit1.body.conv1.bn.running_var", "backbone.3.unit1.body.conv2.conv.weight", "backbone.3.unit1.body.conv2.bn.weight", "backbone.3.unit1.body.conv2.bn.bias", "backbone.3.unit1.body.conv2.bn.running_mean", "backbone.3.unit1.body.conv2.bn.running_var", "backbone.3.unit1.identity_conv.conv.weight", "backbone.3.unit1.identity_conv.bn.weight", "backbone.3.unit1.identity_conv.bn.bias", "backbone.3.unit1.identity_conv.bn.running_mean", "backbone.3.unit1.identity_conv.bn.running_var", "backbone.3.unit2.body.conv1.conv.weight", "backbone.3.unit2.body.conv1.bn.weight", "backbone.3.unit2.body.conv1.bn.bias", "backbone.3.unit2.body.conv1.bn.running_mean", "backbone.3.unit2.body.conv1.bn.running_var", "backbone.3.unit2.body.conv2.conv.weight", "backbone.3.unit2.body.conv2.bn.weight", "backbone.3.unit2.body.conv2.bn.bias", "backbone.3.unit2.body.conv2.bn.running_mean", "backbone.3.unit2.body.conv2.bn.running_var", "backbone.4.unit1.body.conv1.conv.weight", "backbone.4.unit1.body.conv1.bn.weight", "backbone.4.unit1.body.conv1.bn.bias", "backbone.4.unit1.body.conv1.bn.running_mean", "backbone.4.unit1.body.conv1.bn.running_var", "backbone.4.unit1.body.conv2.conv.weight", "backbone.4.unit1.body.conv2.bn.weight", "backbone.4.unit1.body.conv2.bn.bias", "backbone.4.unit1.body.conv2.bn.running_mean", "backbone.4.unit1.body.conv2.bn.running_var", "backbone.4.unit1.identity_conv.conv.weight", "backbone.4.unit1.identity_conv.bn.weight", "backbone.4.unit1.identity_conv.bn.bias", "backbone.4.unit1.identity_conv.bn.running_mean", "backbone.4.unit1.identity_conv.bn.running_var", "backbone.4.unit2.body.conv1.conv.weight", 
"backbone.4.unit2.body.conv1.bn.weight", "backbone.4.unit2.body.conv1.bn.bias", "backbone.4.unit2.body.conv1.bn.running_mean", "backbone.4.unit2.body.conv1.bn.running_var", "backbone.4.unit2.body.conv2.conv.weight", "backbone.4.unit2.body.conv2.bn.weight", "backbone.4.unit2.body.conv2.bn.bias", "backbone.4.unit2.body.conv2.bn.running_mean", "backbone.4.unit2.body.conv2.bn.running_var".

opened by lrzpellegrini 0
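
`load_state_dict` reports missing keys when the parameter names in the checkpoint differ from the names the model expects. Comparing the two sets of names usually shows whether the problem is a renamed prefix rather than absent weights. The sketch below works on plain lists of names; `diff_keys` is a hypothetical helper, and the checkpoint-side names are invented for illustration, since the thread does not show them.

```python
def diff_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) parameter names, both sorted.

    missing: expected by the model but absent from the checkpoint.
    unexpected: present in the checkpoint but unknown to the model.
    """
    model_set, checkpoint_set = set(model_keys), set(checkpoint_keys)
    return sorted(model_set - checkpoint_set), sorted(checkpoint_set - model_set)


# Model-side names are taken from the error above.
model_keys = ["backbone.0.conv.conv.weight", "backbone.0.conv.bn.weight"]
# Checkpoint-side names are hypothetical.
checkpoint_keys = [
    "backbone.init_block.conv.conv.weight",
    "backbone.init_block.conv.bn.weight",
]

missing, unexpected = diff_keys(model_keys, checkpoint_keys)
print("missing:", missing)
print("unexpected:", unexpected)
```

With real objects, the two lists would come from `net.state_dict().keys()` and from the keys of the loaded checkpoint dictionary.
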

Inplace RunError when testing backward of RevNet with PyTorch 1.11.0

Hi, thank you for your good code. I recently tried to reproduce RevNet with PyTorch 1.11.0 using your code. However, I got a RuntimeError as follows:

File ~\Documents\RevNet\revnet.py:71, in ReversibleBlockFunction.backward(ctx, grad_y)
     68 gm = ctx.gm
     70 with torch.autograd.set_detect_anomaly(True):
---> 71     x, y = ctx.saved_variables
     72 # x, y = ctx.saved_tensors
     73 y1, y2 = torch.chunk(y, chunks=2, dim=1)

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [0]], which is output 0 of ReversibleBlockFunctionBackward, is at version 3; expected version 2 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I've searched for potential solutions on the Internet but still cannot solve it. One solution is to downgrade PyTorch to 1.4.0, but that version does not support the GPU I use. Could you provide some suggestions? I appreciate your help. Thanks!

opened by taokz 0

added a get config method for wrn cifar in order to be able to deserialize it

Not sure if you would like me to do the same for all the other TF models, but I can. Also did not add unit tests, but this is also doable, just tell me.

opened by zaccharieramzi 0

Support the XDG Base Directory Specification

pytorch now supports the XDG Base Directory specification https://github.com/pytorch/pytorch/issues/14693.

The pytorchcv module provided by this repo still uses the hardcoded ~/.torch/models path.

You can see the "correct" logic for finding the .torch cache directory here.

opened by RuRo 0
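
An XDG-style lookup of the kind this issue asks for can be sketched with the standard library: an explicit `TORCH_HOME` wins, otherwise `$XDG_CACHE_HOME/torch` is used, with `~/.cache` as the default cache root. The function name `torch_cache_dir` is hypothetical, and the sketch takes the environment as a parameter so that it can be tried without changing real variables.

```python
import os


def torch_cache_dir(environ=None):
    """Resolve a cache directory in the XDG style.

    Order: $TORCH_HOME, then $XDG_CACHE_HOME/torch, then ~/.cache/torch.
    """
    env = os.environ if environ is None else environ
    cache_root = env.get("XDG_CACHE_HOME", os.path.join("~", ".cache"))
    default_home = os.path.join(cache_root, "torch")
    return os.path.expanduser(env.get("TORCH_HOME", default_home))


print(torch_cache_dir({"XDG_CACHE_HOME": "/tmp/xdg"}))
print(torch_cache_dir({"TORCH_HOME": "/data/torch"}))
```

A models subdirectory could then be derived from the returned path instead of the hardcoded location.
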
PyramidNet maybe wrong.

PyramidNet's residual block (d) is not the same as pre-resnet's (a). It deletes the first ReLU and adds a new BN at the end. I noticed that in the PyTorch version, pyramidnet.py, the model is built just on pre_conv1x1_block. Waiting for your verification.

opened by IsidoreSong 0

pytorchcv in_size argument

Hi, thank you for the wonderful DL networks repo.

I have one question about pytorchcv.

In pytorch/pytorchcv/models/squeezenext.py I found that the SqueezeNext class has an in_size argument, but it is never used.

I would like to change my input image size through the get_model function by changing the in_size argument.

Is there any reason you are not using the in_size argument currently?

opened by Younghoon-Lee 0