Keras attention models including BotNet, CoaT, CoAtNet, CMT, CoTNet, HaloNet, ResNeSt, ResNeXt, ResNetD, VOLO, MLP-Mixer, ResMLP, gMLP, LeViT

Overview

Keras_cv_attention_models


Usage

Basic Usage

  • Currently under work: CMT and CoAtNet training.
  • Install as pip package:
    pip install -U keras-cv-attention-models
    # Or
    pip install -U git+https://github.com/leondgarse/keras_cv_attention_models
    Refer to each sub-directory for detailed usage.
  • Basic model prediction
    from keras_cv_attention_models import volo
    mm = volo.VOLO_d1(pretrained="imagenet")
    
    """ Run predict """
    import tensorflow as tf
    from tensorflow import keras
    from skimage.data import chelsea
    img = chelsea() # Chelsea the cat
    imm = keras.applications.imagenet_utils.preprocess_input(img, mode='torch')
    pred = mm(tf.expand_dims(tf.image.resize(imm, mm.input_shape[1:3]), 0)).numpy()
    pred = tf.nn.softmax(pred).numpy()  # If classifier activation is not softmax
    print(keras.applications.imagenet_utils.decode_predictions(pred)[0])
    # [('n02124075', 'Egyptian_cat', 0.9692954),
    #  ('n02123045', 'tabby', 0.020203391),
    #  ('n02123159', 'tiger_cat', 0.006867502),
    #  ('n02127052', 'lynx', 0.00017674894),
    #  ('n02123597', 'Siamese_cat', 4.9493494e-05)]
  • Exclude model top layers by setting num_classes=0; a sketch of attaching a custom head to such a backbone follows below.
    from keras_cv_attention_models import resnest
    mm = resnest.ResNest50(num_classes=0)
    print(mm.output_shape)
    # (None, 7, 7, 2048)
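  • Attach a custom head on a headless backbone. This is a minimal sketch rather than an excerpt from the repo; the class count, dropout rate, loss and optimizer below are placeholder choices.
    from tensorflow import keras
    from keras_cv_attention_models import resnest

    backbone = resnest.ResNest50(num_classes=0)  # output shape (None, 7, 7, 2048)
    nn = keras.layers.GlobalAveragePooling2D()(backbone.output)
    nn = keras.layers.Dropout(0.2)(nn)
    outputs = keras.layers.Dense(10, activation="softmax")(nn)  # 10 is a placeholder class count
    model = keras.models.Model(backbone.input, outputs)

    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    # model.fit(train_dataset, validation_data=val_dataset, epochs=...)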

Layers

  • attention_layers is __init__.py only; it imports the core layers defined in the model architectures, such as RelativePositionalEmbedding from botnet and outlook_attention from volo.
import tensorflow as tf
from keras_cv_attention_models import attention_layers
aa = attention_layers.RelativePositionalEmbedding()
print(f"{aa(tf.ones([1, 4, 14, 16, 256])).shape = }")
# aa(tf.ones([1, 4, 14, 16, 256])).shape = TensorShape([1, 4, 14, 16, 14, 16])
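  • These exported names are regular Keras layers and functions, so they can be listed with plain Python and used inside a functional model. The sketch below assumes only standard Keras usage plus the RelativePositionalEmbedding shape shown above; it is not an excerpt from the repo.
from tensorflow import keras
from keras_cv_attention_models import attention_layers

print([name for name in dir(attention_layers) if not name.startswith("_")][:10])  # peek at a few exported names

# Use an exported layer like any other Keras layer inside a functional model
inputs = keras.layers.Input([4, 14, 16, 256])
outputs = attention_layers.RelativePositionalEmbedding()(inputs)
mm = keras.models.Model(inputs, outputs)
print(mm.output_shape)  # (None, 4, 14, 16, 14, 16)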

Model surgery

  • model_surgery includes functions used to change model parameters after the model is built.
from tensorflow import keras
from keras_cv_attention_models import model_surgery
# Replace all ReLU with PReLU
mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
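  • As a quick sanity check (plain Keras introspection, no additional repo APIs assumed), count how many activation layers are PReLU after the replacement:
from tensorflow import keras
from keras_cv_attention_models import model_surgery

mm = model_surgery.replace_ReLU(keras.applications.ResNet50(), target_activation='PReLU')
prelu_layers = [layer.name for layer in mm.layers if isinstance(layer, keras.layers.PReLU)]
print(len(prelu_layers))  # number of activation layers now using PReLU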

AotNet

  • Keras AotNet is just a ResNet / ResNetV2-like framework that accepts parameters such as attn_types, se_ratio and others, which are used to apply different types of attention layers.
    # Mixing se and outlook and halo and mhsa and cot_attention, 21M parameters
    # 50 is just a picked number that is larger than the relative `num_block`
    from keras_cv_attention_models import aotnet
    attn_types = [None, "outlook", ["mhsa", "halo"] * 50, "cot"]
    se_ratio = [0.25, 0, 0, 0]
    mm = aotnet.AotNet50V2(attn_types=attn_types, se_ratio=se_ratio, deep_stem=True, strides=1)
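    The result is a plain Keras model, so it can be inspected and trained as usual. The compile settings below are placeholders for illustration, not values from the repo:
    print(f"{mm.count_params() = }")  # roughly 21M, matching the comment above
    mm.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
    # mm.fit(train_dataset, validation_data=val_dataset, epochs=...)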

ResNetD

Model Params Image resolution Top1 Acc Download
ResNet50D 25.58M 224 80.530 resnet50d.h5
ResNet101D 44.57M 224 83.022 resnet101d.h5
ResNet152D 60.21M 224 83.680 resnet152d.h5
ResNet200D 64.69M 224 83.962 resnet200d.h5

ResNeXt

Model Params Image resolution Top1 Acc Download
ResNeXt50 (32x4d) 25M 224 79.768 resnext50_imagenet.h5
- SWSL 25M 224 82.182 resnext50_swsl.h5
ResNeXt50D (32x4d + deep) 25M 224 79.676 resnext50d_imagenet.h5
ResNeXt101 (32x4d) 42M 224 80.334 resnext101_imagenet.h5
- SWSL 42M 224 83.230 resnext101_swsl.h5
ResNeXt101W (32x8d) 89M 224 79.308 resnext101w_imagenet.h5
- SWSL 89M 224 84.284 resnext101w_swsl.h5

ResNetQ

Model Params Image resolution Top1 Acc Download
ResNet51Q 35.7M 224 82.36 resnet51q.h5

BotNet

Model Params Image resolution Top1 Acc Download
botnet50 21M 224 77.604 botnet50_imagenet.h5
botnet101 41M 224
botnet152 56M 224

VOLO

Model Params Image resolution Top1 Acc Download
volo_d1 27M 224 84.2 volo_d1_224.h5
volo_d1 ↑384 27M 384 85.2 volo_d1_384.h5
volo_d2 59M 224 85.2 volo_d2_224.h5
volo_d2 ↑384 59M 384 86.0 volo_d2_384.h5
volo_d3 86M 224 85.4 volo_d3_224.h5
volo_d3 ↑448 86M 448 86.3 volo_d3_448.h5
volo_d4 193M 224 85.7 volo_d4_224.h5
volo_d4 ↑448 193M 448 86.8 volo_d4_448.h5
volo_d5 296M 224 86.1 volo_d5_224.h5
volo_d5 ↑448 296M 448 87.0 volo_d5_448.h5
volo_d5 ↑512 296M 512 87.1 volo_d5_512.h5

ResNeSt

Model Params Image resolution Top1 Acc Download
resnest50 28M 224 81.03 resnest50.h5
resnest101 49M 256 82.83 resnest101.h5
resnest200 71M 320 83.84 resnest200.h5
resnest269 111M 416 84.54 resnest269.h5

HaloNet

Model Params Image resolution Top1 Acc
HaloNetH0 6.6M 256 77.9
HaloNetH1 9.1M 256 79.9
HaloNetH2 10.3M 256 80.4
HaloNetH3 12.5M 320 81.9
HaloNetH4 19.5M 384 83.3
- 21k 19.5M 384 85.5
HaloNetH5 31.6M 448 84.0
HaloNetH6 44.3M 512 84.4
HaloNetH7 67.9M 600 84.9

CoTNet

Model Params Image resolution FLOPs (G) Top1 Acc Download
CoTNet-50 22.2M 224 3.3 81.3 cotnet50_224.h5
CoTNeXt-50 30.1M 224 4.3 82.1
SE-CoTNetD-50 23.1M 224 4.1 81.6 se_cotnetd50_224.h5
CoTNet-101 38.3M 224 6.1 82.8 cotnet101_224.h5
CoTNeXt-101 53.4M 224 8.2 83.2
SE-CoTNetD-101 40.9M 224 8.5 83.2 se_cotnetd101_224.h5
SE-CoTNetD-152 55.8M 224 17.0 84.0 se_cotnetd152_224.h5
SE-CoTNetD-152 55.8M 320 26.5 84.6 se_cotnetd152_320.h5

CoAtNet

Model Params Image resolution Top1 Acc
CoAtNet-0 25M 224 81.6
CoAtNet-1 42M 224 83.3
CoAtNet-2 75M 224 84.1
CoAtNet-2, ImageNet-21k pretrain 75M 224 87.1
CoAtNet-3 168M 224 84.5
CoAtNet-3, ImageNet-21k pretrain 168M 224 87.6
CoAtNet-3, ImageNet-21k pretrain 168M 512 87.9
CoAtNet-4, ImageNet-21k pretrain 275M 512 88.1
CoAtNet-4, ImageNet-21K + PT-RA-E150 275M 512 88.56

CMT

Model Params Image resolution Top1 Acc
CMTTiny 9.5M 160 79.2
CMTXS 15.2M 192 81.8
CMTSmall 25.1M 224 83.5
CMTBig 45.7M 256 84.5

CoaT

Model Params Image resolution Top1 Acc Download
CoaTLiteTiny 5.7M 224 77.5 coat_lite_tiny_imagenet.h5
CoaTLiteMini 11M 224 79.1 coat_lite_mini_imagenet.h5
CoaTLiteSmall 20M 224 81.9 coat_lite_small_imagenet.h5
CoaTTiny 5.5M 224 78.3 coat_tiny_imagenet.h5
CoaTMini 10M 224 81.0 coat_mini_imagenet.h5

MLP mixer

Model Params Top1 Acc Download (ImageNet) Download (ImageNet21k) Download (ImageNet SAM)
MLPMixerS32 19.1M 68.70
MLPMixerS16 18.5M 73.83
MLPMixerB32 60.3M 75.53 b32_imagenet_sam.h5
MLPMixerB16 59.9M 80.00 b16_imagenet.h5 b16_imagenet21k.h5 b16_imagenet_sam.h5
MLPMixerL32 206.9M 80.67
MLPMixerL16 208.2M 84.82 l16_imagenet.h5 l16_imagenet21k.h5
- input 448 208.2M 86.78
MLPMixerH14 432.3M 86.32
- input 448 432.3M 87.94

ResMLP

Model Params Image resolution Top1 Acc Download
ResMLP12 15M 224 77.8 resmlp12_imagenet.h5
ResMLP24 30M 224 80.8 resmlp24_imagenet.h5
ResMLP36 116M 224 81.1 resmlp36_imagenet.h5
ResMLP_B24 129M 224 83.6 resmlp_b24_imagenet.h5
- imagenet22k 129M 224 84.4 resmlp_b24_imagenet22k.h5

GMLP

Model Params Image resolution Top1 Acc Download
GMLPTiny16 6M 224 72.3
GMLPS16 20M 224 79.6 gmlp_s16_imagenet.h5
GMLPB16 73M 224 81.6

LeViT

Model Params Image resolution Top1 Acc Download
LeViT128S 7.8M 224 76.6 levit128s_imagenet.h5
LeViT128 9.2M 224 78.6 levit128_imagenet.h5
LeViT192 11M 224 80.0 levit192_imagenet.h5
LeViT256 19M 224 81.6 levit256_imagenet.h5
LeViT384 39M 224 82.6 levit384_imagenet.h5

Other implemented keras models


Comments
  • TPU support for VOLO

    While trying VOLO with TPU I'm getting this error; any idea how to resolve it?

    InvalidArgumentError: 9 root error(s) found.
      (0) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_127]]
      (1) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5]]
    	 [[tpu_compile_succeeded_assert/_17543318848583046929/_5/_103]]
      (2) Invalid argument: {{function_node __inference_train_function_137027}} Compilation failure: Detected unsupported operations when trying to compile graph cluster_train_function_5876961707884240013[] on XLA_TPU_JIT: ExtractImagePatches (No registered 'ExtractImagePatches' OpKernel for XLA_TPU_JIT devices compatible with node {{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}
    	 (OpKernel was found, but attributes didn't match) Requested Attributes: T=DT_INT64, _xla_inferred_shapes=[[1,?,?,9]], ksizes=[1, 3, 3, 1], padding="VALID", rates=[1, 1, 1, 1], strides=[1, 2, 2, 1], _device="/device:TPU_REPLICATED_CORE"){{node gradient_tape/model/unfold_matmul_fold_3/ExtractImagePatches}}One approach is to outside compile the unsupported ops to run on CPUs by enabling soft placement `tf.config.set_soft_device_placement(True)`. This has a potential performance penalty.
    	TPU compilation failed
    	 [[tpu_compile_succeeded_assert/_17543318848583046929 ... [truncated]
    
    enhancement 
    opened by awsaf49 14
  • Use YoloR with swin transformer as backbone.

    @leondgarse I am trying to run inference using YOLOR with a Swin backbone but am getting the following results. What could be the issue?

    from keras_cv_attention_models import yolor, swin_transformer_v2

    bb = swin_transformer_v2.SwinTransformerV2Small_window16(input_shape=(256, 256, 3), num_classes=1000)
    model = yolor.YOLOR(backbone=bb)

    from keras_cv_attention_models import test_images
    imm = test_images.dog_cat()
    preds = model(model.preprocess_input(imm))
    bboxs, labels, confidences = model.decode_predictions(preds)[0]

    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences)
    

    Resulting output: (image attachment from the original issue omitted)

    opened by farazBhatti 10
  • MobileViT

    Tried to run the MobileViT_S model with input shape (256, 256, 3) and got the following error:

    UnimplementedError                        Traceback (most recent call last)
    in ()
          2
          3 history = model.fit(get_training_dataset_with_oversample(repeat_dataset=True, oversample=True), steps_per_epoch=STEPS_PER_EPOCH, epochs=EPOCHS,
    ----> 4     validation_data=get_validation_dataset(), validation_steps=VALIDATION_STEPS)
          5

    1 frames
    /usr/local/lib/python3.7/dist-packages/tensorflow/python/framework/ops.py in _numpy(self)
       1189     return self._numpy_internal()
       1190   except core._NotOkStatusException as e:  # pylint: disable=protected-access
    -> 1191     raise core._status_to_exception(e) from None  # pylint: disable=protected-access
       1192
       1193   @property

    UnimplementedError: 9 root error(s) found. (0) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] (1) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_35/_445]] (2) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/strided_slice_23/_381]] (3) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Pad_8/_407]] (4) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f32[<=32,16,4,2304]{3,2,1,0} dynamic-reshape(f32[<=1024,2,16,144]{3,1,2,0} %transpose.13551, s32[] %divide.13584, s32[] %reshape.13571, s32[] %reshape.13574, s32[] %reshape.13577), metadata={op_type="Reshape" op_name="while/body/_1/while/mobilevit_s/tf.reshape_1/Reshape"} [[{{function_node while_body_1010992}}{{node while/TPUReplicateMetadata}}]] [[while/body/_1/while/Maximum_2/y/_341]] (5) UNIMPLEMENTED: {{function_node __inference_train_function_1032011}} Dynamic input dimension to reshape that is both splitted and combined is not supported %dynamic-reshape.13585 = f3 ... [truncated]

    bug good first issue 
    opened by KyloRen1 10
  • [General Questions] Rough estimates for training time for pre-training CoAtNet?

    Hi, 👋 Thanks for such an amazing library and for taking the time to implement so many parts of the CoAtNet paper!

    In your CoAtNet README, you mentioned you use TPU accelerators. Could you provide a ballpark for the amount of time it took for you to train the biggest models and the corresponding accelerators? I have a task for which I wish to use scaled-up models, but I'd have to pre-train on Imagenet first because of low data amount (<5-10M) and squeeze out maximum accuracy from fine-tuning.

    I assume there might've been a few bottlenecks also, perhaps data? 🤔 If you could describe your setup, it would be very helpful to my experiments!

    Sorry for bothering you with minor questions, and again thank you for all your work!

    opened by neel04 9
  • Visualize saliency map with the attention models

    It would be great if some functional code could be included for plotting attention maps using the attention models. Such functionality has been provided for the vision transformer models at https://github.com/faustomorales/vit-keras. Thanks, and looking forward to it.

    enhancement good first issue 
    opened by sivaramakrishnan-rajaraman 9
  • How to save models?

    @leondgarse I want to save the models in saved_model format. How can I do that? When I attempt it, it shows me the following message:

    WARNING:tensorflow:Compiled the loaded model, but the compiled metrics have yet to be built. `model.compile_metrics` will be empty until you train or evaluate the model.
    

    What could be the solution for this?

    Code:

    import os
    from keras_cv_attention_models import mobilevit
    pretrained = '/content/mobilevit_xxs_imagenet.h5'
    model = mobilevit.MobileViT_XXS(pretrained=pretrained)
    model.save('mobilevit_xxs_imagenet1k')
    
    opened by sayannath 7
  • The order of height and width seems wrong in `tf.meshgrid(range(height), range(width))`

    In line 44 of beit.py, you use tf.meshgrid(range(height), range(width)), but shouldn't it be tf.meshgrid(range(width), range(height))?

    When I ran the code from line 44 to line 52 with height=3 and width=4, it gave the output

    [[17 16 15 10  9  8  3  2  1 -4 -5 -6]
     [18 17 16 11 10  9  4  3  2 -3 -4 -5]
     [19 18 17 12 11 10  5  4  3 -2 -3 -4]
     [24 23 22 17 16 15 10  9  8  3  2  1]
     [25 24 23 18 17 16 11 10  9  4  3  2]
     [26 25 24 19 18 17 12 11 10  5  4  3]
     [31 30 29 24 23 22 17 16 15 10  9  8]
     [32 31 30 25 24 23 18 17 16 11 10  9]
     [33 32 31 26 25 24 19 18 17 12 11 10]
     [38 37 36 31 30 29 24 23 22 17 16 15]
     [39 38 37 32 31 30 25 24 23 18 17 16]
     [40 39 38 33 32 31 26 25 24 19 18 17]], shape=(12, 12), dtype=int32)
    

    which seems incorrect.

    Of course, this is not a problem if you assume height==width, but I think tf.meshgrid(range(width), range(height)) is more readable and can potentially prevent bugs if height != width is supported in the future.

    bug enhancement 
    opened by xskxzr 6
  • Training of YoloXS Model on Coco dataset

    Hi, I am currently reproducing the COCO training of the YOLOXS model with the line below:

    python leondgarse/coco_train_script.py --det_header yolox.YOLOXS --data_name coco/2014 --batch_size 16

    After training for 30 epochs, I am getting poor results, as shown below:

    # Show result
    from keras_cv_attention_models.coco import data
    data.show_image_with_bboxes(imm, bboxs, labels, confidences, num_classes=80)
    

    (result image attachment from the original issue omitted)

    Did I configure anything wrongly? Or is there any suggestion for what I could change? Thanks!

    opened by ThePaperFish 5
  • Update for EdgeNeXt

    I reproduced EdgeNeXt based on torch and your project. Is there any mistake in this code? Why can't it show all layer details? It looks like some layers are missing in "summary".

    import tensorflow as tf
    from tensorflow import keras
    from keras_cv_attention_models.common_layers import (
        layer_norm, activation_by_name
    )
    from tensorflow.keras import initializers
    from keras_cv_attention_models.attention_layers import (
        conv2d_no_bias,
        drop_block,
    )
    import math
    
    BATCH_NORM_DECAY = 0.9
    BATCH_NORM_EPSILON = 1e-5
    TF_BATCH_NORM_EPSILON = 0.001
    LAYER_NORM_EPSILON = 1e-5
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class PositionalEncodingFourier(keras.layers.Layer):
        def __init__(self, hidden_dim=32, dim=768, temperature=10000):
            super(PositionalEncodingFourier, self).__init__()
            self.token_projection = tf.keras.layers.Conv2D(dim, kernel_size=1)
            self.scale = 2 * math.pi
            self.temperature = temperature
            self.hidden_dim = hidden_dim
            self.dim = dim
            self.eps = 1e-6
    
        def __call__(self, B, H, W, *args, **kwargs):
            mask_tf = tf.zeros([B, H, W])
            not_mask_tf = 1 - mask_tf
            y_embed_tf = tf.cumsum(not_mask_tf, axis=1)
            x_embed_tf = tf.cumsum(not_mask_tf, axis=2)
            y_embed_tf = y_embed_tf / (y_embed_tf[:, -1:, :] + self.eps) * self.scale  # 2 * math.pi
            x_embed_tf = x_embed_tf / (x_embed_tf[:, :, -1:] + self.eps) * self.scale  # 2 * math.pi
            dim_t_tf = tf.range(self.hidden_dim, dtype=tf.float32)
            dim_t_tf = self.temperature ** (2 * (dim_t_tf // 2) / self.hidden_dim)
            pos_x_tf = x_embed_tf[:, :, :, None] / dim_t_tf
            pos_y_tf = y_embed_tf[:, :, :, None] / dim_t_tf
            pos_x_tf = tf.reshape(tf.stack([tf.math.sin(pos_x_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_x_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_y_tf = tf.reshape(tf.stack([tf.math.sin(pos_y_tf[:, :, :, 0::2]),
                                            tf.math.cos(pos_y_tf[:, :, :, 1::2])], axis=4),
                                  shape=[B, H, W, self.hidden_dim])
            pos_tf = tf.concat([pos_y_tf, pos_x_tf], axis=-1)
            pos_tf = self.token_projection(pos_tf)
    
            return pos_tf
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"token_projection": self.token_projection, "scale": self.scale,
                                "temperature": self.temperature, "hidden_dim": self.hidden_dim,
                                "dim": self.dim, "eps": self.eps})
            return base_config
    
    
    def EdgeNeXt(input_shape=(256, 256, 3), depths=[3, 3, 9, 3], dims=[24, 48, 88, 168],
                 global_block=[0, 0, 0, 3], global_block_type=['None', 'None', 'None', 'SDTA'],
                 drop_path_rate=1, layer_scale_init_value=1e-6, head_init_scale=1., expan_ratio=4,
                 kernel_sizes=[7, 7, 7, 7], heads=[8, 8, 8, 8], use_pos_embd_xca=[False, False, False, False],
                 use_pos_embd_global=False, d2_scales=[2, 3, 4, 5], epsilon=1e-6, model_name='EdgeNeXt'):
        inputs = keras.layers.Input(input_shape, batch_size=2)
    
        nn = conv2d_no_bias(inputs, dims[0], kernel_size=4, strides=4, padding="valid", name="stem_")
        nn = layer_norm(nn, epsilon=epsilon, name='stem_')
    
        drop_connect_rates = tf.linspace(0, stop=drop_path_rate, num=int(
            sum(depths)))  # drop_connect_rates_split(num_blocks, start=0.0, end=drop_connect_rate)
        cur = 0
        for i in range(4):
            for j in range(depths[i]):
                if j > depths[i] - global_block[i] - 1:
                    if global_block_type[i] == 'SDTA':
                        SDTA_encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                     expan_ratio=expan_ratio, scales=d2_scales[i],
                                     use_pos_emb=use_pos_embd_xca[i], num_heads=heads[i], name='stage_'+str(i)+'_SDTA_encoder_'+str(j))(nn)
                    else:
                        raise NotImplementedError
                else:
                    if i != 0 and j == 0:
                        nn = layer_norm(nn, epsilon=epsilon, name='stage_' + str(i) + '_')
                        nn = conv2d_no_bias(nn, dims[i], kernel_size=2, strides=2, padding="valid",
                                            name='stage_' + str(i) + '_')
    
                    Conv_Encoder(dim=dims[i], drop_path=drop_connect_rates[cur + j],
                                 layer_scale_init_value=layer_scale_init_value,
                                 expan_ratio=expan_ratio, kernel_size=kernel_sizes[i], name='stage_'+str(i)+'_Conv_Encoder_'+str(j) + '_')(nn)  # drop_connect_rates[cur + j]
    
        model = keras.models.Model(inputs, nn, name=model_name)
        return model
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class Conv_Encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4, kernel_size=7, epsilon=1e-6,
                     name=''):
    
            super(Conv_Encoder, self).__init__()
            self.encoder_name = name
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_path = drop_path
            self.dim = dim
            self.expan_ratio = expan_ratio
            self.kernel_size = kernel_size
            self.epsilon = epsilon
    
        def __call__(self, x, *args, **kwargs):
            inputs = x
            x = keras.layers.Conv2D(self.dim, kernel_size=self.kernel_size, padding="SAME", name=self.encoder_name +'Conv2D')(x)
            x = layer_norm(x, epsilon=self.epsilon, name=self.encoder_name)
            x = keras.layers.Dense(self.expan_ratio * self.dim)(x)
            x = activation_by_name(x, activation="gelu")
            x = keras.layers.Dense(self.dim)(x)
            if self.gamma is not None:
                x = self.gamma * x
    
            x = inputs + drop_block(x, drop_rate=0.)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"gamma": self.gamma, "drop_path": self.drop_path,
                                "dim": self.dim, "expan_ratio": self.expan_ratio,
                                "kernel_size": self.kernel_size})
            return base_config
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class SDTA_encoder(keras.layers.Layer):
        def __init__(self, dim, drop_path=0., layer_scale_init_value=1e-6, expan_ratio=4,
                     use_pos_emb=True, num_heads=8, qkv_bias=True, attn_drop=0., drop=0., scales=1, zero_gamma=False,
                     activation='gelu', use_bias=False, name='sdf'):
            super(SDTA_encoder, self).__init__()
            self.expan_ratio = expan_ratio
            self.width = max(int(math.ceil(dim / scales)), int(math.floor(dim // scales)))
            self.width_list = [self.width] * (scales - 1)
            self.width_list.append(dim - self.width * (scales - 1))
            self.dim = dim
            self.scales = scales
            if scales == 1:
                self.nums = 1
            else:
                self.nums = scales - 1
            self.pos_embd = None
            if use_pos_emb:
                self.pos_embd = PositionalEncodingFourier(dim=dim)
            self.xca = XCA(dim, num_heads=num_heads, qkv_bias=qkv_bias, attn_drop=attn_drop, proj_drop=drop)
            self.gamma_xca = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                         name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.gamma = tf.Variable(layer_scale_init_value * tf.ones(dim), trainable=True,
                                     name=name + 'gamma') if layer_scale_init_value > 0 else None
            self.drop_rate = drop_path
            self.drop_path = keras.layers.Dropout(drop_path)
            gamma_initializer = tf.zeros_initializer() if zero_gamma else tf.ones_initializer()
            self.norm = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                        name=name and name + "ln")
            self.norm_xca = keras.layers.LayerNormalization(epsilon=LAYER_NORM_EPSILON, gamma_initializer=gamma_initializer,
                                                            name=name and name + "norm_xca")
            self.activation = activation
            self.use_bias = use_bias
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"width": self.width, "dim": self.dim,
                                "nums": self.nums, "pos_embd": self.pos_embd,
                                "xca": self.xca, "gamma_xca": self.gamma_xca,
                                "gamma": self.gamma, "norm": self.norm,
                                "activation": self.activation, "use_bias": self.use_bias,
                                })
            return base_config
    
        def __call__(self, inputs, *args, **kwargs):
            x = inputs
            spx = tf.split(inputs, self.width_list, axis=-1)
            for i in range(self.nums):
                if i == 0:
                    sp = spx[i]
                else:
                    sp = sp + spx[i]
                sp = keras.layers.Conv2D(self.width, kernel_size=3, padding='SAME')(sp)  # , groups=self.width
                if i == 0:
                    out = sp
                else:
                    out = tf.concat([out, sp], -1)
            inputs = tf.concat([out, spx[self.nums]], -1)
    
            # XCA
            B, H, W, C = inputs.shape
            inputs = tf.reshape(inputs, (-1, H * W, C))  # tf.transpose(), perm=[0, 2, 1])
    
            if self.pos_embd:
                pos_encoding = tf.reshape(self.pos_embd(B, H, W), (-1, H * W, C))
                inputs += pos_encoding
    
            if self.gamma_xca is not None:
                inputs = self.gamma_xca * inputs
            input_xca = self.gamma_xca * self.xca(self.norm_xca(inputs))
            inputs = inputs + drop_block(input_xca, drop_rate=self.drop_rate, name="SDTA_encoder_")
            inputs = tf.reshape(inputs, (-1, H, W, C))
    
            # Inverted Bottleneck
            inputs = self.norm(inputs)
            inputs = keras.layers.Conv2D(self.expan_ratio * self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            inputs = activation_by_name(inputs, activation=self.activation)
            inputs = keras.layers.Conv2D(self.dim, kernel_size=1, use_bias=self.use_bias)(inputs)
            if self.gamma is not None:
                inputs = self.gamma * inputs
    
            x = x + self.drop_path(inputs)
            return x
    
    
    @tf.keras.utils.register_keras_serializable(package="EdgeNeXt")
    class XCA(keras.layers.Layer):
        def __init__(self, dim, num_heads=8, qkv_bias=False, attn_drop=0., proj_drop=0., name=""):
            super(XCA, self).__init__()
            self.num_heads = num_heads
            self.temperature = tf.Variable(tf.ones(num_heads, 1, 1), trainable=True, name=name + 'gamma')
    
            self.qkv = keras.layers.Dense(dim * 3, use_bias=qkv_bias)
            self.attn_drop = keras.layers.Dropout(attn_drop)
            self.k_ini = initializers.GlorotUniform()
            self.b_ini = initializers.Zeros()
            self.proj = keras.layers.Dense(dim, name="out",
                                           kernel_initializer=self.k_ini, bias_initializer=self.b_ini)
            self.proj_drop = keras.layers.Dropout(proj_drop)
    
        def __call__(self, inputs, training=None, *args, **kwargs):
            input_shape = inputs.shape
            qkv = self.qkv(inputs)
            qkv = tf.reshape(qkv, (input_shape[0], input_shape[1], 3,
                                   self.num_heads,
                                   input_shape[2] // self.num_heads))  # [batch, hh * ww, 3, num_heads, dims_per_head]
            qkv = tf.transpose(qkv, perm=[2, 0, 3, 4, 1])  # [3, batch, num_heads, dims_per_head, hh * ww]
            query, key, value = tf.split(qkv, 3, axis=0)  # [batch, num_heads, dims_per_head, hh * ww]
    
            norm_query, norm_key = tf.nn.l2_normalize(tf.squeeze(query), axis=-1, epsilon=1e-6), \
                                   tf.nn.l2_normalize(tf.squeeze(key), axis=-1, epsilon=1e-6)
            attn = tf.matmul(norm_query, norm_key, transpose_b=True)
            attn = tf.transpose(tf.transpose(attn, perm=[0, 2, 3, 1]) * self.temperature, perm=[0, 3, 2, 1])
    
            attn = tf.nn.softmax(attn, axis=-1)
            attn = self.attn_drop(attn, training=training)  # [batch, num_heads, hh * ww, hh * ww]
    
            x = tf.matmul(attn, value)  # [batch, num_heads, hh * ww, dims_per_head]
            x = tf.reshape(x, [input_shape[0], input_shape[1], input_shape[2]])
    
            x = self.proj(x)
            x = self.proj_drop(x)
    
            return x
    
        def get_config(self):
            base_config = super().get_config()
            base_config.update({"num_heads": self.num_heads, "temperature": self.temperature,
                                "qkv": self.qkv, "attn_drop": self.attn_drop,
                                "proj": self.proj, "proj_drop": self.proj_drop})
            return base_config
    
    
    def edgenext_xx_small(pretrained=False, **kwargs):
        # 1.33M & 260.58M @ 256 resolution
        # 71.23% Top-1 accuracy
        # No AA, Color Jitter=0.4, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=51.66 versus 47.67 for MobileViT_XXS
        # For A100: FPS @ BS=1: 212.13 & @ BS=256: 7042.06 versus FPS @ BS=1: 96.68 & @ BS=256: 4624.71 for MobileViT_XXS
        model = EdgeNeXt(depths=[2, 2, 6, 2], dims=[24, 48, 88, 168], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_x_small(pretrained=False, **kwargs):
        # 2.34M & 538.0M @ 256 resolution
        # 75.00% Top-1 accuracy
        # No AA, No Mixup & Cutmix, DropPath=0.0, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=31.61 versus 28.49 for MobileViT_XS
        # For A100: FPS @ BS=1: 179.55 & @ BS=256: 4404.95 versus FPS @ BS=1: 94.55 & @ BS=256: 2361.53 for MobileViT_XS
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[32, 64, 100, 192], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         heads=[4, 4, 4, 4],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    def edgenext_small(pretrained=False, **kwargs):
        # 5.59M & 1260.59M @ 256 resolution
        # 79.43% Top-1 accuracy
        # AA=True, No Mixup & Cutmix, DropPath=0.1, BS=4096, lr=0.006, multi-scale-sampler
        # Jetson FPS=20.47 versus 18.86 for MobileViT_S
        # For A100: FPS @ BS=1: 172.33 & @ BS=256: 3010.25 versus FPS @ BS=1: 93.84 & @ BS=256: 1785.92 for MobileViT_S
        model = EdgeNeXt(depths=[3, 3, 9, 3], dims=[48, 96, 160, 304], expan_ratio=4,
                         global_block=[0, 1, 1, 1],
                         global_block_type=['None', 'SDTA', 'SDTA', 'SDTA'],
                         use_pos_embd_xca=[False, True, False, False],
                         kernel_sizes=[3, 5, 7, 9],
                         d2_scales=[2, 2, 3, 4],
                         **kwargs)
    
        return model
    
    
    if __name__ == '__main__':
        model = edgenext_small()
        model.summary()
        # from download_and_load import keras_reload_from_torch_model
        # keras_reload_from_torch_model(
        #     'D:\GitHub\EdgeNeXt\edgenext_small.pth',
        #     keras_model=model,
        #     # tail_align_dict=tail_align_dict,
        #     # full_name_align_dict=full_name_align_dict,
        #     # additional_transfer=additional_transfer,
        #     input_shape=(256, 256),
        #     do_convert=True,
        #     save_name="adaface_ir101_webface4m.h5",
        # )
    
    
    
    opened by whalefa1I 5
  • custom layer issue at tflite conversion

    Hi, thanks for the good references.

    I have implemented MobileViT with your package and tried to convert the trained model into tflite format. There, I got an error saying:

    Unknown layer: Addons>GroupNormalization. Please ensure this object is passed to the `custom_objects` argument. See https://www.tensorflow.org/guide/keras/save_and_serialize#registering_the_custom_object for details
    

    I tried to add the custom layer name as a parameter of the model load, but I am still facing the issue.

    model = tf.keras.models.load_model('./checkpoints/model_best.h5', custom_objects={'AttentionLayer': AttentionLayer})
    

    Is there any way to solve this?

    Thanks,

    bug good first issue 
    opened by mhyeonsoo 4
  • coat.CoaTMini(input_shape=(200, 240, 1) ...) error: Dimensions must be equal, but are 730 and 677 for ...

    Hi,

    I am trying to train a model with:

        model = coat.CoaTMini(input_shape=(200, 240, 1), num_classes=240, pretrained=None)
    

    but the model cannot be built; it errors out with:

    ValueError: Exception encountered when calling layer "tf.__operators__.add" (type TFOpLambda).
    
    Dimensions must be equal, but are 730 and 677 for '{{node tf.__operators__.add/AddV2}} = AddV2[T=DT_FLOAT](Placeholder, Placeholder_1)' with input shapes: [?,730,216], [?,677,216].
    
    Call arguments received:
      • x=tf.Tensor(shape=(None, 730, 216), dtype=float32)
      • y=tf.Tensor(shape=(None, 677, 216), dtype=float32)
      • name=None
    

    I just wonder, is something wrong with coat?

    Thanks.

    bug enhancement 
    opened by mw66 4
  • Can you provide the code for converting pytorch weights to tf?

    Hi. Can you provide the code for converting PyTorch weights to TF, such as for beit? I wanted to try the effect of beitv2's pre-trained weights. Thanks!

    opened by 131404060321 1
  • tflite conversion - GPU/XNNPACK fails

    Hi! Thanks for the great repo! I have converted the EfficientFormer model to tflite. However, applying both the XNNPACK and GPU delegates fails.

    GPU delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite delegate for GPU.
    Failed to apply GPU delegate.
    Benchmarking failed.

    XNNPACK delegate created.
    INFO: Initialized TensorFlow Lite runtime.
    INFO: Created TensorFlow Lite XNNPACK delegate for CPU.
    Failed to apply XNNPACK delegate.
    Benchmarking failed.

    Do you know what the issue could be? I'm using the latest TensorFlow version for conversion.

    opened by macsmy 3