This is the official implementation of "SimMIM: A Simple Framework for Masked Image Modeling".

Overview

Project

This repo has been populated with an initial template to help get you started. Please make sure to update the content to build a great experience for the community.

As the maintainer of this project, please make a few updates:

  • Improve this README.MD file to provide a great experience
  • Update SUPPORT.MD with content about this project's support experience
  • Understand the security reporting process in SECURITY.MD
  • Remove this section from the README

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Comments
  • Confusion about fine-tune

    Thank you very much for your work. I used your method to pre-train on my own dataset and then compared the result against both a randomly initialized model and an ImageNet-supervised model. The SimMIM-pretrained model converged at about the same speed as the randomly initialized one, and its accuracy gradually pulled ahead after a number of iterations, but it never matched the convergence of the supervised model. Does your method behave as MAE describes for fine-tuning, i.e., does accuracy keep improving after many iterations? Looking forward to your reply.

    opened by Breeze-Zero 8
  • Why mask embedding feature maps but not input images?

    Hi! This is wonderful work. After reading the paper, my intuition was that the input image would be masked for reconstruction. While reading the code, however, I was surprised to find that you do not mask the input image directly; rather, you mask the feature maps output by PatchEmbed. See here: https://github.com/microsoft/SimMIM/blob/519ae7b0999b9d720daa61e3848cd41b8fbd9978/models/simmim.py#L36

    I was wondering what the reasoning behind this approach is, and how it differs from the more intuitive one of masking the original image directly. Thank you.

    opened by baibaidj 5
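
    A minimal sketch of the token-level masking asked about above, assuming a toy patch embedding; it mirrors the pattern at the linked simmim.py line, where masked positions in the embedded feature map are replaced by a shared learnable mask token instead of zeroing pixels in the input image. All shapes and the mask ratio here are illustrative.

    ```python
    import torch
    import torch.nn as nn

    # Toy patch embedding: 4x4 patches of an RGB image -> embed_dim channels.
    embed_dim = 128
    patch_embed = nn.Conv2d(3, embed_dim, kernel_size=4, stride=4)
    mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))  # shared learnable token

    x = torch.randn(2, 3, 192, 192)                     # toy image batch
    tokens = patch_embed(x).flatten(2).transpose(1, 2)  # (B, L, C) patch tokens

    B, L, _ = tokens.shape
    mask = (torch.rand(B, L) < 0.6).float()             # 1 = masked patch
    w = mask.unsqueeze(-1)                              # (B, L, 1) broadcast weight
    # Swap masked patch features for the mask token; visible ones pass through.
    tokens = tokens * (1.0 - w) + mask_token.expand(B, L, -1) * w
    ```

    One practical upside of masking after the embedding is that the same code path works for any backbone whose stem turns the image into tokens, without touching the input pipeline.
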
  • How can you resolve the mismatch of patch_size in <patch_embed> module between pretrained model and finetuned model?

    As far as I know, for the ViT experiments you use a random mask with patch size 32, but when running fine-tuning you use the default patch size of 16. I am confused about how you handle the mismatch in <patch_size> between the two. Digging into your utils.py, the load_pretrained function only resolves the patch-size mismatch for the relative position embeddings. Please correct me if I am missing something.

    opened by hao-pt 2
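
    On the relative-position-embedding part of this question, here is a minimal sketch of the bicubic-resizing idea that load_pretrained-style helpers commonly apply to a relative position bias table when the window geometry changes between checkpoints. The function name and shapes are hypothetical, not the repo's exact code.

    ```python
    import torch
    import torch.nn.functional as F

    def resize_rel_pos_bias_table(table: torch.Tensor, side_new: int) -> torch.Tensor:
        """Resize a (side_old**2, num_heads) bias table to (side_new**2, num_heads)."""
        length_old, num_heads = table.shape
        side_old = int(length_old ** 0.5)
        # (L, nH) -> (1, nH, S, S): treat the table as a tiny multi-channel image.
        grid = table.permute(1, 0).reshape(1, num_heads, side_old, side_old)
        grid = F.interpolate(grid, size=(side_new, side_new),
                             mode='bicubic', align_corners=False)
        return grid.reshape(num_heads, side_new * side_new).permute(1, 0)

    # e.g. a table for a 7x7 window (side 2*7-1=13) resized for a 12x12 window (side 23)
    new_table = resize_rel_pos_bias_table(torch.randn(13 * 13, 12), 23)
    ```
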
  • USE_RPB and USE_SHARED_RPB are inconsistent in pretrain and finetune

    Hi, thanks for the amazing work and for sharing the OSS code! I was wondering if you could share the reason for the following:

    MODEL.VIT.USE_RPB: True in simmim_finetune__vit_base__img224__800ep.yaml
    MODEL.VIT.USE_SHARED_RPB: True in simmim_pretrain__vit_base__img224__800ep.yaml

    Do we expect to use the same RPB between pretraining and fine-tuning?

    Thanks,

    opened by haooooooqi 1
  • About data augment

    Thanks for sharing your excellent work! I have a question about data augmentation in pretraining: have you tried other augmentation pipelines, such as RandomResizedCrop() + RandomHorizontalFlip() + RandomVerticalFlip() or other compositions? Does RandomResizedCrop() + RandomHorizontalFlip() work best?

    opened by peiyingxin 1
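
    For reference, a minimal sketch of the light augmentation pipeline the question refers to; the crop scale and aspect-ratio ranges are assumptions based on common SimMIM-style recipes, not values confirmed in this thread.

    ```python
    import torchvision.transforms as T

    transform = T.Compose([
        # Random resized crop + horizontal flip: the light recipe asked about.
        T.RandomResizedCrop(192, scale=(0.67, 1.0), ratio=(3 / 4, 4 / 3)),
        T.RandomHorizontalFlip(),
        T.ToTensor(),
        T.Normalize(mean=[0.485, 0.456, 0.406],
                    std=[0.229, 0.224, 0.225]),  # ImageNet statistics
    ])
    ```
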
  • Information about relative positional encoding

    Hi, thanks for sharing your work. I was wondering if someone could provide me with some additional information about the following line (and the two lines after it):

    https://github.com/microsoft/SimMIM/blob/519ae7b0999b9d720daa61e3848cd41b8fbd9978/models/vision_transformer.py#L78

    Why exactly do we subtract 3, 2, and 1, respectively? And doesn't the statement on line 80 overwrite the statement two lines earlier, on line 78?

    What exactly does the relative_position_index matrix hold?

    opened by marc345 1
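
    A sketch of how a BEiT/Swin-style relative_position_index is usually constructed, which may help with the question above. In this common pattern the three extra indices (num_relative_distance minus 3, 2, and 1) are reserved for cls-to-token, token-to-cls, and cls-to-cls relations, and only the single [0, 0] entry is overwritten by the later assignments; this reading of the linked code is an inference, not an authors' answer.

    ```python
    import torch

    window_size = (14, 14)  # illustrative token grid
    # Pairwise (dy, dx) offsets between all positions in the grid.
    coords = torch.stack(torch.meshgrid(torch.arange(window_size[0]),
                                        torch.arange(window_size[1]),
                                        indexing='ij'))
    flat = torch.flatten(coords, 1)                        # (2, N)
    rel = (flat[:, :, None] - flat[:, None, :]).permute(1, 2, 0).contiguous()
    rel[:, :, 0] += window_size[0] - 1                     # shift dy into [0, 2*Wh-2]
    rel[:, :, 1] += window_size[1] - 1                     # shift dx into [0, 2*Ww-2]
    rel[:, :, 0] *= 2 * window_size[1] - 1                 # make (dy, dx) a unique flat index

    num_relative_distance = (2 * window_size[0] - 1) * (2 * window_size[1] - 1) + 3
    n = window_size[0] * window_size[1]
    index = torch.zeros((n + 1, n + 1), dtype=torch.long)  # +1 row/col for the cls token
    index[1:, 1:] = rel.sum(-1)                            # token <-> token indices
    index[0, 0:] = num_relative_distance - 3               # cls -> token
    index[0:, 0] = num_relative_distance - 2               # token -> cls (touches [0, 0] too)
    index[0, 0] = num_relative_distance - 1                # cls -> cls (final value at [0, 0])
    # index[i, j] selects a row of the learned bias table for query i and key j.
    ```
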
  • A question about masking strategy

    For the Swin Transformer, the default masked patch size is 32x32, but the first patch embedding layer uses a 4x4 patch size. So I want to know: is the learned mask token designed for the 4x4 patch size while you still apply the 32x32 masking strategy?

    opened by xiaohu2015 1
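
    A minimal sketch of a SimMIM-style mask generator that may clarify the point: randomness is sampled on the coarse 32x32 grid, and the resulting mask is then repeated down to the model's 4x4 patch grid, so the single learned mask token operates at 4x4-patch resolution while the masking pattern stays 32x32. The default values below are illustrative.

    ```python
    import numpy as np

    class MaskGenerator:
        def __init__(self, input_size=192, mask_patch_size=32,
                     model_patch_size=4, mask_ratio=0.6):
            assert input_size % mask_patch_size == 0
            assert mask_patch_size % model_patch_size == 0
            self.rand_size = input_size // mask_patch_size    # coarse grid side (6)
            self.scale = mask_patch_size // model_patch_size  # repeat factor (8)
            self.token_count = self.rand_size ** 2
            self.mask_count = int(np.ceil(self.token_count * mask_ratio))

        def __call__(self):
            mask = np.zeros(self.token_count, dtype=int)
            mask[np.random.permutation(self.token_count)[:self.mask_count]] = 1
            mask = mask.reshape(self.rand_size, self.rand_size)
            # Expand each coarse 32x32 cell to an 8x8 block of 4x4 model patches.
            return mask.repeat(self.scale, axis=0).repeat(self.scale, axis=1)

    mask = MaskGenerator()()  # (48, 48) binary mask over 4x4 patches
    ```
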
  • Any plan to release the code for multi-machine DDP training of SwinV2-G?

    Hi, is there any plan to release the code for multi-machine DDP training of SwinV2-G in SimMIM? We want to do some research on big models (SwinV2-G, with 3B parameters) based on your work. Thanks for your reply!

    opened by dongzhiwu 0
  • Questions about AvgDist

    Hi @caoyue10 @ancientmooner, thanks a lot for your wonderful work! I have a question about the AvgDist metric in the paper. In the experiments you mention "AvgDist", which is a newly proposed metric, but I could not find a reference for it in the paper or an implementation in the code. Could you please give me any pointers on this metric?

    Again, thanks a lot for your contributions!

    opened by susan-sun1999 0
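
    A minimal sketch of one plausible reading of AvgDist: the average Euclidean distance from each masked pixel to its nearest visible pixel. This definition is an assumption for illustration, not the authors' reference implementation.

    ```python
    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def avg_dist(mask: np.ndarray) -> float:
        """mask: 2-D binary array, 1 = masked, 0 = visible."""
        # distance_transform_edt returns, for every nonzero cell, the Euclidean
        # distance to the nearest zero cell, i.e. to the nearest visible pixel.
        distances = distance_transform_edt(mask)
        return float(distances[mask == 1].mean())

    rng = np.random.default_rng(0)
    mask = (rng.random((48, 48)) < 0.6).astype(int)  # toy 60%-masked grid
    print(avg_dist(mask))
    ```
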
  • Any plan to support 3D version?

    Hi, thanks for the great code! Just wondering whether you have any plan to support a 3D version of SimMIM (Video Swin masked encoding)? There are some papers on video MAE, but they are not Swin-style...

    opened by james20141606 0
  • The specific configs and code for downstream tasks, like semantic segmentation and object detection

    Nice work! I would like to know the specific configs or code for semantic segmentation on ADE20K and object detection on COCO. When will you release the configs or code for these experiments?

    opened by Haoqing-Wang 0