Code release for "Masked-attention Mask Transformer for Universal Image Segmentation"

Overview

Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation

Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar

[arXiv] [Project] [BibTeX]


Features

  • A single architecture for panoptic, instance and semantic segmentation.
  • Supports major segmentation datasets: ADE20K, Cityscapes, COCO, Mapillary Vistas.

Installation

See installation instructions.

Getting Started

See Preparing Datasets for Mask2Former.

See Getting Started with Mask2Former.

Advanced usage

See Advanced Usage of Mask2Former.

Model Zoo and Baselines

We provide a large set of baseline results and trained models available for download in the Mask2Former Model Zoo.

License

The majority of Mask2Former is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT license, and Deformable-DETR is licensed under the Apache-2.0 license.

Citing Mask2Former

If you use Mask2Former in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@article{cheng2021mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  journal={arXiv},
  year={2021}
}

If you find the code useful, please also consider the following BibTeX entry.

@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}

Acknowledgement

Code is largely based on MaskFormer (https://github.com/facebookresearch/MaskFormer).

Comments
  • Colab demo doesn't work

    Colab demo doesn't work

    I would love to try out this model but I am struggling with installation. The Colab demo does not work either and gives the error:

    ---------------------------------------------------------------------------
    ModuleNotFoundError                       Traceback (most recent call last)
    /content/Mask2Former/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py in <module>()
         21 try:
    ---> 22     import MultiScaleDeformableAttention as MSDA
         23 except ModuleNotFoundError as e:
    
    ModuleNotFoundError: No module named 'MultiScaleDeformableAttention'
    
    During handling of the above exception, another exception occurred:
    
    ModuleNotFoundError                       Traceback (most recent call last)
    7 frames
    /content/Mask2Former/mask2former/modeling/pixel_decoder/ops/functions/ms_deform_attn_func.py in <module>()
         27         "\t`sh make.sh`\n"
         28     )
    ---> 29     raise ModuleNotFoundError(info_string)
         30 
         31 
    
    ModuleNotFoundError: 
    
    Please compile MultiScaleDeformableAttention CUDA op with the following commands:
    	`cd mask2former/modeling/pixel_decoder/ops`
    	`sh make.sh`
    
    
    ---------------------------------------------------------------------------
    NOTE: If your import is failing due to a missing package, you can
    manually install dependencies using either !pip or !apt.
    
    To view examples of installing some common dependencies, click the
    "Open Examples" button below.
    ---------------------------------------------------------------------------
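
    For anyone hitting this in Colab: the error message itself points at the fix, namely that the MultiScaleDeformableAttention CUDA op has to be compiled inside the runtime. A minimal sketch, assuming a GPU runtime with a matching CUDA toolkit (run it in a cell before importing the demo):

    import subprocess

    # Build the MultiScaleDeformableAttention CUDA op in place, as the error suggests.
    subprocess.run(
        ["sh", "make.sh"],
        cwd="/content/Mask2Former/mask2former/modeling/pixel_decoder/ops",
        check=True,
    )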
    
    opened by blnfb 12
  • A training problem about Global alloc not supported yet

    A training problem about Global alloc not supported yet

    I created a new environment for Mask2Former following the installation steps. Training on the COCO dataset works fine, but when I train on my own dataset I run into the following problem.

    [screenshot of the error: "Global alloc not supported yet"]

    I've been searching Google for a solution for a long time, so I'd like to ask whether you have seen similar problems. Thank you very much for your reply.

    opened by xiehousen 10
  • The result of swin-small backbone on ADE

    The result of swin-small backbone on ADE

    Hi,

    I ran Mask2Former on ADE20K (maskformer2_swin_small_bs16_160k.yaml) with four 16 GB V100 GPUs. However, I can only achieve 49.6%, which is much worse than the reported result (51.3%). Could you provide the log so I can analyze the result?

    Thanks

    opened by zhihou7 7
  • reproduction of Panoptic segmentation on COCO

    reproduction of Panoptic segmentation on COCO

    Hi, thank you for your excellent work. I ran into a problem when re-running your experiments.

    I tried to follow your advice in Getting Started with Mask2Former and ran:

    python train_net.py --num-gpus 8 \
      --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml

    After training, the log showed "Start inference on 625 batches", but after a few days there were still no new logs. So I killed that process and ran:

    python train_net.py \
      --config-file configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml \
      --eval-only MODEL.WEIGHTS ./output/model_0094999.pth

    After evaluation, the result [screenshot of my evaluation results] was lower than the result from Table 1 in the paper [screenshot of Table 1]. Could you help me see what the reason for this is? ^ ^

    opened by wmkai 5
  • Train on custom dataset for panoptic segmentation

    Train on custom dataset for panoptic segmentation

    I've been trying to use it for a nuclei panoptic segmentation task. The dataset is prepared in the same way as ADE20K panoptic. However, in evaluation it doesn't propose any instances after some time of training.

    File "/home/---/anaconda3/envs/mask2former/lib/python3.8/site-packages/panopticapi/evaluation.py", line 224, in pq_compute results[name], per_class_results = pq_stat.pq_average(categories, isthing=isthing) File "/home/---/anaconda3/envs/mask2former/lib/python3.8/site-packages/panopticapi/evaluation.py", line 73, in pq_average return {'pq': pq / n, 'sq': sq / n, 'rq': rq / n, 'n': n}, per_class_results ZeroDivisionError: division by zero

    There are several possible reasons for it, I assume (see the sanity-check sketch after this list):

    • Dataset not well prepared: are the semantic and instance label-image folders a must for panoptic? The labeled data I own is not in Detectron2 format, but I referred to prepare_ade20k_sem_seg, prepare_ade20k_ins_seg and prepare_ade20k_pan_seg, converted the labeled data to panoptic images (in a folder) and a label json file, and commented out the line "sem_seg_file_name": sem_label_file, in dataset_dict.
    • Config file not well modified: another possible reason is that the model did not converge. Is there any configuration like Mask R-CNN's anchor size or ratio in panoptic segmentation? Nuclei in whole-slide images (cropped into 256*256 patches, with one nucleus around (8~16)*(8~16) pixels) are rather small compared to common things in a natural image captured by a camera.
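
    A hedged sanity check for the ZeroDivisionError above: panopticapi's pq_average divides by the number of categories in a bucket (things or stuff) that appear in the ground truth or predictions, so wrong isthing flags in the categories list, or images with an empty segments_info, are likely triggers. A small inspection sketch (the JSON path is hypothetical):

    import json

    # Hypothetical path to the converted panoptic ground-truth JSON.
    with open("datasets/nuclei/panoptic_val.json") as f:
        pan = json.load(f)

    # Every category needs a correct isthing flag, and each bucket (things/stuff)
    # needs at least one category that actually occurs in the annotations.
    print([(c["id"], c["name"], c.get("isthing")) for c in pan["categories"]])

    empty = sum(1 for ann in pan["annotations"] if not ann["segments_info"])
    print(f"{empty}/{len(pan['annotations'])} images have an empty segments_info")
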
    opened by JasonRichard 5
  • Implement Transfer Learning API / Instructions

    Implement Transfer Learning API / Instructions

    It would be helpful if we could piggyback on your library of pre-trained models for transfer learning. Perhaps this could be accomplished by freezing the first 6 (2L) of the 9 layers of Mask2Former's transformer decoder (a rough freezing sketch follows the usage example below).

    Usage may look like this:

    export DETECTRON2_DATASETS=/path/to/dir/containing/new/dataset

    python train_net.py \
      --config-file <pretrained model config> \
  --pretrained-model /path/to/checkpoint_file \
      --transfer-learning-dataset <new dataset>
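
    Until such an option exists, a rough sketch of the freezing idea is below. It is only a guess at how one might do this with Detectron2, and the "sem_seg_head.predictor...layers.N" parameter-name pattern is an assumption that should be checked against model.named_parameters() on an actual build (cfg is assumed to be prepared as in train_net.py):

    import re
    from detectron2.checkpoint import DetectionCheckpointer
    from detectron2.modeling import build_model

    model = build_model(cfg)                              # cfg prepared as in train_net.py
    DetectionCheckpointer(model).load(cfg.MODEL.WEIGHTS)  # load the pretrained checkpoint

    # Freeze everything except the last 3 of the 9 decoder layers (indices 6-8)
    # and the rest of the predictor (query embeddings, class/mask heads).
    layer_idx = re.compile(r"sem_seg_head\.predictor\..*layers\.(\d+)\.")
    for name, param in model.named_parameters():
        match = layer_idx.search(name)
        if match is not None:
            param.requires_grad = int(match.group(1)) >= 6
        else:
            param.requires_grad = name.startswith("sem_seg_head.predictor")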
    
    opened by lapp0 4
  • support ConvNeXt backbone

    support ConvNeXt backbone

    I've found a really nice backbone, ConvNeXt, and I would like to use it with Mask2Former so that I can achieve a new SOTA result on my own task.

    I've changed the code and tested it on the ADE20K dataset with convnext_base_22k_1k_384.pth as the backbone weights. I can achieve a significantly better result (mIoU 55.06 vs. 52.4, the result from your repo). My training script is as follows:

    
    python train_net.py \
      --num-gpus 8 \
      --config-file configs/ade20k/semantic-segmentation/convnext/maskformer2_convnext_base_384_160k_res640.yaml \
      MODEL.WEIGHTS pretraind_model_weights/backbone/convnext_base_22k_1k_384.pkl \
      SOLVER.IMS_PER_BATCH 64 \
      SOLVER.BASE_LR 4e-4 \
      SOLVER.MAX_ITER 50000
    

    Unfortunately, the repo has only released ImageNet-1K/22K pre-trained models. I was wondering if you could release a model pre-trained on the COCO panoptic segmentation task, since the model is too large and we can hardly train it from scratch on our personal GPUs.

    Thanks!

    CLA Signed 
    opened by huliang2016 4
  • How to visualize the VIS results?

    How to visualize the VIS results?

    Hi,

    Thanks for your wonderful work and repo.

    Could you please provide the instructions on how to visualize the video instance segmentation results on images or videos? Thanks!
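
    Not an official answer, but a hedged per-frame fallback while waiting for instructions: run the ordinary image predictor frame by frame and draw the instance predictions with Detectron2's Visualizer (the same API the image demo uses), then write the frames back out as a video. It assumes a predictor built as in the other inference snippets in these issues, and it is not the actual video (VIS) model, whose output format may differ:

    import cv2
    from detectron2.data import MetadataCatalog
    from detectron2.utils.visualizer import ColorMode, Visualizer

    metadata = MetadataCatalog.get("coco_2017_val")
    cap = cv2.VideoCapture("input.mp4")                        # hypothetical input video
    writer = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        instances = predictor(frame)["instances"].to("cpu")    # per-frame predictions
        vis = Visualizer(frame[:, :, ::-1], metadata, scale=1.0,
                         instance_mode=ColorMode.IMAGE)
        out = cv2.cvtColor(vis.draw_instance_predictions(instances).get_image(),
                           cv2.COLOR_RGB2BGR)
        if writer is None:
            h, w = out.shape[:2]
            fps = cap.get(cv2.CAP_PROP_FPS) or 10
            writer = cv2.VideoWriter("vis.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
        writer.write(out)
    cap.release()
    if writer is not None:
        writer.release()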

    opened by wjn922 4
  • ImportError: .../MultiScaleDeformableAttention.cpython-38-X86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8dataptrIdEEPT_v

    ImportError: .../MultiScaleDeformableAttention.cpython-38-X86_64-linux-gnu.so: undefined symbol: _ZNK2at10TensorBase8dataptrIdEEPT_v

    Hello! Thank you for sharing. I followed your example conda environment setup [screenshot], but when I run the code [screenshot], it doesn't work successfully and raises an ImportError [screenshot of the error above].

    opened by Wenzhiqiang16 4
  • Force cuda since torch ask for a device, not if cuda is in fact avail…

    Force cuda since torch ask for a device, not if cuda is in fact avail…

    Force CUDA, since torch asks for a device, not whether CUDA is in fact available.

    torch.cuda.is_available() is misleading since it actually checks for an available device, not whether CUDA is available. Building on build nodes where a GPU device is not necessarily available then fails. Force CUDA through FORCE_CUDA=1 to allow building.
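
    For reference, a hedged sketch of using the flag this PR describes when building the op on a node without a visible GPU. It assumes the op's setup honours the standard FORCE_CUDA environment variable, which is what the PR adds:

    import os
    import subprocess

    # Build the MultiScaleDeformableAttention op with CUDA support forced on,
    # even though torch.cuda.is_available() would return False on this build node.
    env = dict(os.environ, FORCE_CUDA="1")
    subprocess.run(["sh", "make.sh"],
                   cwd="mask2former/modeling/pixel_decoder/ops",
                   env=env, check=True)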

    CLA Signed 
    opened by ccoulombe 4
  • Where can I download the initial model weights to train from scratch?

    Where can I download the initial model weights to train from scratch?

    Hi,

    Thank you for this great repository. Can you please advise where I can download the initial weights mentioned in the config files? For example, I was not able to find "swin_base_patch4_window12_384.pkl", which is mentioned in the config file 'Mask2Former/configs/coco/instance-segmentation/swin/maskformer2_swin_base_384_bs16_50ep.yaml', for training from scratch.

    I could only find the pre-trained models in Mask2Former Model Zoo.

    Thanks.
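
    For what it's worth, those .pkl files are ImageNet-pretrained backbone checkpoints converted into Detectron2's pickled format; the repo ships conversion helpers for this under tools/, if I remember correctly. A hedged do-it-yourself sketch, assuming the usual {"model": ...} layout used by Detectron2 checkpoints:

    import pickle as pkl
    import sys

    import torch

    # Usage: python convert.py swin_base_patch4_window12_384.pth swin_base_patch4_window12_384.pkl
    src, dst = sys.argv[1], sys.argv[2]
    obj = torch.load(src, map_location="cpu")
    if "model" in obj:            # Swin releases wrap the weights in a "model" key
        obj = obj["model"]

    res = {"model": obj, "__author__": "third_party", "matching_heuristics": True}
    with open(dst, "wb") as f:
        pkl.dump(res, f)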

    opened by rgkannan676 3
  • Running mask2former without a GPU

    Running mask2former without a GPU

    Hello, I am trying to run Mask2Former on my local laptop, but it requires a GPU, which I do not have. I know that I can run it using Google Colab, but I am unfamiliar with the platform and am not sure whether I can manipulate the package to visualize the binary mask for the main object in the image. If I do not have a GPU on my computer, what are my options for running Mask2Former and manipulating the package as I need? Can you please advise me on what to do in this situation?
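
    The generic Detectron2 recipe for CPU-only inference is to set MODEL.DEVICE to "cpu" before building the predictor; a hedged sketch is below. One caveat: the default Mask2Former configs use a compiled CUDA op (MultiScaleDeformableAttention) in the pixel decoder, so I am not certain CPU inference works out of the box for every config.

    import cv2
    from detectron2.config import get_cfg
    from detectron2.engine import DefaultPredictor
    from detectron2.projects.deeplab import add_deeplab_config

    from mask2former import add_maskformer2_config

    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_maskformer2_config(cfg)
    cfg.merge_from_file("configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml")
    cfg.MODEL.WEIGHTS = "model_final.pkl"   # a checkpoint downloaded from the Model Zoo
    cfg.MODEL.DEVICE = "cpu"                # run the whole model on the CPU
    predictor = DefaultPredictor(cfg)

    im = cv2.imread("input.jpg")
    outputs = predictor(im)                 # dict with "panoptic_seg", "instances", "sem_seg"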

    opened by ali20211 1
  • visualize the binary mask of the output (panoptic_result, instance_result, semantic_result)

    visualize the binary mask of the output (panoptic_result, instance_result, semantic_result)

    Dear authors of the Mask2Former paper, I just wanted to express my gratitude and appreciation for the excellent work you have done on the Mask2Former project. Your efforts and contributions have greatly benefited the research community, and I have personally found the insights and results presented in your paper to be incredibly valuable and informative. Thank you for your dedication and commitment to advancing the field. I do have a question: when I run the code on another dataset, such as CIFAR data, I see that the segmentation output is not as good as on the COCO dataset, or maybe I just didn't interpret it very well.

    So my question: can anyone provide code to visualize the binary mask of the output (panoptic_result, instance_result, semantic_result) for the main object, in order to see the boundaries around the object?

    Thank you so much in advance for your answer

    Visualization of the binary mask of the output (panoptic_result, instance_result, semantic_result) to see the main object.
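
    Not from the authors, but a hedged sketch of pulling plain binary masks out of the predictor outputs (with outputs produced as in the inference snippets elsewhere in these issues; the output keys follow the demo code quoted in the "visualize error" issue below):

    import cv2
    import numpy as np

    # Instance segmentation: one boolean mask per detected object.
    instances = outputs["instances"].to("cpu")
    instance_masks = instances.pred_masks.numpy()              # (N, H, W)

    # Semantic segmentation: one boolean mask per predicted class.
    sem_seg = outputs["sem_seg"].argmax(0).cpu().numpy()       # (H, W) class ids
    class_masks = {int(c): sem_seg == c for c in np.unique(sem_seg)}

    # Panoptic segmentation: one boolean mask per segment id.
    pan_seg, segments_info = outputs["panoptic_seg"]
    pan_seg = pan_seg.cpu().numpy()
    panoptic_masks = {s["id"]: pan_seg == s["id"] for s in segments_info}

    # Example: save the largest instance mask as a black-and-white image,
    # which makes the boundary of the main object easy to see.
    if len(instance_masks):
        main_object = max(instance_masks, key=lambda m: m.sum())
        cv2.imwrite("main_object_mask.png", (main_object * 255).astype(np.uint8))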

    opened by ali20211 0
  • Some Instance Segmentation COCOEvaluator mAP scores are suspiciously round numbers (e.g. 90.000 mAP) or

    Some Instance Segmentation COCOEvaluator mAP scores are suspiciously round numbers (e.g. 90.000 mAP) or "NaN"

    Everything works fine with the public COCO dataset. However, when using my custom dataset for instance segmentation in polygon format and training with Mask2Former, some class mAP scores are suspiciously round numbers (e.g. 90.000 mAP) or "NaN".

    [screenshot of the evaluation results showing the round / NaN mAP values]

    1. Plotting predictions of the model works fine - every class is plotted correctly and shows the model has learned well
    2. Plotting the dataset shows all labels of all classes being correctly plotted by Detectron2 -> no visible issue with the annotations
    3. I used the Shapely Python package to check the polygons for geometric correctness (polygon.is_valid) -> every annotation is a valid polygon.

    I would be glad to hear how I can pinpoint the issue with the COCOEvaluator.

    opened by Robotatron 0
  • How to use queries = 50 with pretrained swin-large model

    How to use queries = 50 with pretrained swin-large model

    I want to use queries = 50, and I also want to use the pretrained model (swin-large: queries = 200). However, when I run python train_net.py, a RuntimeError shows a size mismatch for sem_seg_head.predictor.query_feat.weight. How can I handle this problem? Thanks!
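
    One hedged workaround, not an official recipe: the query embeddings should be the only weights whose shape depends on the number of queries, so they can be dropped from the checkpoint and re-initialised while everything else loads normally. The key names below come from the error message, and the .pkl is assumed to be the usual Detectron2 {"model": {...}} pickle:

    import pickle

    with open("model_final_swin_large.pkl", "rb") as f:   # hypothetical 200-query checkpoint
        ckpt = pickle.load(f)

    # Drop the 200-query embeddings so a 50-query model can re-initialise them.
    for key in [k for k in ckpt["model"] if "query_feat" in k or "query_embed" in k]:
        del ckpt["model"][key]

    with open("model_init_50_queries.pkl", "wb") as f:
        pickle.dump(ckpt, f)

    Then point MODEL.WEIGHTS at the new file and set the number-of-queries option (MODEL.MASK_FORMER.NUM_OBJECT_QUERIES, if I read the configs correctly) to 50.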

    opened by aihcyllop 0
  • RuntimeError: expand(CUDABoolType) in train mapillary semantic segmentation

    RuntimeError: expand(CUDABoolType) in train mapillary semantic segmentation

    When I train Mapillary semantic segmentation, the error below occurs:

    Traceback (most recent call last):
      File "/workspace/Mask2Former/train_net.py", line 322, in <module>
        launch(
      File "/workspace/detectron2/detectron2/engine/launch.py", line 82, in launch
        main_func(*args)
      File "/workspace/Mask2Former/train_net.py", line 316, in main
        return trainer.train()
      File "/workspace/detectron2/detectron2/engine/defaults.py", line 484, in train
        super().train(self.start_iter, self.max_iter)
      File "/workspace/detectron2/detectron2/engine/train_loop.py", line 149, in train
        self.run_step()
      File "/workspace/detectron2/detectron2/engine/defaults.py", line 494, in run_step
        self._trainer.run_step()
      File "/workspace/detectron2/detectron2/engine/train_loop.py", line 413, in run_step
        loss_dict = self.model(data)
      File "/root/anaconda3/envs/segment/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/workspace/Mask2Former/mask2former/maskformer_model.py", line 204, in forward
        targets = self.prepare_targets(gt_instances, images)
      File "/workspace/Mask2Former/mask2former/maskformer_model.py", line 271, in prepare_targets
        padded_masks[:, : gt_masks.shape[1], : gt_masks.shape[2]] = gt_masks
    RuntimeError: expand(CUDABoolType{[21, 1024, 1024, 3]}, size=[21, 1024, 1024]): the number of sizes provided (3) must be greater or equal to the number of dimensions in the tensor (4)
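
    Not a confirmed diagnosis, but the shape in the error ([21, 1024, 1024, 3]) suggests the ground-truth masks are being read as 3-channel RGB images rather than single-channel label maps. If the label PNGs just repeat the same id across channels, a sketch for collapsing them is below (paths are hypothetical); if they are colour-coded Mapillary labels, a colour-to-id mapping is needed instead:

    import glob

    import numpy as np
    from PIL import Image

    for path in glob.glob("datasets/mapillary_vistas/training/labels/*.png"):  # hypothetical path
        arr = np.array(Image.open(path))
        if arr.ndim == 3:                    # (H, W, 3) -> (H, W)
            arr = arr[..., 0]                # assumes all three channels carry the same id
            Image.fromarray(arr).save(path)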

    opened by yufeng89 0
  • visualize error

    visualize error

    Traceback (most recent call last):
      File "test.py", line 38, in <module>
        panoptic_result = v.draw_panoptic_seg(outputs["panoptic_seg"][0].to("cpu"), outputs["panoptic_seg"][1]).get_image()
    KeyError: 'panoptic_seg'

    cfg = get_cfg()
    add_deeplab_config(cfg)
    add_maskformer2_config(cfg)
    #cfg.merge_from_file("configs/coco/panoptic-segmentation/swin/maskformer2_swin_large_IN21k_384_bs16_100ep.yaml")
    cfg.MODEL.WEIGHTS = './model_final_f07440.pkl'
    cfg.MODEL.MASK_FORMER.TEST.SEMANTIC_ON = True
    cfg.MODEL.MASK_FORMER.TEST.INSTANCE_ON = True
    cfg.MODEL.MASK_FORMER.TEST.PANOPTIC_ON = True
    predictor = DefaultPredictor(cfg)
    outputs = predictor(im)

    my code

    Show panoptic/instance/semantic predictions:

    v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
    panoptic_result = v.draw_panoptic_seg(outputs["panoptic_seg"][0].to("cpu"), outputs["panoptic_seg"][1]).get_image()
    v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
    instance_result = v.draw_instance_predictions(outputs["instances"].to("cpu")).get_image()
    v = Visualizer(im[:, :, ::-1], coco_metadata, scale=1.2, instance_mode=ColorMode.IMAGE_BW)
    semantic_result = v.draw_sem_seg(outputs["sem_seg"].argmax(0).to("cpu")).get_image()

    opened by skyfallsss 0