Mask2Former: Masked-attention Mask Transformer for Universal Image Segmentation
Bowen Cheng, Ishan Misra, Alexander G. Schwing, Alexander Kirillov, Rohit Girdhar
Features
- A single architecture for panoptic, instance, and semantic segmentation (see the inference sketch below).
- Supports major segmentation datasets: ADE20K, Cityscapes, COCO, and Mapillary Vistas.
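As a minimal sketch of how the single architecture can be used for inference on all three tasks, the snippet below follows the usual detectron2 DefaultPredictor pattern. It assumes detectron2 and this repository are installed per the installation instructions, and that the mask2former package exposes add_maskformer2_config as in this codebase; the config path, checkpoint path, and input image are placeholders to be replaced with entries from the Model Zoo.

import cv2
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor
from detectron2.projects.deeplab import add_deeplab_config

# Assumption: add_maskformer2_config registers Mask2Former-specific config keys
# and importing the package registers the model architecture with detectron2.
from mask2former import add_maskformer2_config

cfg = get_cfg()
add_deeplab_config(cfg)
add_maskformer2_config(cfg)
# Placeholder paths: substitute any config/checkpoint pair from the Model Zoo.
cfg.merge_from_file("configs/coco/panoptic-segmentation/maskformer2_R50_bs16_50ep.yaml")
cfg.MODEL.WEIGHTS = "path/to/model_checkpoint.pkl"

predictor = DefaultPredictor(cfg)
image = cv2.imread("input.jpg")
outputs = predictor(image)

# The same model can produce panoptic, instance, and semantic predictions;
# which output keys are populated depends on the config's test-time settings.
panoptic_seg, segments_info = outputs["panoptic_seg"]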
Installation
See installation instructions.
Getting Started
See Preparing Datasets for Mask2Former.
See Getting Started with Mask2Former.
Advanced usage
See Advanced Usage of Mask2Former.
Model Zoo and Baselines
We provide a large set of baseline results and trained models, available for download in the Mask2Former Model Zoo.
License
The majority of Mask2Former is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
However, portions of the project are available under separate license terms: Swin-Transformer-Semantic-Segmentation is licensed under the MIT License, and Deformable-DETR is licensed under the Apache-2.0 License.
Citing Mask2Former
If you use Mask2Former in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@article{cheng2021mask2former,
  title={Masked-attention Mask Transformer for Universal Image Segmentation},
  author={Bowen Cheng and Ishan Misra and Alexander G. Schwing and Alexander Kirillov and Rohit Girdhar},
  journal={arXiv},
  year={2021}
}
If you find the code useful, please also consider the following BibTeX entry.
@inproceedings{cheng2021maskformer,
  title={Per-Pixel Classification is Not All You Need for Semantic Segmentation},
  author={Bowen Cheng and Alexander G. Schwing and Alexander Kirillov},
  booktitle={NeurIPS},
  year={2021}
}
Acknowledgement
Code is largely based on MaskFormer (https://github.com/facebookresearch/MaskFormer).