[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.

Overview

RegionProxy

Figure 2. Performance vs. GFLOPs on ADE20K val split.

Semantic Segmentation by Early Region Proxy

Yifan Zhang, Bo Pang, Cewu Lu

CVPR 2022 (Poster) [arXiv]

Installation

Note: we recommend installing the exact package versions listed below to avoid runtime issues.

  1. Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.

  2. Install timm 0.4.12 and einops:

    pip install timm==0.4.12 einops
    
  3. This project depends on mmsegmentation 0.17 and mmcv 1.3.13, so follow the mmsegmentation instructions to set up the environment and prepare the datasets.
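Because the version pins above matter, a quick check of the active environment can save debugging time. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, it assumes Python 3.8+ (for `importlib.metadata`), and it assumes mmcv was installed under the distribution name `mmcv-full` (it may be `mmcv` in your setup).

```python
from importlib import metadata

# Versions taken from the installation steps above.
# Assumption: mmcv is installed as "mmcv-full"; change the key if you use "mmcv".
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.4.12",
    "mmcv-full": "1.3.13",
    "mmsegmentation": "0.17",
}


def find_mismatches(pinned, lookup=metadata.version):
    """Return {package: (wanted, found)} for missing or mismatched packages."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            mismatches[name] = (wanted, None)
            continue
        # Drop local build tags such as "+cu110" and compare only as many
        # components as the pin specifies, so "0.17" accepts "0.17.0".
        found_parts = found.split("+")[0].split(".")
        wanted_parts = wanted.split(".")
        if found_parts[:len(wanted_parts)] != wanted_parts:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {wanted}, found {found or 'not installed'}")
```

An empty result means every pinned package matches; anything printed is worth fixing before running the evaluation or training commands.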

Models

ADE20K

Backbone      Resolution  FLOPs  #Params  mIoU  mIoU (ms+flip)  FPS   Download
ViT-Ti/16     512x512     3.9G   5.8M     42.1  43.1            38.9  [model]
ViT-S/16      512x512     15G    22M      47.6  48.5            32.1  [model]
R26+ViT-S/32  512x512     16G    36M      47.8  49.1            28.5  [model]
ViT-B/16      512x512     59G    87M      49.8  50.5            20.1  [model]
R50+ViT-L/32  640x640     82G    323M     51.0  51.7            12.7  [model]
ViT-L/16      640x640     326G   306M     52.9  53.4            6.6   [model]

Cityscapes

Backbone   Resolution  FLOPs  #Params  mIoU  mIoU (ms+flip)  Download
ViT-Ti/16  768x768     69G    6M       76.5  77.7            [model]
ViT-S/16   768x768     270G   23M      79.8  81.5            [model]
ViT-B/16   768x768     1064G  88M      81.0  82.2            [model]
ViT-L/16   768x768     -      307M     81.4  82.7            [model]

Evaluation

You may evaluate the model on a single GPU by running:

python test.py \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

To evaluate on multiple GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 test.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

You may add --aug-test to enable multi-scale + flip evaluation. The test.py script is mostly copied from mmsegmentation. Please refer to this link for further usage (e.g., visualization).

Training

The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights for our tiny, small and large models, and DeiT pre-trained weights for our base models. Follow these steps to prepare the pre-trained weights for model initialization:

  1. For the DeiT weights, simply download them from this link. For the AugReg weights, first acquire the timm-style models:

    import timm
    m = timm.create_model('vit_tiny_patch16_384', pretrained=True)

    The full list of entries can be found here (vanilla ViTs) and here (hybrid models).

  2. Convert the timm models to mmsegmentation style using this script.

We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:

python -m torch.distributed.launch --nproc_per_node 8 train.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--work-dir /path/to/workdir \
	--options model.pretrained=/path/to/pretrained/model

You may need to adjust data.samples_per_gpu if you plan to train on fewer GPUs. Please refer to this link for more training options.
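One common way to adjust data.samples_per_gpu is to keep the total batch size (samples per GPU times number of GPUs) unchanged. The sketch below illustrates that arithmetic only. The helper name `samples_per_gpu_for` is hypothetical, and the reference value of 2 samples per GPU is an assumed placeholder, so check the config file for the real value.

```python
def samples_per_gpu_for(ref_samples_per_gpu, ref_num_gpus, num_gpus):
    """Per-GPU batch size that keeps the total batch size unchanged."""
    total = ref_samples_per_gpu * ref_num_gpus
    if total % num_gpus != 0:
        raise ValueError(
            f"total batch size {total} is not divisible by {num_gpus} GPUs"
        )
    return total // num_gpus


# Assumed reference setup: 8 GPUs with 2 samples each (placeholder value,
# read the actual number from the config you train with).
new_value = samples_per_gpu_for(ref_samples_per_gpu=2, ref_num_gpus=8, num_gpus=4)

# Assumption: the --options override shown for model.pretrained above also
# accepts this key.
print(f"--options data.samples_per_gpu={new_value}")
```

Keeping the total batch size fixed means the learning-rate schedule in the config should not need to change, although GPU memory per device rises accordingly.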

Citation

@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}
Comments
  • About Region Encoder

    Hi YiF. I have been reading the paper alongside your open-source code over the past two days, and I have the following questions about the Region Encoder:

    1. H×W = N, where N is the number of tokens. Do h and w change when H and W change?
    2. For a 512×512 image, is H×h = 512 and W×w = 512?
    3. h and w are both 4 in your source code, so for a 512×512 image H = W = 128 and N = 16384. Isn't that too many tokens?
    question 
    opened by wanghr-git 2
  • AttributeError: 'PatchEmbed' object has no attribute 'DH'

    Dear authors, thank you very much for your contribution! I ran into the following error when running test.py:

        File "/home/disk/xxx/git/RegionProxy/models/vit.py", line 113, in forward
            x, hw_shape = self.patch_embed(inputs), (self.patch_embed.DH,
        File "/home/xxx/anaconda3/envs/region_proxy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
            raise AttributeError("'{}' object has no attribute '{}'".format(
        AttributeError: 'PatchEmbed' object has no attribute 'DH'

    The arguments I ran with were:

    --config configs/regproxy_cityscapes/regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py --checkpoint checkpoints/regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.pth --eval mIoU

    My directory structure is as follows:

        ├── checkpoints
        │   └── regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.pth
        ├── configs
        │   ├── _base_
        │   │   ├── datasets
        │   │   ├── default_runtime.py
        │   │   ├── models
        │   │   └── schedules
        │   ├── regproxy_ade20k
        │   │   ├── regproxy-b16-sub4+implicit-mid-4+512x512+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-l16-sub4+implicit-mid-9+640x640+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-r26-s32-sub4+implicit-mid-n1+512x512+160k+adamw-poly+ade20k.py
        │   │   ├── regproxy-r50-l32-sub4+implicit-mid-n1+640x640+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-s16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py
        │   │   └── regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py
        │   └── regproxy_cityscapes
        │       ├── regproxy-b16-sub4+implicit-mid-2+768x768+80k+adamw-cr+cityscapes.py
        │       ├── regproxy-l16-sub4+implicit-mid-5+768x768+80k+adamw-cr+cityscapes.py
        │       ├── regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py
        │       └── regproxy-t16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py
        ├── data
        │   └── cityscapes
        │       ├── gtFine
        │       ├── leftImg8bit
        │       ├── test.txt
        │       ├── train.txt
        │       └── val.txt
        ├── LICENSE
        ├── models
        │   ├── __init__.py
        │   ├── proxy_head.py
        │   ├── segmentors.py
        │   └── vit.py
        ├── README.md
        ├── test.py
        ├── train.py
        └── utils
            ├── checkpoint.py
            ├── __init__.py

    help wanted 
    opened by hollow-503 2
  • About the handle borders

    Hi, thanks for your code! It contains some logic for handling borders, and I'm not very clear on its purpose. Could you please explain it? Thank you very much!

    opened by GuoQingqing 1
  • How does h,w in the paper and F.unfold()function in the code work?

    1. About h, w

    The sentence "(Hh) × (Ww) matches the size of the output segmentation map and (h, w) is the relative stride of the initial token grid" in the paper indicates that h and w are the downsampling stride of the segmentation map. But when reading the code, I am confused about how this works: by rearranging token_logits and applying a matrix multiplication, we get the final segmentation map, which is as large as the input image. So why do you set the extra parameters h and w, and how do h and w relate to the stride?

    2. About F.unfold()

    Official implementation:

        token_logits = F.unfold(token_logits, kernel_size=3, padding=1).reshape(B, -1, 9, H, W) # (B, C, 9, H, W)

    Pseudocode in the paper:

        # get neighbors for each cell
        y = rar(y, "B N K -> B K H W")
        nb = im2col(y, kernel_size=3, padding=1)
        nb = rar(nb, "B (K n) (H W) -> B H W n K")

    My other question is what F.unfold() does in the code. In the paper, you show the proxy head in pseudocode and say that im2col (i.e., F.unfold()) is used to get the neighbors of each cell. I do not understand this well either.

    Looking forward to your reply!!! Thank you ~~~

    opened by stte0v0 0
  • Visualization of regions

    Dear authors, thanks for sharing your great work. I would like to know how to visualize the regions as shown in Fig. 6 and Fig. 8. Could you release the code?

    Thanks

    opened by dingjiansw101 0
  • Unofficial implementation of RegionProxy based on Pytorch

    Hi YiF. Based on your open-source code, I wrote an unofficial implementation of RegionProxy that depends only on PyTorch. Here is the link: https://github.com/wanghr-git/RegionProxy. Could you check it for correctness if you have time?
    
    opened by wanghr-git 0
  • About pretrained model

    Hi YiF (@YiF-Zhang), if my dataset has a resolution of 512×512, how can I use the pretrained model? Can a model pre-trained at 224 or 384 resolution be used directly?

    opened by wanghr-git 2
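On the F.unfold() question raised in the comments above: to my understanding, calling F.unfold with kernel_size=3 and padding=1 and then reshaping to (B, C, 9, H, W) gives, for every cell of the H×W grid, the values of its 3×3 neighbourhood, with zeros where the neighbourhood falls outside the grid. The sketch below is a plain-Python emulation of that gathering for a single 2-D grid, not the authors' code and not a PyTorch call. The function name `unfold_3x3` is hypothetical.

```python
def unfold_3x3(grid):
    """Gather the zero-padded 3x3 neighbourhood of every cell of a 2-D list.

    Returns `neighbours` with neighbours[k][i][j] = grid[i + ki - 1][j + kj - 1],
    where k = 3 * ki + kj walks the kernel in row-major order, which is the
    ordering assumed here for the size-9 axis after the reshape.
    """
    height, width = len(grid), len(grid[0])
    neighbours = []
    for ki in range(3):
        for kj in range(3):
            plane = [[0] * width for _ in range(height)]
            for i in range(height):
                for j in range(width):
                    si, sj = i + ki - 1, j + kj - 1
                    # Positions outside the grid keep the padding value 0.
                    if 0 <= si < height and 0 <= sj < width:
                        plane[i][j] = grid[si][sj]
            neighbours.append(plane)
    return neighbours


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
nb = unfold_3x3(grid)

centre = [nb[k][1][1] for k in range(9)]  # all nine values, nothing padded
corner = [nb[k][0][0] for k in range(9)]  # five of the nine slots are padding
print(centre)  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(corner)  # [0, 0, 0, 0, 1, 2, 0, 4, 5]
```

Each cell therefore carries nine candidate values, one from itself and one from each of its eight neighbours, which is what "get neighbors for each cell" in the quoted pseudocode appears to refer to.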
Owner
Yifan
Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation

Cross-Image Region Mining with Region Prototypical Network for Weakly Supervised Segmentation The code of: Cross-Image Region Mining with Region Proto

LiuWeide 16 Nov 26, 2022
Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation

Discriminative Region Suppression for Weakly-Supervised Semantic Segmentation (AAAI 2021) Official pytorch implementation of our paper: Discriminative

Beom 74 Dec 27, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

null 32 Sep 21, 2022
A Transformer-Based Feature Segmentation and Region Alignment Method For UAV-View Geo-Localization

University1652-Baseline [Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [中文介绍] This

Zhedong Zheng 335 Jan 6, 2023
Official Implementation and Dataset of "PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask and Group-Level Consistency", CVPR 2021

Portrait Photo Retouching with PPR10K Paper | Supplementary Material PPR10K: A Large-Scale Portrait Photo Retouching Dataset with Human-Region Mask an

null 184 Dec 11, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Codebase for Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model

Codebase for Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model

Yihong Sun 12 Nov 15, 2022
Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

Organseg dags - The repository contains the codebase for multi-organ segmentation with directed acyclic graphs (DAGs) in CT.

yzf 1 Jun 12, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Official codebase for Pretrained Transformers as Universal Computation Engines.

universal-computation Overview Official codebase for Pretrained Transformers as Universal Computation Engines. Contains demo notebook and scripts to r

Kevin Lu 210 Dec 28, 2022
Official codebase for Decision Transformer: Reinforcement Learning via Sequence Modeling.

Decision Transformer Lili Chen*, Kevin Lu*, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas†, and Igor M

Kevin Lu 1.4k Jan 7, 2023
Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World

Legged Robots that Keep on Learning Official codebase for Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World, whic

Laura Smith 70 Dec 7, 2022
This codebase is the official implementation of Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization (NeurIPS2021, Spotlight)

Test-Time Classifier Adjustment Module for Model-Agnostic Domain Generalization This codebase is the official implementation of Test-Time Classifier A

null 47 Dec 28, 2022
Official codebase for "B-Pref: Benchmarking Preference-Based Reinforcement Learning" contains scripts to reproduce experiments.

B-Pref Official codebase for B-Pref: Benchmarking Preference-Based Reinforcement Learning contains scripts to reproduce experiments. Install conda env

null 48 Dec 20, 2022
Official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models.

GLIDE This is the official codebase for running the small, filtered-data GLIDE model from GLIDE: Towards Photorealistic Image Generation and Editing w

OpenAI 2.9k Jan 4, 2023
Official codebase for ICLR oral paper Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling

CLIORA This is the official codebase for ICLR oral paper: Unsupervised Vision-Language Grammar Induction with Shared Structure Modeling. We introduce

Bo Wan 32 Dec 23, 2022
Official codebase used to develop Vision Transformer, MLP-Mixer, LiT and more.

Big Vision This codebase is designed for training large-scale vision models on Cloud TPU VMs. It is based on Jax/Flax libraries, and uses tf.data and

Google Research 701 Jan 3, 2023
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022