[CVPR22] Official codebase of Semantic Segmentation by Early Region Proxy.

Overview

RegionProxy

Figure 2. Performance vs. GFLOPs on ADE20K val split.

Semantic Segmentation by Early Region Proxy

Yifan Zhang, Bo Pang, Cewu Lu

CVPR 2022 (Poster) [arXiv]

Installation

Note: we recommend installing the exact package versions listed below to avoid runtime issues.

  1. Install PyTorch 1.7.1 and torchvision 0.8.2 following the official guide.

  2. Install timm 0.4.12 and einops:

    pip install timm==0.4.12 einops
    
  3. This project depends on mmsegmentation 0.17 and mmcv 1.3.13, so you may follow the mmsegmentation instructions to set up the environment and prepare the datasets.
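Since the pinned versions matter, it can help to verify the environment before running anything. The following is a minimal sketch, not part of this codebase: the helper names (`find_mismatches`, `installed_versions`) are hypothetical, it needs Python 3.8+ for `importlib.metadata`, and it assumes the distribution names match the package names above (mmcv, for instance, may be installed as `mmcv-full`, in which case the entry would need adjusting).

```python
from importlib import metadata

# Versions pinned in the installation steps above.
# Distribution names are an assumption; adjust "mmcv" to "mmcv-full" if needed.
PINNED = {
    "torch": "1.7.1",
    "torchvision": "0.8.2",
    "timm": "0.4.12",
    "mmsegmentation": "0.17",
    "mmcv": "1.3.13",
}


def _matches(expected, found):
    # Compare release components so that "0.17" accepts "0.17.0" but not "0.170".
    # Local build tags such as "+cu110" are ignored.
    found_parts = found.split("+")[0].split(".")
    expected_parts = expected.split(".")
    return found_parts[: len(expected_parts)] == expected_parts


def find_mismatches(pinned, installed):
    """Return {package: (expected, found)} for packages that are missing
    (found is None) or whose installed version differs from the pinned one."""
    mismatches = {}
    for name, expected in pinned.items():
        found = installed.get(name)
        if found is None or not _matches(expected, found):
            mismatches[name] = (expected, found)
    return mismatches


def installed_versions(names):
    """Look up the installed version of each distribution, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = find_mismatches(PINNED, installed_versions(PINNED))
    for name, (expected, found) in report.items():
        print(f"{name}: expected {expected}, found {found}")
```

Running the script prints one line per package that is missing or differs from the pinned version, and prints nothing when the environment matches.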

Models

ADE20K

Backbone      Resolution  FLOPs  #Params  mIoU  mIoU (ms+flip)  FPS   Download
ViT-Ti/16     512x512     3.9G   5.8M     42.1  43.1            38.9  [model]
ViT-S/16      512x512     15G    22M      47.6  48.5            32.1  [model]
R26+ViT-S/32  512x512     16G    36M      47.8  49.1            28.5  [model]
ViT-B/16      512x512     59G    87M      49.8  50.5            20.1  [model]
R50+ViT-L/32  640x640     82G    323M     51.0  51.7            12.7  [model]
ViT-L/16      640x640     326G   306M     52.9  53.4            6.6   [model]

Cityscapes

Backbone   Resolution  FLOPs  #Params  mIoU  mIoU (ms+flip)  Download
ViT-Ti/16  768x768     69G    6M       76.5  77.7            [model]
ViT-S/16   768x768     270G   23M      79.8  81.5            [model]
ViT-B/16   768x768     1064G  88M      81.0  82.2            [model]
ViT-L/16   768x768     -      307M     81.4  82.7            [model]

Evaluation

You may evaluate the model on a single GPU by running:

python test.py \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

To evaluate on multiple GPUs, run:

python -m torch.distributed.launch --nproc_per_node 8 test.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--checkpoint /path/to/ckpt \
	--eval mIoU

You may add --aug-test to enable multi-scale + flip evaluation. The test.py script is mostly copy-pasted from mmsegmentation. Please refer to this link for more usage (e.g., visualization).

Training

The first step is to prepare the pre-trained weights. Following Segmenter, we use AugReg pre-trained weights for our tiny, small and large models, and DeiT pre-trained weights for our base models. Follow these steps to prepare the pre-trained weights for model initialization:

  1. For the DeiT weights, simply download them from this link. For the AugReg weights, first obtain the timm-style models:

    import timm
    m = timm.create_model('vit_tiny_patch16_384', pretrained=True)

    The full list of entries can be found here (vanilla ViTs) and here (hybrid models).

  2. Convert the timm models to mmsegmentation style using this script.

We train all models on 8 V100 GPUs. For example, to train RegProxy-Ti/16, run:

python -m torch.distributed.launch --nproc_per_node 8 train.py \
	--launcher pytorch \
	--config configs/regproxy_ade20k/regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py \
	--work-dir /path/to/workdir \
	--options model.pretrained=/path/to/pretrained/model

You may need to adjust data.samples_per_gpu if you plan to train on fewer GPUs. Please refer to this link for more training options.
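One common way to adjust data.samples_per_gpu is to keep the total batch size (GPUs x samples per GPU) unchanged. The sketch below assumes that is the goal; the function name is hypothetical, and the reference value of 2 samples per GPU in the usage example is an illustrative assumption, so read the actual value from the config you are training with.

```python
def scaled_samples_per_gpu(ref_gpus, ref_samples_per_gpu, new_gpus):
    """Return the samples_per_gpu that keeps the total batch size constant
    when moving from ref_gpus GPUs to new_gpus GPUs."""
    if new_gpus <= 0:
        raise ValueError("new_gpus must be positive")
    total_batch = ref_gpus * ref_samples_per_gpu
    if total_batch % new_gpus != 0:
        # An uneven split would change the total batch size.
        raise ValueError(
            f"total batch size {total_batch} is not divisible by {new_gpus} GPUs"
        )
    return total_batch // new_gpus


if __name__ == "__main__":
    # Hypothetical reference setup: 8 GPUs with 2 samples each, moving to 4 GPUs.
    print(scaled_samples_per_gpu(ref_gpus=8, ref_samples_per_gpu=2, new_gpus=4))
```

The resulting value could presumably be passed the same way as model.pretrained in the command above (for example, --options data.samples_per_gpu=4), assuming train.py merges --options into the config as the mmsegmentation script does. Larger per-GPU batches also need more GPU memory.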

Citation

@article{zhang2022semantic,
  title={Semantic Segmentation by Early Region Proxy},
  author={Zhang, Yifan and Pang, Bo and Lu, Cewu},
  journal={arXiv preprint arXiv:2203.14043},
  year={2022}
}
Comments
  • About Region Encoder

    Hi, YiF. I have been reading the paper alongside your open-source code over the past two days, and I have the following questions about the Region Encoder:

    1. If H×W = N, where N is the number of tokens, do h and w change as H and W change?
    2. For a 512x512 image, is H×h = 512 and W×w = 512?
    3. h and w are both 4 in your source code, so for a 512x512 image, H = W = 128 and N = 16384. Is that too many?
    question 
    opened by wanghr-git 2
  • AttributeError: 'PatchEmbed' object has no attribute 'DH'

    Dear authors, thank you very much for your contribution! I ran into the following error when running test.py:

        File "/home/disk/xxx/git/RegionProxy/models/vit.py", line 113, in forward
          x, hw_shape = self.patch_embed(inputs), (self.patch_embed.DH,
        File "/home/xxx/anaconda3/envs/region_proxy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1185, in __getattr__
          raise AttributeError("'{}' object has no attribute '{}'".format(
        AttributeError: 'PatchEmbed' object has no attribute 'DH'

    The arguments I ran it with were:

        --config configs/regproxy_cityscapes/regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py --checkpoint checkpoints/regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.pth --eval mIoU

    My directory structure is as follows:

        ├── checkpoints
        │   └── regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.pth
        ├── configs
        │   ├── _base_
        │   │   ├── datasets
        │   │   ├── default_runtime.py
        │   │   ├── models
        │   │   └── schedules
        │   ├── regproxy_ade20k
        │   │   ├── regproxy-b16-sub4+implicit-mid-4+512x512+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-l16-sub4+implicit-mid-9+640x640+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-r26-s32-sub4+implicit-mid-n1+512x512+160k+adamw-poly+ade20k.py
        │   │   ├── regproxy-r50-l32-sub4+implicit-mid-n1+640x640+160k+adamw-cr+ade20k.py
        │   │   ├── regproxy-s16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py
        │   │   └── regproxy-t16-sub4+implicit-mid-4+512x512+160k+adamw-poly+ade20k.py
        │   └── regproxy_cityscapes
        │       ├── regproxy-b16-sub4+implicit-mid-2+768x768+80k+adamw-cr+cityscapes.py
        │       ├── regproxy-l16-sub4+implicit-mid-5+768x768+80k+adamw-cr+cityscapes.py
        │       ├── regproxy-s16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py
        │       └── regproxy-t16-sub4+implicit-mid-2+768x768+80k+adamw-poly+cityscapes.py
        ├── data
        │   └── cityscapes
        │       ├── gtFine
        │       ├── leftImg8bit
        │       ├── test.txt
        │       ├── train.txt
        │       └── val.txt
        ├── LICENSE
        ├── models
        │   ├── __init__.py
        │   ├── proxy_head.py
        │   ├── segmentors.py
        │   └── vit.py
        ├── README.md
        ├── test.py
        ├── train.py
        └── utils
            ├── checkpoint.py
            ├── __init__.py

    help wanted 
    opened by hollow-503 2
  • About the handle borders

    Hi, thanks for your code! Your code contains some logic for handling borders, and I am not clear on its purpose. Could you please explain it? Thank you very much!

    opened by GuoQingqing 1
  • Visualization of regions

    Dear authors, thanks for sharing your great work. I would like to know how to visualize the regions as shown in Fig. 6 and Fig. 8. Could you release the code for that?

    Thanks

    opened by dingjiansw101 0
  • Unofficial implementation of RegionProxy based on Pytorch

    Hi, YiF. Based on your open-source code, I wrote an unofficial implementation of RegionProxy that depends only on PyTorch. It is available at https://github.com/wanghr-git/RegionProxy. Could you check it for correctness if you have time?
    
    opened by wanghr-git 0
  • About pretrained model

    Hi. If my dataset has 512x512 resolution, how can I use the pretrained model? Can a model pre-trained at 224 or 384 resolution be used directly?

    opened by wanghr-git 1