[CVPR 2021] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers

Fudan Zhang Vision Group

Last update: Jan 5, 2023

Overview

SEgmentation TRansformers -- SETR

Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers,
Sixiao Zheng, Jiachen Lu, Hengshuang Zhao, Xiatian Zhu, Zekun Luo, Yabiao Wang, Yanwei Fu, Jianfeng Feng, Tao Xiang, Philip HS Torr, Li Zhang,
CVPR 2021

Installation

Our project is developed based on mmsegmentation. Please follow the official mmsegmentation INSTALL.md and getting_started.md for installation and dataset preparation.

Main results

Cityscapes

Method	Crop size	Batch size	Iterations	Split	mIoU	Links
SETR-Naive	768x768	8	40k	val	77.37	model config
SETR-Naive	768x768	8	80k	val	77.90	model config
SETR-MLA	768x768	8	40k	val	76.65	model config
SETR-MLA	768x768	8	80k	val	77.24	model config
SETR-PUP	768x768	8	40k	val	78.39	model config
SETR-PUP	768x768	8	80k	val	79.34	model config
SETR-Naive-DeiT	768x768	8	40k	val	77.85	model config
SETR-Naive-DeiT	768x768	8	80k	val	78.66	model config
SETR-MLA-DeiT	768x768	8	40k	val	78.04	model config
SETR-MLA-DeiT	768x768	8	80k	val	78.98	model config
SETR-PUP-DeiT	768x768	8	40k	val	78.79	model config
SETR-PUP-DeiT	768x768	8	80k	val	79.45	model config

ADE20K

Method	Crop size	Batch size	Iterations	Split	mIoU	mIoU (ms+flip)	Links
SETR-Naive	512x512	16	160k	val	48.06	48.80	model config
SETR-MLA	512x512	8	160k	val	48.27	50.03	model config
SETR-MLA	512x512	16	160k	val	48.64	50.28	model config
SETR-PUP	512x512	16	160k	val	48.58	50.09	model config

Pascal Context

Method	Crop size	Batch size	Iterations	Split	mIoU	mIoU (ms+flip)	Links
SETR-Naive	480x480	16	80k	val	52.89	53.61	model config
SETR-MLA	480x480	8	80k	val	54.39	55.39	model config
SETR-MLA	480x480	16	80k	val	54.87	55.83	model config
SETR-PUP	480x480	16	80k	val	54.40	55.27	model config
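
For readers unfamiliar with the metric reported in the tables above: mIoU is the per-class intersection-over-union averaged over classes. A minimal, dependency-free sketch follows. The function name `mean_iou` and the confusion-matrix layout (rows are ground truth, columns are predictions) are assumptions made for illustration; this is not the evaluation code used by this repository.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] counts pixels whose ground-truth class is i and
    whose predicted class is j (layout assumed for this sketch).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        # Pixels of class c that were predicted as something else.
        fn = sum(confusion[c]) - tp
        # Pixels of other classes that were predicted as class c.
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp
        union = tp + fp + fn
        if union > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    toy = [[3, 1],
           [1, 5]]
    print(f"mIoU = {100 * mean_iou(toy):.2f}")
```

The tables report this value as a percentage.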

Get Started

Train

./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} 
# For example, train a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_train.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py 8
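
The config names used in this README appear to encode the decoder variant, crop size, schedule, dataset and batch size. The sketch below decodes such a name. The pattern is inferred from the examples in this README only, and `parse_config_name` is a hypothetical helper, not part of the repository; names that do not follow the pattern simply return `None`.

```python
import re
from pathlib import Path

# Pattern inferred from names such as
# SETR_PUP_768x768_40k_cityscapes_bs_8.py (an assumption, not a spec).
_CONFIG_NAME = re.compile(
    r"^SETR_(?P<head>[A-Za-z]+)_(?P<height>\d+)x(?P<width>\d+)"
    r"_(?P<kiters>\d+)k_(?P<dataset>\w+?)_bs_(?P<batch>\d+)(?P<ms>_MS)?$"
)


def parse_config_name(path):
    """Split a SETR config file name into its settings, or return None."""
    match = _CONFIG_NAME.match(Path(path).stem)
    if match is None:
        return None
    return {
        "head": match["head"],
        "crop_size": (int(match["height"]), int(match["width"])),
        "iterations": int(match["kiters"]) * 1000,
        "dataset": match["dataset"],
        "batch_size": int(match["batch"]),
        "multi_scale": match["ms"] is not None,
    }


if __name__ == "__main__":
    print(parse_config_name("configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py"))
```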

Single-scale testing

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU

Multi-scale testing

Use the config file ending in _MS.py in configs/SETR.

./tools/dist_test.sh ${CONFIG_FILE} ${CHECKPOINT_FILE} ${GPU_NUM}  [--eval ${EVAL_METRICS}]
# For example, test a SETR-PUP on Cityscapes dataset with 8 GPUs
./tools/dist_test.sh configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8_MS.py \
work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth \
8 --eval mIoU
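
When scripting evaluations, the multi-scale command can be derived from the single-scale config path. This is a small sketch that assumes the `_MS.py` naming shown above holds for the config in question; `ms_test_command` is a hypothetical helper and only builds the argument list, it does not run anything.

```python
from pathlib import Path


def ms_test_command(single_scale_config, checkpoint, gpu_num, metric="mIoU"):
    """Build the dist_test.sh argument list for multi-scale testing."""
    config = Path(single_scale_config)
    if not config.stem.endswith("_MS"):
        # Assumed convention: the multi-scale config shares the name plus "_MS".
        config = config.with_name(f"{config.stem}_MS{config.suffix}")
    return [
        "./tools/dist_test.sh",
        config.as_posix(),
        str(checkpoint),
        str(gpu_num),
        "--eval",
        metric,
    ]


if __name__ == "__main__":
    print(" ".join(ms_test_command(
        "configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py",
        "work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8/iter_40000.pth",
        8,
    )))
```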

Please see getting_started.md for more basic usage of training and testing.

Reference

@inproceedings{SETR,
    title={Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers}, 
    author={Zheng, Sixiao and Lu, Jiachen and Zhao, Hengshuang and Zhu, Xiatian and Luo, Zekun and Wang, Yabiao and Fu, Yanwei and Feng, Jianfeng and Xiang, Tao and Torr, Philip H.S. and Zhang, Li},
    booktitle={CVPR},
    year={2021}
}

License

MIT

Acknowledgement

Thanks to these open-source repositories:
mmsegmentation
pytorch-image-models

Comments

Questions about "cls_token" and "pos_embed" in the code

The "cls_token" and "pos_embed" are defined as all-zero matrices. What do they mean, and how are they applied later in the model? I don't work in this area and just want to learn from your work, so I would appreciate your help!

self.cls_token = nn.Parameter(torch.zeros(1, 1, self.embed_dim))
self.pos_embed = nn.Parameter(torch.zeros(1, self.num_patches + 1, self.embed_dim))

opened by wscc123 10
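
On the "cls_token" and "pos_embed" question above: in ViT-style backbones these two tensors are learnable parameters, so the zeros are only the values they hold at construction time; they are typically re-initialised or overwritten by pretrained weights and then updated during training. The dependency-free sketch below shows the usual way they enter the forward pass: the class token is prepended to the patch sequence, and the position embedding is added element-wise. Plain Python lists stand in for tensors, and `add_cls_and_pos` is a hypothetical helper, not a function from this repository.

```python
def add_cls_and_pos(patch_tokens, cls_token, pos_embed):
    """Prepend the class token, then add position embeddings.

    patch_tokens: N vectors of length D
    cls_token:    one vector of length D
    pos_embed:    N + 1 vectors of length D
    """
    sequence = [cls_token] + patch_tokens  # length N + 1
    if len(sequence) != len(pos_embed):
        raise ValueError("pos_embed needs num_patches + 1 rows")
    return [
        [t + p for t, p in zip(token, pos)]
        for token, pos in zip(sequence, pos_embed)
    ]


if __name__ == "__main__":
    patches = [[1.0, 2.0], [3.0, 4.0]]               # N = 2 patches, D = 2
    cls = [0.0, 0.0]                                  # zero at construction time
    pos = [[0.5, 0.5], [0.25, 0.25], [1.0, 1.0]]      # stand-in for learned values
    for row in add_cls_and_pos(patches, cls, pos):
        print(row)
```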
About the position embeddings for patches

Since the patches come from a 2D image, the position information has two components: the x-axis and y-axis indexes. This is different from the 1D sequence case. How do you implement the position embedding? Can you share the details, since the code is not released?

opened by 2iyuye 7
Can't achieve the best mIoU when batchsize=4

I can't reach the best mIoU reported in the original paper. I didn't change any hyperparameters except reducing the batch size from 8 to 4. How can I achieve the best mIoU? Is it related to the batch size?

opened by cocolord 6
GPU memory

Hello, thanks for your code. How much GPU memory is needed to train SETR? I have 2 P40 GPUs, but I can't start training because of OOM. Looking forward to your reply.

opened by SherlockHua1995 6
SETR-Naive-Base model

Hi, do you have a google drive link for the models with T-Base referenced in the paper (such as SETR-Naive-Base) as well as the corresponding configuration files?

Alternatively, what configuration can I use to train the model if it is not readily available? I tried changing the depth in SETR/configs/base/models/setr_naive_pup.py to 12, but that errors out with "RuntimeError: shape '[2, 1025, 3, 12, 85]' is invalid for input of size 6297600" when using the ADE20K configuration file (https://github.com/fudan-zvg/SETR/blob/main/configs/SETR/SETR_Naive_512x512_160k_ade20k_bs_16.py) for training. Changing the embedding dimension in this file from 1024 also results in many shape mismatches with the pretrained ImageNet-21k model. The default training with the T-Large depth and embedding dimension works for me with the same file.

Thanks for your help.

opened by kavyasreedhar 5
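
The numbers in the error message above are consistent with the embedding dimension not being divisible by the number of heads: 6297600 = 2 x 1025 x 3 x 1024, while the requested shape provides only 12 x 85 = 1020 channels for each of q, k and v. This suggests (an inference from the message, not confirmed by the authors) that `num_heads` ended up as 12 while `embed_dim` stayed at 1024. The sketch below reproduces the arithmetic; `check_qkv_reshape` is a hypothetical helper.

```python
def check_qkv_reshape(batch, tokens, embed_dim, num_heads):
    """Compare the qkv tensor size with the size a per-head reshape requests."""
    head_dim = embed_dim // num_heads  # integer division silently drops the remainder
    available = batch * tokens * 3 * embed_dim
    requested = batch * tokens * 3 * num_heads * head_dim
    return {
        "head_dim": head_dim,
        "available": available,
        "requested": requested,
        "ok": available == requested,
    }


if __name__ == "__main__":
    # Values read off the error message: batch 2, 1024 patches + 1 class token.
    print(check_qkv_reshape(batch=2, tokens=1025, embed_dim=1024, num_heads=12))
    print(check_qkv_reshape(batch=2, tokens=1025, embed_dim=1024, num_heads=16))
```

A ViT-Base-style setting such as `embed_dim=768` with `num_heads=12` passes this check, although pretrained weights for a 1024-dimensional model would then no longer match, as the post already observes.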
MMCV Error(mmcv-full 1.2.2 torch1.6 python3.7)

Can you help me solve this problem? I use the dataset in VOC format.

Traceback (most recent call last):
  File "tools/train.py", line 163, in <module>
    main()
  File "tools/train.py", line 159, in main
    meta=meta)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/apis/train.py", line 91, in train_segmentor
    val_dataset = build_dataset(cfg.data.val, dict(test_mode=True))
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/builder.py", line 73, in build_dataset
    dataset = build_from_cfg(cfg, DATASETS, default_args)
  File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pascal_context.py", line 53, in __init__
    **kwargs)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/custom.py", line 86, in __init__
    self.pipeline = Compose(pipeline)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/test_time_aug.py", line 59, in __init__
    self.transforms = Compose(transforms)
  File "/home/ubuntu/disk1/user/SETR-main/mmseg/datasets/pipelines/compose.py", line 22, in __init__
    transform = build_from_cfg(transform, PIPELINES)
  File "/home/ubuntu/anaconda3/envs/setr/lib/python3.7/site-packages/mmcv/utils/registry.py", line 171, in build_from_cfg
    return obj_cls(**args)
TypeError: __init__() got an unexpected keyword argument 'dataset'

opened by rfww 3
always CUDA out of memory

@lzrobots @VictorLlu @sixiaozheng Hi, thank you for sharing. However, when I run "./tools/dist_test.sh configs/SETR/SETR_PUP_512x512_160k_ade20k_bs_16_MS.py", I get the error: CUDA out of memory. I have 2 NVIDIA Tesla P100 GPUs with about 16 GB each. Could you please tell me what is wrong? Thank you.

opened by daixiaolei623 3

AssertionError: Default process group is not initialized

Hi, authors,

I got the following error after executing command: python tools/train.py configs/SETR/SETR_PUP_768x768_40k_cityscapes_bs_8.py

2021-04-08 08:03:22,265 - mmseg - INFO - Loaded 2975 images
2021-04-08 08:03:24,275 - mmseg - INFO - Loaded 500 images
2021-04-08 08:03:24,276 - mmseg - INFO - Start running, host: root@milton-LabPC, work_dir: /media/root/mdata/data/code13/SETR/work_dirs/SETR_PUP_768x768_40k_cityscapes_bs_8
2021-04-08 08:03:24,276 - mmseg - INFO - workflow: [('train', 1)], max: 40000 iters
Traceback (most recent call last):
  File "tools/train.py", line 161, in <module>
    main()
  File "tools/train.py", line 150, in main
    train_segmentor(
  File "/media/root/mdata/data/code13/SETR/mmseg/apis/train.py", line 106, in train_segmentor
    runner.run(data_loaders, cfg.workflow, cfg.total_iters)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 130, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/parallel/data_parallel.py", line 67, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/encoder_decoder.py", line 157, in forward_train
    loss_decode = self._decode_head_forward_train(x, img_metas,
  File "/media/root/mdata/data/code13/SETR/mmseg/models/segmentors/encoder_decoder.py", line 100, in _decode_head_forward_train
    loss_decode = self.decode_head.forward_train(x, img_metas,
  File "/media/root/mdata/data/code13/SETR/mmseg/models/decode_heads/decode_head.py", line 185, in forward_train
    seg_logits = self.forward(inputs)
  File "/media/root/mdata/data/code13/SETR/mmseg/models/decode_heads/vit_up_head.py", line 93, in forward
    x = self.syncbn_fc_0(x)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 519, in forward
    world_size = torch.distributed.get_world_size(process_group)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 625, in get_world_size
    return _get_group_size(group)
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 220, in _get_group_size
    _check_default_pg()
  File "/root/anaconda3/envs/pytorch1.7.0/lib/python3.8/site-packages/torch/distributed/distributed_c10d.py", line 210, in _check_default_pg
    assert _default_pg is not None, \
AssertionError: Default process group is not initialized
(pytorch1.7.0) root@milton-LabPC:/data/code13/SETR

As I use a single GPU device to perform the training, it seems the error is related to distributed training. Any hints to solve this issue?

THX!

opened by amiltonwong 3
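
The traceback above ends in a synchronized batch-norm layer (`syncbn_fc_0`) querying the distributed process group, which `python tools/train.py` without a launcher does not create. Two commonly used workarounds, neither confirmed by the authors here, are to launch through `./tools/dist_train.sh ${CONFIG_FILE} 1`, or to replace `SyncBN` with `BN` in the config for single-GPU runs. Below is a minimal sketch of the second option on a plain nested dict, assuming the model builds its norm layers from `norm_cfg` entries; `replace_syncbn` is a hypothetical helper, not part of mmsegmentation.

```python
import copy


def replace_syncbn(cfg):
    """Return a deep copy of cfg with every type='SyncBN' changed to 'BN'."""
    cfg = copy.deepcopy(cfg)  # leave the caller's config untouched

    def _walk(node):
        if isinstance(node, dict):
            if node.get("type") == "SyncBN":
                node["type"] = "BN"
            for value in node.values():
                _walk(value)
        elif isinstance(node, (list, tuple)):
            for item in node:
                _walk(item)

    _walk(cfg)
    return cfg


if __name__ == "__main__":
    norm_cfg = dict(type="SyncBN", requires_grad=True)
    model = dict(
        type="EncoderDecoder",
        backbone=dict(type="VIT_MLA", norm_cfg=norm_cfg),
        decode_head=dict(type="VIT_MLAHead", norm_cfg=norm_cfg),
    )
    print(replace_syncbn(model))
```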

The model efficiency and speed
@lzrobots The paper seems promising, but some questions about efficiency are unanswered:

For CPU-only inference, how much memory is required for a 1024 x 1024 image?

For CPU-only inference, what is the FPS for 1024 x 1024 images?

What are the number of FLOPs and the number of parameters?
opened by seekingdeep 3
Question about the method of handling the multi-patch inputs

After reading your paper, I am confused about how you handle the multi-patch (256) inputs in the encoder. It seems that the encoder fuses the 256 patches and learns one feature map (of size (H/16, W/16, D)) for the whole original image rather than per patch, and then decodes this feature map to generate the segmentation map. How are the 256 patches processed and fused in the encoder?

opened by QiushiYang 3
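
As a rough illustration of the shapes involved in the question above: the image is cut into (H/16) x (W/16) patches, self-attention lets every patch token attend to all others, and the resulting token sequence is reshaped back into a 2D grid of D-dimensional features before decoding. The sketch below covers only the shape bookkeeping. It assumes 16x16 patches, row-major token order and no class token; `grid_shape` and `tokens_to_grid` are hypothetical helpers, not code from this repository.

```python
def grid_shape(height, width, patch_size=16):
    """Number of patch rows and columns for an image of the given size."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return height // patch_size, width // patch_size


def tokens_to_grid(tokens, height, width, patch_size=16):
    """Reshape a flat, row-major token sequence into a rows x cols grid."""
    rows, cols = grid_shape(height, width, patch_size)
    if len(tokens) != rows * cols:
        raise ValueError("token count does not match the patch grid")
    return [tokens[r * cols:(r + 1) * cols] for r in range(rows)]


if __name__ == "__main__":
    print(grid_shape(256, 256))   # patch grid for a 256x256 input
    print(grid_shape(512, 512))   # patch grid for a 512x512 crop
    print(tokens_to_grid(list(range(6)), 32, 48))
```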
error for using dist_train.sh

Excuse me, I'm training with multiple GPUs, following the example ./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]. I have 2 GPUs, and when I try to use them I get:

Traceback (most recent call last):
  File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/site-packages/torch/distributed/launch.py", line 261, in <module>
    main()
  File "/home/anaconda3/envs/py37_torch1.6/lib/python3.7/site-packages/torch/distributed/launch.py", line 257, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/anaconda3/envs/py37_torch1.6/bin/python', '-u', './tools/train.py', '--local_rank=1', 'configs/SETR/SETR_Naive_768x768_40k_cityscapes_bs_8.py', '--launcher', 'pytorch', '--load-from=./pth/jx_vit_large_p16_384-b3be5167.pth']' returned non-zero exit status 1.

Thanks for your answer!

opened by Lsz-20 2

RecursionError while training the custom dataset

I encountered the error "RecursionError: maximum recursion depth exceeded in comparison" while training with my custom dataset. I tried setting num_workers to 0, but the issue was not resolved. Please provide a fix.

The following is the code for the config file for the custom dataset

dataset_type = 'MyDataset'
# Correct path of your dataset
data_root = 'data/my_dataset'

img_norm_cfg = dict( # This img_norm_cfg is widely used because it is mean and std of ImageNet 1K pretrained model
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)

crop_size = (512, 512) # Crop size of image in training

train_pipeline=[]
test_pipeline=[]
data = dict(
    samples_per_gpu=4, # Batch size of a single GPU
    workers_per_gpu=0, # Worker to pre-fetch data for each single GPU
    train=dict( # Train dataset config
        type=dataset_type, # Type of dataset, refer to mmseg/datasets/ for details.
        data_root=data_root, # The root of dataset.
        img_dir='img_dir/train', # The image directory of dataset.
        ann_dir='ann_dir/train',  # The annotation directory of dataset.
        pipeline=train_pipeline), # pipeline, this is passed by the train_pipeline created before.
    val=dict( # Validation dataset config.
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline), # Pipeline is passed by test_pipeline created before.
    test=dict(
        type=dataset_type,
        data_root=data_root,
        img_dir='img_dir/val',
        ann_dir='ann_dir/val',
        pipeline=test_pipeline))

The following is the python configuration file of the intended SETR model based on SETR_MLA.

_base_ = [
    '../_base_/models/setr_mla.py',
    '../_base_/datasets/my_dataset_config.py', '../_base_/default_runtime.py',
    '../_base_/schedules/schedule_80k.py'
]

# model settings
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    backbone=dict(
        type='VIT_MLA',
        model_name='vit_large_patch16_384',
        img_size=512,
        patch_size=16,
        in_chans=3,
        embed_dim=1024,
        depth=24,
        num_heads=16,
        num_classes=3,
        drop_rate=0.1,
        norm_cfg=norm_cfg,
        pos_embed_interp=True,
        align_corners=False,
        mla_channels=256,
        mla_index=(5, 11, 17, 23)
    ),
    decode_head=dict(
        type='VIT_MLAHead',
        in_channels=1024,
        channels=512,
        img_size=512, 
        mla_channels=256,
        mlahead_channels=128,
        num_classes=3,
        norm_cfg=norm_cfg,
        align_corners=False,
        loss_decode=dict(
            type='CrossEntropyLoss', use_sigmoid=False, loss_weight=1.0)))
# model training and testing settings
train_cfg = dict()
test_cfg = dict(mode='whole')
optimizer = dict(lr=0.002, weight_decay=0.0,
                 paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)})
                 )
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
crop_size = (512, 512)
test_cfg = dict(mode='slide', crop_size=crop_size, stride=(512, 512))
find_unused_parameters = True
data = dict(samples_per_gpu=1)
test_pipeline=[]

data = dict(
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))

opened by Priyadrasta-2111CS10 0

Difference with ViT

It looks like this paper uses ViT (https://arxiv.org/abs/2010.11929) as the backbone, with a simple decoder for segmentation?

To be honest, it should be just an ablation study of ViT rather than a new paper.

If I am wrong, please correct me.

opened by TechChuanyu 0
multi-scale testing

Hi,

I downloaded a SETR_MLA model (512x512 with a batch size of 8) to test its performance on the ADE20K validation set. Since I have only two RTX Titan GPUs, I changed the number of GPUs from 8 to 2 for testing.

The single-scale (SS) result matches the reported one, 47.79%.

For multi-scale (MS) testing, I followed the instruction to use the config file ending in _MS.py in configs/SETR. However, the MS result is just 47.91%, far lower than the reported 50.03%.

So I'm wondering what the possible reasons could be. Thanks.

opened by ZhengyuXia 0
Question about optimizer config.

"paramwise_cfg=dict(custom_keys={'head': dict(lr_mult=10.)}"

Hi, first of all, thank you for open-sourcing your code. I have a question about the optimizer configuration. Your model contains "decode_head", not the "head" used in 'custom_keys'. Will 'lr_mult=10' take effect when training the model?

Thanks~

opened by EricKani 2
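
On the question above: mmcv's `DefaultOptimizerConstructor` appears to match each `custom_keys` entry as a substring of the full parameter name, in which case `'head'` would also match parameters under `decode_head`. The sketch below imitates that matching rule in plain Python. It is a simplified stand-in (the function `lr_mult_for` and the parameter names are illustrative assumptions), so check the installed mmcv version for the exact behaviour.

```python
def lr_mult_for(param_name, custom_keys):
    """Return the lr multiplier a substring-matching rule would assign."""
    # Longer keys are tried first so that the most specific match wins.
    for key in sorted(sorted(custom_keys), key=len, reverse=True):
        if key in param_name:  # substring match, not an exact or prefix match
            return custom_keys[key].get("lr_mult", 1.0)
    return 1.0  # no key matched: base learning rate


if __name__ == "__main__":
    custom_keys = {"head": dict(lr_mult=10.0)}
    for name in ("backbone.blocks.0.attn.qkv.weight",
                 "decode_head.conv_seg.weight"):
        print(name, lr_mult_for(name, custom_keys))
```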

Owner

Fudan Zhang Vision Group

Zhang Vision Group at the School of Data Science, Fudan University, led by Professor Li Zhang

