Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Overview

Semantic Segmentation

Easy to use and customizable SOTA Semantic Segmentation models with abundant datasets in PyTorch

Open In Colab

banner

Features

  • Applicable to following tasks:
    • Scene Parsing
    • Human Parsing
    • Face Parsing
  • 20+ Datasets
  • 10+ SOTA Backbones
  • 10+ SOTA Semantic Segmentation Models
  • PyTorch, ONNX, TFLite and OpenVINO Inference

Model Zoo

Supported Backbones:

Supported Heads/Methods:

Supported Standalone Models:

Supported Modules:

  • PPM (CVPR 2017)
  • PSA (ArXiv 2021)
ADE20K-val (Scene Parsing)
Method Backbone mIoU (%) Params
(M)
GFLOPs
(512x512)
Weights
SegFormer MiT-B1 43.1 14 16 pt
MiT-B2 47.5 28 62 pt
MiT-B3 50.0 47 79 pt
CityScapes-val (Scene Parsing)
Method Backbone mIoU (%) Params (M) GFLOPs Img Size Weights
SegFormer MiT-B0 78.1 4 126 1024x1024 N/A
MiT-B1 80.0 14 244 1024x1024 N/A
FaPN ResNet-50 80.0 33 - 512x1024 N/A
SFNet ResNetD-18 79.0 13 - 1024x1024 N/A
FCHarDNet HarDNet-70 77.7 4 35 1024x1024 pt
DDRNet DDRNet-23slim 77.8 6 36 1024x2048 pt
HELEN-val (Face Parsing)
Method Backbone mIoU (%) Params
(M)
GFLOPs
(512x512)
FPS
(GTX1660ti)
Weights
BiSeNetv1 MobileNetV2-1.0 58.22 5 5 160 pt
BiSeNetv1 ResNet-18 58.50 14 13 263 pt
BiSeNetv2 - 58.58 18 15 195 pt
FCHarDNet HarDNet-70 59.38 4 4 130 pt
DDRNet DDRNet-23slim 61.11 6 5 180 pt|tflite(fp32)|tflite(fp16)|tflite(int8)
SegFormer MiT-B0 59.31 4 8 75 pt
SFNet ResNetD-18 61.00 14 31 56 pt
Backbones
Model Variants ImageNet-1k Top-1 Acc (%) Params (M) GFLOPs Weights
MicroNet M1|M2|M3 51.4|59.4|62.5 1|2|3 6M|12M|21M download
MobileNetV2 1.0 71.9 3 300M download
MobileNetV3 S|L 67.7|74.0 3|5 56M|219M S|L
ResNet 18|50|101 69.8|76.1|77.4 12|25|44 2|4|8 download
ResNetD 18|50|101 - 12|25|44 2|4|8 download
MiT B1|B2|B3 - 14|25|45 2|4|8 download
PVTv2 B1|B2|B4 78.7|82.0|83.6 14|25|63 2|4|10 download
ResT S|B|L 79.6|81.6|83.6 14|30|52 2|4|8 download

Notes: Download backbones' weights for HarDNet-70 and DDRNet-23slim.

Supported Datasets

Dataset Type Categories Train
Images
Val
Images
Test
Images
Image Size
(HxW)
COCO-Stuff General Scene Parsing 171 118,000 5,000 20,000 -
ADE20K General Scene Parsing 150 20,210 2,000 3,352 -
PASCALContext General Scene Parsing 59 4,996 5,104 9,637 -
SUN RGB-D Indoor Scene Parsing 37 2,666 2,619 5,050+labels -
Mapillary Vistas Street Scene Parsing 65 18,000 2,000 5,000 1080x1920
CityScapes Street Scene Parsing 19 2,975 500 1,525+labels 1024x2048
CamVid Street Scene Parsing 11 367 101 233+labels 720x960
MHPv2 Multi-Human Parsing 59 15,403 5,000 5,000 -
MHPv1 Multi-Human Parsing 19 3,000 1,000 980+labels -
LIP Multi-Human Parsing 20 30,462 10,000 - -
CCIHP Multi-Human Parsing 22 28,280 5,000 5,000 -
CIHP Multi-Human Parsing 20 28,280 5,000 5,000 -
ATR Single-Human Parsing 18 16,000 700 1,000+labels -
HELEN Face Parsing 11 2,000 230 100+labels -
LaPa Face Parsing 11 18,176 2,000 2,000+labels -
iBugMask Face Parsing 11 21,866 - 1,000+labels -
CelebAMaskHQ Face Parsing 19 24,183 2,993 2,824+labels 512x512
FaceSynthetics Face Parsing (Synthetic) 19 100,000 1,000 100+labels 512x512
SUIM Underwater Imagery 8 1,525 - 110+labels -

Check DATASETS to find more segmentation datasets.

Datasets Structure (click to expand)

Datasets should have the following structure:

data
|__ ADEChallenge
    |__ ADEChallengeData2016
        |__ images
            |__ training
            |__ validation
        |__ annotations
            |__ training
            |__ validation

|__ CityScapes
    |__ leftImg8bit
        |__ train
        |__ val
        |__ test
    |__ gtFine
        |__ train
        |__ val
        |__ test

|__ CamVid
    |__ train
    |__ val
    |__ test
    |__ train_labels
    |__ val_labels
    |__ test_labels
    
|__ VOCdevkit
    |__ VOC2010
        |__ JPEGImages
        |__ SegmentationClassContext
        |__ ImageSets
            |__ SegmentationContext
                |__ train.txt
                |__ val.txt
    
|__ COCO
    |__ images
        |__ train2017
        |__ val2017
    |__ labels
        |__ train2017
        |__ val2017

|__ MHPv1
    |__ images
    |__ annotations
    |__ train_list.txt
    |__ test_list.txt

|__ MHPv2
    |__ train
        |__ images
        |__ parsing_annos
    |__ val
        |__ images
        |__ parsing_annos

|__ LIP
    |__ LIP
        |__ TrainVal_images
            |__ train_images
            |__ val_images
        |__ TrainVal_parsing_annotations
            |__ train_segmentations
            |__ val_segmentations

    |__ CIHP/CCIHP
        |__ instance-leve_human_parsing
            |__ Training
                |__ Images
                |__ Category_ids
            |__ Validation
                |__ Images
                |__ Category_ids

    |__ ATR
        |__ humanparsing
            |__ JPEGImages
            |__ SegmentationClassAug

|__ SUIM
    |__ train_val
        |__ images
        |__ masks
    |__ TEST
        |__ images
        |__ masks

|__ SunRGBD
    |__ SUNRGBD
        |__ kv1/kv2/realsense/xtion
    |__ SUNRGBDtoolbox
        |__ traintestSUNRGBD
            |__ allsplit.mat

|__ Mapillary
    |__ training
        |__ images
        |__ labels
    |__ validation
        |__ images
        |__ labels

|__ SmithCVPR2013_dataset_resized (HELEN)
    |__ images
    |__ labels
    |__ exemplars.txt
    |__ testing.txt
    |__ tuning.txt

|__ CelebAMask-HQ
    |__ CelebA-HQ-img
    |__ CelebAMask-HQ-mask-anno
    |__ CelebA-HQ-to-CelebA-mapping.txt

|__ LaPa
    |__ train
        |__ images
        |__ labels
    |__ val
        |__ images
        |__ labels
    |__ test
        |__ images
        |__ labels

|__ ibugmask_release
    |__ train
    |__ test

|__ FaceSynthetics
    |__ dataset_100000
    |__ dataset_1000
    |__ dataset_100

Note: For PASCALContext, download the annotations from here and put it in VOC2010.

Note: For CelebAMask-HQ, run the preprocess script. python3 scripts/preprocess_celebamaskhq.py --root .


Augmentations (click to expand)

Check out the notebook here to test the augmentation effects.

Pixel-level Transforms:

  • ColorJitter (Brightness, Contrast, Saturation, Hue)
  • Gamma, Sharpness, AutoContrast, Equalize, Posterize
  • GaussianBlur, Grayscale

Spatial-level Transforms:

  • Affine, RandomRotation
  • HorizontalFlip, VerticalFlip
  • CenterCrop, RandomCrop
  • Pad, ResizePad, Resize
  • RandomResizedCrop

Usage

Requirements
  • python >= 3.6
  • torch >= 1.8.1
  • torchvision >= 0.9.1

Other requirements can be installed with pip install -r requirements.txt.


Configuration (click to expand)

Create a configuration file in configs. Sample configuration for ADE20K dataset can be found here. Then edit the fields you think if it is needed. This configuration file is needed for all of training, evaluation and prediction scripts.


Training (click to expand)

To train with a single GPU:

$ python tools/train.py --cfg configs/CONFIG_FILE.yaml

To train with multiple gpus, set DDP field in config file to true and run as follows:

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env tools/train.py --cfg configs/<CONFIG_FILE_NAME>.yaml

Evaluation (click to expand)

Make sure to set MODEL_PATH of the configuration file to your trained model directory.

$ python tools/val.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To evaluate with multi-scale and flip, change ENABLE field in MSF to true and run the same command as above.


Inference

To make an inference, edit the parameters of the config file from below.

  • Change MODEL >> NAME and VARIANT to your desired pretrained model.
  • Change DATASET >> NAME to the dataset name depending on the pretrained model.
  • Set TEST >> MODEL_PATH to pretrained weights of the testing model.
  • Change TEST >> FILE to the file or image folder path you want to test.
  • Testing results will be saved in SAVE_DIR.
## example using ade20k pretrained models
$ python tools/infer.py --cfg configs/ade20k.yaml

Example test results:

test_result


Convert to other Frameworks (ONNX, CoreML, OpenVINO, TFLite)

To convert to ONNX and CoreML, run:

$ python tools/export.py --cfg configs/<CONFIG_FILE_NAME>.yaml

To convert to OpenVINO and TFLite, see torch_optimize.


Inference (ONNX, OpenVINO, TFLite)
## ONNX Inference
$ python scripts/onnx_infer.py --model <ONNX_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## OpenVINO Inference
$ python scripts/openvino_infer.py --model <OpenVINO_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

## TFLite Inference
$ python scripts/tflite_infer.py --model <TFLite_MODEL_PATH> --img-path <TEST_IMAGE_PATH>

References (click to expand)

Citations (click to expand)
@article{xie2021segformer,
  title={SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers},
  author={Xie, Enze and Wang, Wenhai and Yu, Zhiding and Anandkumar, Anima and Alvarez, Jose M and Luo, Ping},
  journal={arXiv preprint arXiv:2105.15203},
  year={2021}
}

@misc{xiao2018unified,
  title={Unified Perceptual Parsing for Scene Understanding}, 
  author={Tete Xiao and Yingcheng Liu and Bolei Zhou and Yuning Jiang and Jian Sun},
  year={2018},
  eprint={1807.10221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{hong2021deep,
  title={Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes},
  author={Hong, Yuanduo and Pan, Huihui and Sun, Weichao and Jia, Yisong},
  journal={arXiv preprint arXiv:2101.06085},
  year={2021}
}

@misc{zhang2021rest,
  title={ResT: An Efficient Transformer for Visual Recognition}, 
  author={Qinglong Zhang and Yubin Yang},
  year={2021},
  eprint={2105.13677},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{huang2021fapn,
  title={FaPN: Feature-aligned Pyramid Network for Dense Image Prediction}, 
  author={Shihua Huang and Zhichao Lu and Ran Cheng and Cheng He},
  year={2021},
  eprint={2108.07058},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@misc{wang2021pvtv2,
  title={PVTv2: Improved Baselines with Pyramid Vision Transformer}, 
  author={Wenhai Wang and Enze Xie and Xiang Li and Deng-Ping Fan and Kaitao Song and Ding Liang and Tong Lu and Ping Luo and Ling Shao},
  year={2021},
  eprint={2106.13797},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@article{Liu2021PSA,
  title={Polarized Self-Attention: Towards High-quality Pixel-wise Regression},
  author={Huajun Liu and Fuqiang Liu and Xinyi Fan and Dong Huang},
  journal={Arxiv Pre-Print arXiv:2107.00782 },
  year={2021}
}

@misc{chao2019hardnet,
  title={HarDNet: A Low Memory Traffic Network}, 
  author={Ping Chao and Chao-Yang Kao and Yu-Shan Ruan and Chien-Hsiang Huang and Youn-Long Lin},
  year={2019},
  eprint={1909.00948},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}

@inproceedings{sfnet,
  title={Semantic Flow for Fast and Accurate Scene Parsing},
  author={Li, Xiangtai and You, Ansheng and Zhu, Zhen and Zhao, Houlong and Yang, Maoke and Yang, Kuiyuan and Tong, Yunhai},
  booktitle={ECCV},
  year={2020}
}

@article{Li2020SRNet,
  title={Towards Efficient Scene Understanding via Squeeze Reasoning},
  author={Xiangtai Li and Xia Li and Ansheng You and Li Zhang and Guang-Liang Cheng and Kuiyuan Yang and Y. Tong and Zhouchen Lin},
  journal={ArXiv},
  year={2020},
  volume={abs/2011.03308}
}

@ARTICLE{Yucondnet21,
  author={Yu, Changqian and Shao, Yuanjie and Gao, Changxin and Sang, Nong},
  journal={IEEE Signal Processing Letters}, 
  title={CondNet: Conditional Classifier for Scene Segmentation}, 
  year={2021},
  volume={28},
  number={},
  pages={758-762},
  doi={10.1109/LSP.2021.3070472}
}

Comments
  • RuntimeError: CUDA error: an illegal memory access was encountered

    RuntimeError: CUDA error: an illegal memory access was encountered

    The error was occured after changing batch_size from 8 to 4 in cityscapes.yaml.

    Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/185] LR: 0.00010049 Loss: 7.71337414: 1%|▌ | 1/185 [00:03<11:41, 3.81s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 97, in main scaler.scale(loss).backward() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/_tensor.py", line 255, in backward torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/autograd/init.py", line 147, in backward Variable._execution_engine.run_backward( RuntimeError: CUDA error: an illegal memory access was encountered (semseg) liuwang@liuwang-OMEN-30L:~/Documents/projects/semantic-segmentation$ CUDA_LAUNCH_BLOCKING=1 python tools/train.py --cfg configs/DDRNet/cityscapes.yaml Found 2975 train images. Found 500 val images. Epoch: [1/500] Iter: [1/743] LR: 0.00010012 Loss: 7.53493118: 0%|▏ | 1/743 [00:03<37:21, 3.02s/it] Traceback (most recent call last): File "tools/train.py", line 153, in main(cfg, gpu, save_dir) File "tools/train.py", line 95, in main loss = loss_fn(logits, lbl) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in forward return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 43, in return sum([w * self._forward(pred, labels) for (pred, w) in zip(preds, self.aux_weights)]) File "/home/liuwang/Documents/projects/semantic-segmentation/semseg/losses.py", line 33, in _forward loss = self.criterion(preds, labels).view(-1) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl return forward_call(*input, **kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 1120, in forward return F.cross_entropy(input, target, weight=self.weight, File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/nn/functional.py", line 2824, in cross_entropy return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index) RuntimeError: CUDA error: an illegal memory access was encountered Exception in thread Thread-2: Traceback (most recent call last): File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 932, in _bootstrap_inner self.run() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/threading.py", line 870, in run self._target(*self._args, **self._kwargs) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/utils/data/_utils/pin_memory.py", line 28, in _pin_memory_loop r = in_queue.get(timeout=MP_STATUS_CHECK_INTERVAL) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/queues.py", line 116, in get return _ForkingPickler.loads(res) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/site-packages/torch/multiprocessing/reductions.py", line 289, in rebuild_storage_fd fd = df.detach() File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 57, in detach with _resource_sharer.get_connection(self._id) as conn: File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/resource_sharer.py", line 87, in get_connection c = Client(address, authkey=process.current_process().authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 508, in Client answer_challenge(c, authkey) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 752, in answer_challenge message = connection.recv_bytes(256) # reject large message File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 216, in recv_bytes buf = self._recv_bytes(maxlength) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 414, in _recv_bytes buf = self._recv(4) File "/home/liuwang/Softwares/anaconda3/envs/semseg/lib/python3.8/multiprocessing/connection.py", line 379, in _recv chunk = read(handle, remaining) ConnectionResetError: [Errno 104] Connection reset by peer

    opened by StuLiu 9
  • Augmentation configuration

    Augmentation configuration

    Presently the augmentations used during training are hard-coded in semseg/augmentations.py. By default horizontal flip is enabled, which will be problematic for datasets where the orientation of the image matters (e.g. facial datasets may label the left and right eyes independently, and flipping the image would also require swapping these two labels).

    Are there any future plans to allow for augmentation to be configurable?

    opened by markdjwilliams 5
  • Performance issues with ddrnet

    Performance issues with ddrnet

    Hello, I am using ddrnet , i tried several other models with the same dataset i am working on and they improve over time except ddrnet after 10 epoch the miou starts decreasing , Does that mean ddrnet perform bad or is something wrong with the code i should fix?

    opened by mhmd-mst 5
  • Question about the dimensionality of the mask.

    Question about the dimensionality of the mask.

    Thank you for your work.

    It would be nice to see the actual performance of the models in fps on specific hardware. Particularly on devices like jetson.

    The training requires the mask to be gray, and in the file that describes the dataset PALETTE has a dimension of 3. Can you tell me what should be the dimensionality of PALETTE (for example my labels will be (1,1,1) or 1 etc.)

    opened by MsWik 4
  • SegFormer B5 evaluation error on cityscapes

    SegFormer B5 evaluation error on cityscapes

    Hello, I try to implement an evaluation of segformer b5 to cityscapes dataset, but I keep getting an error related to the network's structure. I attached a file with the error. error.txt Even though I modified a custom.yaml file, which is like this one, I still keep getting an error. What can i do to solve this?

        MODEL:                                    
          NAME          : SegFormer                                           # name of the model you are using
          VARIANT       : B5                                                  # model variant
          PRETRAINED    : 'checkpoints/backbones/hardnet/hardnet_70.pth'              # backbone model's weight 
        
        DATASET:
          NAME          : cityscapes                                              # dataset name to be trained with (camvid, cityscapes, ade20k)
          ROOT          : '/media/evdo/DATA/diploma_project/cityscapes/gtFine_trainvaltest'                         # dataset root path
        
        TRAIN:
          IMAGE_SIZE    : [1024, 1024]    # training image size in (h, w)
          BATCH_SIZE    : 8               # batch size used to train
          EPOCHS        : 500             # number of epochs to train
          EVAL_INTERVAL : 50              # evaluation interval during training
          AMP           : false           # use AMP in training
          DDP           : false           # use DDP training
        
        LOSS:
          NAME          : ohemce          # loss function name (ohemce, ce, dice)
          CLS_WEIGHTS   : true            # use class weights in loss calculation
          THRESH        : 0.9             # ohemce threshold or dice delta if you choose ohemce loss or dice loss
        
        OPTIMIZER:
          NAME          : adamw           # optimizer name
          LR            : 0.01            # initial learning rate used in optimizer
          WEIGHT_DECAY  : 0.0001          # decay rate used in optimizer 
        
        SCHEDULER:
          NAME          : warmuppolylr    # scheduler name
          POWER         : 0.9             # scheduler power
          WARMUP        : 10              # warmup epochs used in scheduler
          WARMUP_RATIO  : 0.1             # warmup ratio
          
        
        EVAL:
          MODEL_PATH    : 'checkpoints/pretrained/segformer/segformer.b5.1024x1024.city.160k.pth'  # trained model file path
          IMAGE_SIZE    : [1024, 1024]                                                            # evaluation image size in (h, w)                       
          MSF:  
            ENABLE      : false                                                                 # multi-scale and flip evaluation  
            FLIP        : true                                                                  # use flip in evaluation  
            SCALES      : [0.5, 0.75, 1.0, 1.25, 1.5, 1.75]                                     # scales used in MSF evaluation                
        
        
        TEST:
          MODEL_PATH    : 'checkpoints/pretrained/ddrnet/ddrnet_23_city.pth'                     # trained model file path
          FILE          : 'cityscapes/leftImg8bit_trainextra/leftImg8bit/train_extra/wurzburg' # filename or foldername 
          IMAGE_SIZE    : [480, 640]                                                           # inference image size in (h, w)
          OVERLAY       : false      
    
    opened by evdokimos 3
  • About mIoU result on Cityscapes validation dataset

    About mIoU result on Cityscapes validation dataset

    Hi,

    Thanks for your work and providing code. I am just confuse about the mIoU reported in the paper on cityscapes validation dataset which is around 77%. Kindly let me know is this mIoU achieved on pretrained model (imagenet)? I am training DDR-Net-Slim23 from scratch and I am getting around 55% mIoU on validation data of Cityscapes.

    opened by tanveer6715 3
  • RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    I have 18 classes using the cityscapes dataset, and I used to train my own images but I get this error from the Dice loss function at this point: tp = torch.sum(targets*preds, dim=(2, 3)) RuntimeError: The size of tensor a (4) must match the size of tensor b (18) at non-singleton dimension 1

    opened by ghost 3
  • ImageDraw.textbbox Attribute error in Visualize.py function

    ImageDraw.textbbox Attribute error in Visualize.py function "draw_text" when calling infer.py

    I am running infer.py and I get this error, any idea why? Traceback (most recent call last): File "tools/infer.py", line 100, in segmap,oversegmap,image = semseg.predict(str(test_file), cfg['TEST']['OVERLAY']) File "tools/infer.py", line 73, in predict seg_map,over_seg_map,image = self.postprocess(torch.tensor(image1), seg_map, overlay) File "tools/infer.py", line 60, in postprocess image = draw_text(seg_image, seg_map, self.labels) File "/content/trial/semseg/utils/visualize.py", line 28, in draw_text bbox = draw.textbbox(center, cls, font=fonts) AttributeError: 'ImageDraw' object has no attribute 'textbbox'

    opened by mhmd-mst 2
  • Double Checking implementation detail in SegFormerHead

    Double Checking implementation detail in SegFormerHead

    The paper on SegFormer suggests an All MLP decoder.

    Screen Shot 2022-04-23 at 3 03 57 AM

    The SegformerHead.py shows the use of a Conv2D for the final layer.

    Screen Shot 2022-04-23 at 3 06 18 AM

    Can you help me understand if this is a deviation from the paper or mentioned in a followup paper somewhere? I apologize in advance if there is an obvious answer.

    opened by RahulSinghalChicago 2
  • UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`

    /usr/local/lib/python3.7/dist-packages/torch/optim/lr_scheduler.py:134: UserWarning: Detected call of lr_scheduler.step() before optimizer.step(). In PyTorch 1.1.0 and later, you should call them in the opposite order: optimizer.step() before lr_scheduler.step(). Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate "https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate", UserWarning)

    opened by RahulSinghalChicago 2
  • ModuleNotFoundError: No module named 'semseg'

    ModuleNotFoundError: No module named 'semseg'

    Hi there, When I tried to run infer.py, I got the following error: (sotas) xzhan2@cerlab27:~/semantic-segmentation$ python tools/infer.py --cfg configs/custom.yaml Traceback (most recent call last): File "tools/infer.py", line 12, in <module> from semseg.models import * ModuleNotFoundError: No module named 'semseg' I also tried to append the path to /semseg into sys.path, but it did not solve the problem. Could you please give a bit suggestions? Thanks

    opened by Shawn207 2
  • BiSeNetv2 implementation difference leads to strange results

    BiSeNetv2 implementation difference leads to strange results

    Hello, I train BiSeNetv2 in my custom dataset and find probability maps have a grid-like look, which is very unnatural.

    After further observation I find you use PixelShuffle on the last upsampling layer, which leads to the grid results(seems like you also use it in v1). But in BiSeNetv2 paper, the author uses simple bilinear interpolation to upsample results, which gives smoothing probability maps in my experiment. why use PixelShuffle here?

    So what's the idea of using PixelShuffle to upsample final results? Also, don't see an obvious improvement on validate score by using it tho.

    By the way, there are some differences between your implementation and the paper on Gather-and-Expansion Layer. Cannot figure out if this will affect training results significantly.

    opened by wasupandceacar 0
  • RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access

    RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access

    你好,请问我在使用DDR模型遇到这个错误'RuntimeError: transform: failed to sync: cudaErrorIllegalAddress: an 非法内存 access',请教一下遇到这样子错误吗?网上搜有说pytorch版本,还有标注文件 / 标签 等没有设置正确 以及网络模型、训练数据(图片和标签)没有放置到GPU上,我感觉这些都没有问题啊。期待您的回复

    opened by scl666 0
  • training error on colab

    training error on colab

    My all setup is successful on colab for training. However, when I run

    !python tools/train.py --cfg configs/CONFIG_FILE.yaml

    I get error:

    Found 20210 training images. Found 2000 validation images. Epoch: [1/500] Iter: [0/2526] LR: 0.00100000 Loss: 0.00000000: 0% 0/2526 [00:00<?, ?it/s] Traceback (most recent call last): File "tools/train.py", line 128, in main(cfg, gpu, save_dir) File "tools/train.py", line 69, in main for iter, (img, lbl) in pbar: File "/usr/local/lib/python3.7/dist-packages/tqdm/std.py", line 1195, in iter for obj in iterable: File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 681, in next data = self._next_data() File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1376, in _next_data return self._process_data(data) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py", line 1402, in _process_data data.reraise() File "/usr/local/lib/python3.7/dist-packages/torch/_utils.py", line 461, in reraise raise exception RuntimeError: Caught RuntimeError in DataLoader worker process 0. Original Traceback (most recent call last): File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/worker.py", line 302, in _worker_loop data = fetcher.fetch(index) File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch data = [self.dataset[idx] for idx in possibly_batched_index] File "/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py", line 49, in data = [self.dataset[idx] for idx in possibly_batched_index] File "/content/semantic-segmentation/semseg/datasets/ade20k.py", line 73, in getitem image, label = self.transform(image, label) File "/content/semantic-segmentation/semseg/augmentations.py", line 20, in call img, mask = transform(img, mask) File "/content/semantic-segmentation/semseg/augmentations.py", line 329, in call mask = TF.pad(mask, padding, fill=self.seg_fill) File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional.py", line 481, in pad return F_t.pad(img, padding=padding, fill=fill, padding_mode=padding_mode) File "/usr/local/lib/python3.7/dist-packages/torchvision/transforms/functional_tensor.py", line 418, in pad img = torch_pad(img, p, mode=padding_mode, value=float(fill)) RuntimeError: value cannot be converted to type uint8_t without overflow

    opened by hb0313 2
  • data format

    data format

    Is the annotation format the same for all? eg. ADE20K seems to be color images. I have COCO format original annotation data (grey scale) but it appears I need to change it to color?

    Thanks.

    opened by Tsardoz 1
Releases(v0.2.6)
Owner
sithu3
AI Developer
sithu3
A collection of SOTA Image Classification Models in PyTorch

A collection of SOTA Image Classification Models in PyTorch

sithu3 85 Dec 30, 2022
Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation)

Recall Loss for Semantic Segmentation (This repo implements the paper: Recall Loss for Semantic Segmentation) Download Synthia dataset The model uses

null 32 Sep 21, 2022
A collection of easy-to-use, ready-to-use, interesting deep neural network models

Interesting and reproducible research works should be conserved. This repository wraps a collection of deep neural network models into a simple and un

Aria Ghora Prabono 16 Jun 16, 2022
A Lighting Pytorch Framework for Recommendation System, Easy-to-use and Easy-to-extend.

Torch-RecHub A Lighting Pytorch Framework for Recommendation Models, Easy-to-use and Easy-to-extend. 安装 pip install torch-rechub 主要特性 scikit-learn风格易用

Mincai Lai 67 Jan 4, 2023
OpenCVのGrabCut()を利用したセマンティックセグメンテーション向けアノテーションツール(Annotation tool using GrabCut() of OpenCV. It can be used to create datasets for semantic segmentation.)

[Japanese/English] GrabCut-Annotation-Tool GrabCut-Annotation-Tool.mp4 OpenCVのGrabCut()を利用したアノテーションツールです。 セマンティックセグメンテーション向けのデータセット作成にご使用いただけます。 ※Grab

KazuhitoTakahashi 30 Nov 18, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
[ICCV 2021] Official Pytorch implementation for Discriminative Region-based Multi-Label Zero-Shot Learning SOTA results on NUS-WIDE and OpenImages

Discriminative Region-based Multi-Label Zero-Shot Learning (ICCV 2021) [arXiv][Project page >> coming soon] Sanath Narayan*, Akshita Gupta*, Salman Kh

Akshita Gupta 54 Nov 21, 2022
Cl datasets - PyTorch image dataloaders and utility functions to load datasets for supervised continual learning

Continual learning datasets Introduction This repository contains PyTorch image

berjaoui 5 Aug 28, 2022
TorchDistiller - a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

This project is a collection of the open source pytorch code for knowledge distillation, especially for the perception tasks, including semantic segmentation, depth estimation, object detection and instance segmentation.

yifan liu 147 Dec 3, 2022
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in Pytorch

Phil Wang 12.6k Jan 9, 2023
TorchFlare is a simple, beginner-friendly, and easy-to-use PyTorch Framework train your models effortlessly.

TorchFlare TorchFlare is a simple, beginner-friendly and an easy-to-use PyTorch Framework train your models without much effort. It provides an almost

Atharva Phatak 85 Dec 26, 2022
Learning Pixel-level Semantic Affinity with Image-level Supervision for Weakly Supervised Semantic Segmentation, CVPR 2018

Learning Pixel-level Semantic Affinity with Image-level Supervision This code is deprecated. Please see https://github.com/jiwoon-ahn/irn instead. Int

Jiwoon Ahn 337 Dec 15, 2022
Implementation of STAM (Space Time Attention Model), a pure and simple attention model that reaches SOTA for video classification

STAM - Pytorch Implementation of STAM (Space Time Attention Model), yet another pure and simple SOTA attention model that bests all previous models in

Phil Wang 109 Dec 28, 2022
Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track (SIGIR 2021 Full Paper).

Optimizing Dense Retrieval Model Training with Hard Negatives Jingtao Zhan, Jiaxin Mao, Yiqun Liu, Jiafeng Guo, Min Zhang, Shaoping Ma This repo provi

Jingtao Zhan 99 Dec 27, 2022
Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Uniformer - Pytorch Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification ta

Phil Wang 90 Nov 24, 2022
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

Syed Waqas Zamir 906 Dec 30, 2022
Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data.

Deep Learning Dataset Maker Deep Learning Datasets Maker is a QGIS plugin to make datasets creation easier for raster and vector data. How to use Down

deepbands 25 Dec 15, 2022
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP

Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP Abstract: We introduce a method that allows to automatically se

Daniil Pakhomov 134 Dec 19, 2022
SOTA model in CIFAR10

A PyTorch Implementation of CIFAR Tricks 调研了CIFAR10数据集上各种trick,数据增强,正则化方法,并进行了实现。目前项目告一段落,如果有更好的想法,或者希望一起维护这个项目可以提issue或者在我的主页找到我的联系方式。 0. Requirement

PJDong 58 Dec 21, 2022