PyTorch implementation of Semantic Segmentation/Scene Parsing on the MIT ADE20K dataset

Overview

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This is a PyTorch implementation of semantic segmentation models on the MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open-source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. The repository for the dataset, along with implementations in Caffe and Torch7, is at: https://github.com/CSAILVision/sceneparsing

If you simply want to play with our demo, try http://scenesegmentation.csail.mit.edu. You can upload your own photo and parse it!

You can also use this Colab notebook playground to tinker with the code for segmenting an image.

All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch

[From left to right: Test Image, Ground Truth, Predicted Result]

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing
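The color encoding is a lookup table from class index to RGB color. The following is a minimal numpy sketch of that mapping and its inverse, assuming an (N, 3) palette array whose row i is the color of class i (for instance, the colors table that the community script further below loads from data/color150.mat). The names encode_colors and decode_colors are hypothetical helpers for illustration, not the repository's API.

```python
import numpy as np


def encode_colors(labelmap, colors):
    """Map an (H, W) array of class indices to an (H, W, 3) uint8 RGB image.

    Assumes every label is a valid row index into `colors`; negative labels
    (e.g. -1 for unlabelled pixels) would wrap around, so mask them first.
    """
    labelmap = np.asarray(labelmap)
    colors = np.asarray(colors, dtype=np.uint8)
    # Fancy indexing picks one palette row per pixel.
    return colors[labelmap]


def decode_colors(rgb, colors):
    """Invert encode_colors: class index per pixel, or -1 for unknown colors."""
    rgb = np.asarray(rgb, dtype=np.uint8)
    colors = np.asarray(colors, dtype=np.uint8)
    labelmap = np.full(rgb.shape[:2], -1, dtype=np.int64)
    for idx, color in enumerate(colors):
        # A pixel matches a class when all three channels are equal.
        labelmap[np.all(rgb == color, axis=-1)] = idx
    return labelmap
```

With a toy three-color palette, encode_colors(np.array([[0, 1], [2, 1]]), palette) yields a (2, 2, 3) image, and decode_colors recovers the original labels as long as the palette colors are distinct.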

Updates

  • The HRNet model is now supported.
  • We use configuration files to store most options that were previously passed through the argument parser. The options are defined in config/defaults.py.
  • We follow PyTorch practice in data preprocessing (RGB in [0, 1], subtract mean, divide by std).
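The preprocessing convention above can be sketched in a few lines of numpy. This assumes the standard ImageNet per-channel mean and std used by torchvision's pretrained models; the values the repository actually uses should be checked in its dataset code. normalize_image is a hypothetical helper name.

```python
import numpy as np

# Assumed statistics: the usual ImageNet mean/std from torchvision.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def normalize_image(img_rgb_uint8):
    """Turn an (H, W, 3) uint8 RGB image into a normalized (3, H, W) float32 array."""
    img = np.asarray(img_rgb_uint8, dtype=np.float32) / 255.0  # RGB in [0, 1]
    img = (img - IMAGENET_MEAN) / IMAGENET_STD                  # subtract mean, divide std
    return img.transpose(2, 0, 1)                               # HWC -> CHW for PyTorch
```

Since OpenCV is among the dependencies, note that cv2.imread returns BGR; convert with cv2.cvtColor(img, cv2.COLOR_BGR2RGB) before normalizing so the channel statistics line up.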

Highlights

Synchronized Batch Normalization in PyTorch

This module computes the mean and standard deviation across all devices during training. We empirically find that a reasonably large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions; please refer to Synchronized-BatchNorm-PyTorch for details.

The implementation is easy to use because:

  • It is pure Python, with no extra C++ extension libraries.
  • It is compatible with PyTorch's implementation: like PyTorch, it uses the unbiased variance to update the moving average. One difference is that it normalizes with sqrt(max(var, eps)) instead of sqrt(var + eps).
  • It is efficient: only 20% to 30% slower than UnsyncBN.
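To see why cross-device statistics are cheap to combine, note that each device only needs to share its per-channel sum, sum of squares, and sample count. The numpy sketch below simulates that reduction on a list of per-device arrays; it is an illustration under those assumptions (sync_bn_stats is a hypothetical name, and momentum/eps defaults mirror common BatchNorm defaults), not the module's actual code.

```python
import numpy as np


def sync_bn_stats(device_batches, running_mean, running_var, momentum=0.1, eps=1e-5):
    """Combine per-device BatchNorm statistics as if all samples were on one device.

    device_batches: list of (N_i, C) arrays, one per simulated GPU.
    Returns (mean, inv_std, new_running_mean, new_running_var).
    """
    # Each device reduces its own shard to sum, sum of squares and count.
    sums = [b.sum(axis=0) for b in device_batches]
    sqsums = [(b ** 2).sum(axis=0) for b in device_batches]
    counts = [b.shape[0] for b in device_batches]

    # The partial results are summed across devices (an all-reduce on real GPUs).
    total, total_sq, n = sum(sums), sum(sqsums), sum(counts)

    mean = total / n
    sumvar = total_sq - total * mean   # equals sum((x - mean) ** 2)
    biased_var = sumvar / n            # used to normalize the current batch
    unbiased_var = sumvar / (n - 1)    # used to update the moving average

    new_running_mean = (1 - momentum) * running_mean + momentum * mean
    new_running_var = (1 - momentum) * running_var + momentum * unbiased_var
    # sqrt(max(var, eps)) rather than sqrt(var + eps)
    inv_std = 1.0 / np.sqrt(np.maximum(biased_var, eps))
    return mean, inv_std, new_running_mean, new_running_var
```

Because only three small per-channel quantities cross device boundaries, the synchronization cost grows with the number of channels rather than with the batch or image size.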

Dynamic scales of input for training with multiple GPUs

For semantic segmentation, it is beneficial to keep the aspect ratio of images during training. We therefore re-implemented the DataParallel module so that it distributes data to multiple GPUs as a Python dict, which lets each GPU process images of a different size. The dataloader also operates differently as a result.

The batch size of a dataloader now always equals the number of GPUs, and each element is sent to one GPU. This design is also compatible with multiprocessing.

Note that the file index for the multiprocessing dataloader is stored on the master process, which contradicts our goal of having each worker maintain its own file list. We therefore use a trick: although the master process still passes an index to the __getitem__ function, we ignore that request and return a random batch dict. However, all workers forked by the dataloader share the same seed, so if we used this trick directly, multiple workers would yield exactly the same data. To avoid this, we add one line of code that sets the default seed for numpy.random before activating multiple workers in the dataloader.
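The index-ignoring trick and the seeding fix can be illustrated without GPUs. In this hypothetical sketch, RandomBatchDataset returns one random batch dict per call, and simulate_workers emulates forked workers that all inherit the same numpy RNG state; re-seeding each worker with a different value (one way to realize the one-line fix described above) makes their outputs diverge. All names here are assumptions for illustration.

```python
import numpy as np


class RandomBatchDataset:
    """Hypothetical sketch: every item is a random batch dict for one GPU."""

    def __init__(self, file_list, samples_per_gpu):
        self.file_list = list(file_list)
        self.samples_per_gpu = samples_per_gpu

    def __len__(self):
        # Nominal length only; the index passed to __getitem__ is ignored.
        return len(self.file_list)

    def __getitem__(self, index):
        # The master process's index is deliberately ignored: the batch is drawn
        # from this worker's own numpy.random state instead.
        picks = np.random.randint(0, len(self.file_list), size=self.samples_per_gpu)
        return {"files": [self.file_list[i] for i in picks]}


def worker_init(worker_id, base_seed=0):
    """Give each worker a distinct numpy seed."""
    np.random.seed(base_seed + worker_id)


def simulate_workers(dataset, num_workers, reseed):
    """Emulate forked workers, which all start from the same inherited RNG state."""
    outputs = []
    for worker_id in range(num_workers):
        np.random.seed(1234)  # identical state copied from the parent process
        if reseed:
            worker_init(worker_id)
        outputs.append(dataset[worker_id]["files"])
    return outputs
```

With reseed=False every simulated worker returns the same file list; with reseed=True they differ. In stock PyTorch, the same effect is usually achieved by passing a function like worker_init as the worker_init_fn argument of torch.utils.data.DataLoader.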

State-of-the-Art models

  • PSPNet is a scene parsing network that aggregates global representations with a Pyramid Pooling Module (PPM). It won the ILSVRC'16 MIT Scene Parsing Challenge. Please refer to https://arxiv.org/abs/1612.01105 for details.
  • UPerNet is a model based on a Feature Pyramid Network (FPN) and a Pyramid Pooling Module (PPM). It does not need dilated convolution, an operator that is time- and memory-consuming. Without bells and whistles, it is comparable to or better than PSPNet, while requiring much shorter training time and less GPU memory. Please refer to https://arxiv.org/abs/1807.10221 for details.
  • HRNet is a recently proposed model that retains high-resolution representations throughout the network, without the traditional bottleneck design. It achieves state-of-the-art performance on a series of pixel labeling tasks. Please refer to https://arxiv.org/abs/1904.04514 for details.
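The core idea of the Pyramid Pooling Module can be sketched in numpy: pool the feature map into several coarse grids, upsample each back to full resolution, and concatenate with the input. This is a simplified illustration assuming the 1x1, 2x2, 3x3 and 6x6 bins from the PSPNet paper; the real module also applies a convolution, batch normalization and ReLU to each pooled branch and uses bilinear rather than nearest-neighbour upsampling. The function names are hypothetical.

```python
import numpy as np


def adaptive_avg_pool2d(x, out_size):
    """Average-pool a (C, H, W) array to (C, out_size, out_size).

    Bin edges use floor for the start and ceil for the end, so neighbouring
    bins may overlap when H or W is not divisible by out_size.
    """
    c, h, w = x.shape
    out = np.empty((c, out_size, out_size), dtype=x.dtype)
    for i in range(out_size):
        h0, h1 = (i * h) // out_size, -((-(i + 1) * h) // out_size)
        for j in range(out_size):
            w0, w1 = (j * w) // out_size, -((-(j + 1) * w) // out_size)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Nearest-neighbour resize of a (C, h_in, w_in) array to (C, h, w)."""
    _, h_in, w_in = x.shape
    rows = (np.arange(h) * h_in) // h
    cols = (np.arange(w) * w_in) // w
    return x[:, rows][:, :, cols]


def pyramid_pooling(x, bins=(1, 2, 3, 6)):
    """Concatenate x with pooled-and-upsampled context at several grid sizes."""
    _, h, w = x.shape
    branches = [x]
    for b in bins:
        branches.append(upsample_nearest(adaptive_avg_pool2d(x, b), h, w))
    return np.concatenate(branches, axis=0)
```

For a (C, H, W) input, the output has C * (1 + len(bins)) channels at the original resolution; the 1x1 branch carries the global average of each channel, which is the "global representation" the text refers to.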

Supported models

We split our models into an encoder and a decoder: encoders are usually modified directly from classification networks, while decoders consist of final convolutions and upsampling. Some pre-configured models are provided in the config folder.

Encoder:

  • MobileNetV2dilated
  • ResNet18/ResNet18dilated
  • ResNet50/ResNet50dilated
  • ResNet101/ResNet101dilated
  • HRNetV2 (W48)

Decoder:

  • C1 (one convolution module)
  • C1_deepsup (C1 + deep supervision trick)
  • PPM (Pyramid Pooling Module, see the PSPNet paper for details)
  • PPM_deepsup (PPM + deep supervision trick)
  • UPerNet (Pyramid Pooling + FPN head, see the UPerNet paper for details)
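The "_deepsup" decoders add an auxiliary prediction head on an intermediate encoder stage and add its loss, down-weighted, to the main loss during training. The numpy sketch below shows that combination, assuming a pixel-wise negative log-likelihood with ignore index -1 (as in the nn.NLLLoss(ignore_index=-1) used by a community script further below) and a weight like TRAIN.deep_sup_scale. The function names are hypothetical, and the repository's SegmentationModule may differ in details.

```python
import numpy as np


def log_softmax(logits, axis=0):
    """Numerically stable log-softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def nll_loss_2d(log_probs, target, ignore_index=-1):
    """Mean negative log-likelihood over labelled pixels.

    log_probs: (C, H, W) log-probabilities; target: (H, W) integer labels.
    """
    valid = target != ignore_index
    rows, cols = np.nonzero(valid)
    # Boolean indexing and np.nonzero both walk pixels in row-major order.
    picked = log_probs[target[valid], rows, cols]
    return -picked.mean()


def deep_supervision_loss(main_logits, aux_logits, target, deep_sup_scale=0.4):
    """Main-head loss plus a down-weighted loss from the auxiliary head."""
    main = nll_loss_2d(log_softmax(main_logits), target)
    aux = nll_loss_2d(log_softmax(aux_logits), target)
    return main + deep_sup_scale * aux
```

The auxiliary head only shapes training; at inference time just the main head's prediction is used.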

Performance:

IMPORTANT: The base ResNet in our repository is a customized version (different from the one in torchvision). The base models will be automatically downloaded when needed.

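Architecture                     | Multi-scale testing | Mean IoU (%) | Pixel accuracy (%) | Overall score | Inference speed (fps)
-------------------------------- | ------------------- | ------------ | ------------------ | ------------- | ---------------------
MobileNetV2dilated + C1_deepsup  | No                  | 34.84        | 75.75              | 54.07         | 17.2
MobileNetV2dilated + C1_deepsup  | Yes                 | 33.84        | 76.80              | 55.32         | 10.3
MobileNetV2dilated + PPM_deepsup | No                  | 35.76        | 77.77              | 56.27         | 14.9
MobileNetV2dilated + PPM_deepsup | Yes                 | 36.28        | 78.26              | 57.27         | 6.7
ResNet18dilated + C1_deepsup     | No                  | 33.82        | 76.05              | 54.94         | 13.9
ResNet18dilated + C1_deepsup     | Yes                 | 35.34        | 77.41              | 56.38         | 5.8
ResNet18dilated + PPM_deepsup    | No                  | 38.00        | 78.64              | 58.32         | 11.7
ResNet18dilated + PPM_deepsup    | Yes                 | 38.81        | 79.29              | 59.05         | 4.2
ResNet50dilated + PPM_deepsup    | No                  | 41.26        | 79.73              | 60.50         | 8.3
ResNet50dilated + PPM_deepsup    | Yes                 | 42.14        | 80.13              | 61.14         | 2.6
ResNet101dilated + PPM_deepsup   | No                  | 42.19        | 80.59              | 61.39         | 6.8
ResNet101dilated + PPM_deepsup   | Yes                 | 42.53        | 80.91              | 61.72         | 2.0
UperNet50                        | No                  | 40.44        | 79.80              | 60.12         | 8.4
UperNet50                        | Yes                 | 41.55        | 80.23              | 60.89         | 2.9
UperNet101                       | No                  | 42.00        | 80.79              | 61.40         | 7.8
UperNet101                       | Yes                 | 42.66        | 81.01              | 61.84         | 2.3
HRNetV2                          | No                  | 42.03        | 80.77              | 61.40         | 5.8
HRNetV2                          | Yes                 | 43.20        | 81.47              | 62.34         | 1.9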

Training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory); inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.
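In the ADE20K scene parsing benchmark, the overall score is the average of pixel accuracy and mean IoU. The sketch below computes all three from a confusion matrix accumulated over the validation set. It assumes integer label maps with -1 marking unlabelled pixels, and it skips classes that appear in neither prediction nor ground truth, which is a choice made for this sketch. The helper names are hypothetical, not the repository's evaluation functions.

```python
import numpy as np


def confusion_matrix(pred, label, num_class, ignore_index=-1):
    """(num_class, num_class) counts: rows are ground truth, columns are predictions."""
    valid = label != ignore_index
    idx = label[valid].astype(np.int64) * num_class + pred[valid].astype(np.int64)
    return np.bincount(idx, minlength=num_class ** 2).reshape(num_class, num_class)


def scores(conf):
    """Return (pixel accuracy, mean IoU, overall score), all in percent."""
    tp = np.diag(conf).astype(np.float64)
    pixel_acc = tp.sum() / conf.sum()
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # ignore classes absent from both prediction and label
    mean_iou = (tp[present] / union[present]).mean()
    return 100 * pixel_acc, 100 * mean_iou, 100 * (pixel_acc + mean_iou) / 2
```

Summing the per-image confusion matrices before calling scores gives dataset-level numbers. As a worked example of the last step, a mean IoU of 33.84 and a pixel accuracy of 76.80 average to (33.84 + 76.80) / 2 = 55.32.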

Environment

The code is developed under the following configurations.

  • Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
  • Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
  • Dependencies: numpy, scipy, opencv, yacs, tqdm

Quick start: Test on an image using our trained model

  1. Here is a simple demo to do inference on a single image:
chmod +x demo_test.sh
./demo_test.sh

This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves the predicted segmentation (.png) to the working directory.

  2. To test on an image or a folder of images ($PATH_IMG), you can simply run:
python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG

Training

  1. Download the ADE20K scene parsing dataset:
chmod +x download_ADE20K.sh
./download_ADE20K.sh
  2. Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints are saved in the ckpt folder by default.
python3 train.py --gpus $GPUS --cfg $CFG 
  • To choose which GPUs to use, pass either --gpus 0-7 or --gpus 0,2,4,6.

For example, you can start with our provided configurations:

  • Train MobileNetV2dilated + C1_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Train ResNet50dilated + PPM_deepsup
python3 train.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Train UPerNet101
python3 train.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml
  3. You can also override options on the command line, for example python3 train.py TRAIN.num_epoch 10.
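Both command-line conventions above are simple to reproduce. In the repository, option overrides go through the yacs configuration defined in config/defaults.py (yacs's CfgNode provides merge_from_file and merge_from_list). The stdlib sketch below only illustrates the two ideas: parse_gpus expands a --gpus value such as 0-7 or 0,2,4,6, and apply_overrides applies trailing KEY VALUE pairs to a nested dict. Both function names are hypothetical.

```python
import ast


def parse_gpus(spec):
    """Expand a --gpus value such as '0-7' or '0,2,4,6' into a list of ints."""
    gpus = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            gpus.extend(range(int(start), int(end) + 1))  # inclusive range
        elif part:
            gpus.append(int(part))
    return gpus


def apply_overrides(cfg, opts):
    """Apply KEY VALUE pairs such as ['TRAIN.num_epoch', '10'] to a nested dict.

    Values are decoded with ast.literal_eval when possible (10 -> int,
    True -> bool); anything else is kept as a plain string.
    """
    if len(opts) % 2:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split(".")
        node = cfg
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError(f"unknown config key: {key}")
        try:
            node[leaf] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            node[leaf] = raw
    return cfg
```

Rejecting unknown keys mirrors the intent of a typed config: a typo such as TRAIN.num_epochs fails loudly instead of being silently ignored.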

Evaluation

  1. Evaluate a trained model on the validation set. Append VAL.visualize True to the command to output visualizations like the teaser image.

For example:

  • Evaluate MobileNetV2dilated + C1_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml
  • Evaluate ResNet50dilated + PPM_deepsup
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml
  • Evaluate UPerNet101
python3 eval_multipro.py --gpus GPUS --cfg config/ade20k-resnet101-upernet.yaml

Integration with other projects

This library can be installed via pip to easily integrate with another codebase:

pip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git

The library can then be used programmatically, for example:

from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal of Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal of Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}
Comments
  • input and target size don't match for loss function

    It looks like every combination but the default resnet50_dilated8/ppm_bilinear_deepsup leads to a mismatch in size between the input and the target of the loss function. I'm a bit mystified; I did not change any of the models. All I adapted was the number of labels (to 8, as can be seen below).

    Encoder: resnet50_dilated8. Decoder: upernet
    RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

    Encoder: Resnet101. Decoder: ppm_bilinear_deepsup
    return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
    RuntimeError: input and target batch or spatial sizes don't match: target [1 x 75 x 94], input [1 x 8 x 19 x 24] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

    Encoder: Resnet101. Decoder: Upernet RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

    In cases where the program runs the last two dimensions seem to be consistent torch.Size([1, 8, 75, 94]) torch.Size([1, 75, 94])

    opened by heinzermch 14
  • Error during training for custom dataset

    When I try to train the model with the command below, an error occurs; it seems to be a problem with the GPUs (four GPUs).

    the command I run:

    python train.py --gpus 0,1,2,3 --cfg $cfg

    Error:

    [2019-10-06 08:56:13,423 INFO train.py line 246 3390] Outputing checkpoints to: ckpt/test-resnet50dilated-ppm_deepsup
    # samples: 7296
    1 Epoch = 5000 iters
    Traceback (most recent call last):
      File "train.py", line 273, in <module>
        main(cfg, gpus)
      File "train.py", line 200, in main
        train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
      File "train.py", line 32, in train
        batch_data = next(iterator)
      File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
        return self._process_next_batch(batch)
      File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
        raise batch.exc_type(batch.exc_msg)
    AssertionError: Traceback (most recent call last):
      File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
        samples = collate_fn([dataset[i] for i in batch_indices])
      File "/home/bruno/xView2/semantic-segmentation-pytorch/dataset.py", line 162, in __getitem__
        assert(segm.mode == "L")
    AssertionError
    
    opened by bao18 9
  • dimension mismatch

    I got this error

    RuntimeError: Given groups=1, weight of size [512, 320, 1, 1], expected input[1, 2048, 1, 1] to have 320 channels, but got 2048 channels instead

    I saw the other thread and tried changing it to 512, but it still didn't work.

    opened by johnathanchiu 9
  • Class that corresponds to each color

    Hello,

    @hangzhaomit thanks for the Pytorch implementation on ADE20K dataset! I wonder if there is a way to know which classes correspond to each color in the .png image we get as a result. Would you be so kind to provide me with any information about it?

    Thank you in advance

    opened by nefelipk 8
  • Jobs stop after random iterations without any error information.

    I have implemented my method based on the code base.

    I try to train my model from scratch and find that the jobs always stop outputting any log information after a random number of iterations, and it is no more than one epoch over all the data.

    For example, the job ended without updating the log information and without releasing the GPU memory.

    Epoch: [1][4517/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019349, lr_decoder: 0.019349, Accuracy: 62.36, Loss: 2.131847
    Epoch: [1][4518/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131737
    Epoch: [1][4519/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131553
    Epoch: [1][4520/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131408
    Epoch: [1][4521/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131315
    Epoch: [1][4522/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131346
    Epoch: [1][4523/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131252
    Epoch: [1][4524/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131484
    Epoch: [1][4525/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131410
    Epoch: [1][4526/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131326
    Epoch: [1][4527/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131221
    Epoch: [1][4528/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131181
    Epoch: [1][4529/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131092
    Epoch: [1][4530/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.38, Loss: 2.130914
    
    opened by PkuRainBow 8
  • it will execute multiple times when import from test.py

    On my command line, 'test.py' works well.

    I want to use this file from another Python file, so I added a function to 'test.py':

        def run(imgs=None, model="config/ade20k-resnet50dilated-ppm_deepsup.yaml", gpu=0):
            if imgs is None:
                return False
            cfg.merge_from_file(model)
    
            cfg.MODEL.arch_encoder = cfg.MODEL.arch_encoder.lower()
            cfg.MODEL.arch_decoder = cfg.MODEL.arch_decoder.lower()
    
            # absolute paths of model weights
            cfg.MODEL.weights_encoder = os.path.join(
                cfg.DIR, 'encoder' + cfg.TEST.suffix)
            cfg.MODEL.weights_decoder = os.path.join(
                cfg.DIR, 'decoder' + cfg.TEST.suffix)
    
            if not os.path.exists(cfg.MODEL.weights_encoder):
                return (False, "checkpoint does not exist!")
    
            # generate testing image list (accepts a single path or a list of paths)
            if isinstance(imgs, str):
                imgs = [imgs]
            if os.path.isdir(imgs[0]):
                imgs = find_recursive(imgs[0])
            assert len(imgs), "imgs should be a path to image (.jpg) or directory."
            cfg.list_test = [{'fpath_img': x} for x in imgs]
            if not os.path.isdir(cfg.TEST.result):
                os.makedirs(cfg.TEST.result)
    
            main(cfg, gpu)
    

    However, in another file, when I use this function like this:

    from test import run
    run("test.jpg")
    

    It outputs error:

    Loading weights for net_encoder
    Loading weights for net_decoder
    # samples: 1
    Loading weights for net_encoder
    Loading weights for net_encoder
    Loading weights for net_encoder
    Loading weights for net_encoder
    Loading weights for net_encoder
    Loading weights for net_decoder
    Loading weights for net_decoder
    Loading weights for net_decoder
    # samples: 1
    Loading weights for net_decoder
    Loading weights for net_decoder
    # samples: 1
    # samples: 1
    # samples: 1
    ......
    

    Then my command line crashes.

    However, using this function inside test.py works well. It only fails when called from another file.

    This function is the same as the code in if __name__ == '__main__':, so what is wrong?

    Thanks.

    opened by acdzh 7
  • test script for process a list of images

    Hi @hangzhaomit, I have written a script (based on your test.py) that reads a list of images for batch conversion. Hope it helps~

    import os
    import datetime
    import argparse
    from distutils.version import LooseVersion
    # Numerical libs
    import numpy as np
    import torch
    import torch.nn as nn
    from scipy.io import loadmat
    # Our libs
    from dataset import TestDataset
    from models import ModelBuilder, SegmentationModule
    from utils import colorEncode
    from lib.nn import user_scattered_collate, async_copy_to
    from lib.utils import as_numpy, mark_volatile
    import lib.utils.data as torchdata
    import cv2
    
    def visualize_result(data, preds, args):
        colors = loadmat('data/color150.mat')['colors']
        (img, info) = data
        pred_color = colorEncode(preds, colors)
    
        im_vis = np.concatenate((img, pred_color),
                                axis=1).astype(np.uint8)
    
        img_name = info.split('/')[-1]
        cv2.imwrite(os.path.join(args.result,
                    img_name.replace('.jpg', '.png')), im_vis)
    
    def test(segmentation_module, loader, args):
        segmentation_module.eval()
    
        for i, batch_data in enumerate(loader):
            # process data
            batch_data = batch_data[0]
            segSize = (batch_data['img_ori'].shape[0],
                       batch_data['img_ori'].shape[1])
    
            img_resized_list = batch_data['img_data']
    
            with torch.no_grad():
                pred = torch.zeros(1, args.num_class, segSize[0], segSize[1])
    
                for img in img_resized_list:
                    feed_dict = batch_data.copy()
                    feed_dict['img_data'] = img
                    del feed_dict['img_ori']
                    del feed_dict['info']
                    feed_dict = async_copy_to(feed_dict, args.gpu_id)
    
                    # forward pass
                    pred_tmp = segmentation_module(feed_dict, segSize=segSize)
                    pred = pred + pred_tmp.cpu() / len(args.imgSize)
    
                _, preds = torch.max(pred, dim=1)
                preds = as_numpy(preds.squeeze(0))
    
            visualize_result(
                (batch_data['img_ori'], batch_data['info']),
                preds, args)
    
            print('[{}] iter {}'
                  .format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), i))
    
    
    def main(args):
        torch.cuda.set_device(args.gpu_id)
        builder = ModelBuilder()
        net_encoder = builder.build_encoder(
            arch=args.arch_encoder,
            fc_dim=args.fc_dim,
            weights=args.weights_encoder)
        net_decoder = builder.build_decoder(
            arch=args.arch_decoder,
            fc_dim=args.fc_dim,
            num_class=args.num_class,
            weights=args.weights_decoder,
            use_softmax=True)
    
        crit = nn.NLLLoss(ignore_index=-1)
    
        segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)
    
        ################# modified #################
        segmentation_module.cuda()
    
        # read one image path per line, skipping blank lines
        with open(args.test_list, 'r') as f:
            files = [line.strip() for line in f if line.strip()]
    
        for file_ in files:
            test_img = [{'fpath_img': file_}]
            dataset_val = TestDataset(test_img, args, max_sample=args.num_val)
            loader_val = torchdata.DataLoader(
                dataset_val,
                batch_size=args.batch_size,
                shuffle=False,
                collate_fn=user_scattered_collate,
                num_workers=5,
                drop_last=True)
            test(segmentation_module, loader_val, args)
    
        print('Inference done!')
        ################# modified #################
    
    if __name__ == '__main__':
        assert LooseVersion(torch.__version__) >= LooseVersion('0.4.0'), \
            'PyTorch>=0.4.0 is required'
    
        parser = argparse.ArgumentParser()
        # Path related arguments
        parser.add_argument('--test_list', required=True,
                            help='text file listing one image path per line')
        parser.add_argument('--model_path', required=True,
                            help='folder to model path')
        parser.add_argument('--suffix', default='_epoch_20.pth',
                            help="which snapshot to load")
    
        # Model related arguments
        parser.add_argument('--arch_encoder', default='resnet50_dilated8',
                            help="architecture of net_encoder")
        parser.add_argument('--arch_decoder', default='ppm_bilinear_deepsup',
                            help="architecture of net_decoder")
        parser.add_argument('--fc_dim', default=2048, type=int,
                            help='number of features between encoder and decoder')
    
        # Data related arguments
        parser.add_argument('--num_val', default=-1, type=int,
                            help='number of images to evaluate')
        parser.add_argument('--num_class', default=150, type=int,
                            help='number of classes')
        parser.add_argument('--batch_size', default=1, type=int,
                            help='batchsize. current only supports 1')
        parser.add_argument('--imgSize', default=[300, 400, 500, 600],
                            nargs='+', type=int,
                            help='list of input image sizes.'
                                 'for multiscale testing, e.g. 300 400 500')
        parser.add_argument('--imgMaxSize', default=1000, type=int,
                            help='maximum input image size of long edge')
        parser.add_argument('--padding_constant', default=8, type=int,
                            help='maximum downsampling rate of the network')
        parser.add_argument('--segm_downsampling_rate', default=8, type=int,
                            help='downsampling rate of the segmentation label')
    
       
        parser.add_argument('--result', default='.',
                            help='folder to output visualization results')
        parser.add_argument('--gpu_id', default=0, type=int,
                            help='gpu_id for evaluation')
    
        args = parser.parse_args()
        print(args)
    
       
        args.weights_encoder = os.path.join(args.model_path,
                                            'encoder' + args.suffix)
        args.weights_decoder = os.path.join(args.model_path,
                                            'decoder' + args.suffix)
    
        assert os.path.exists(args.weights_encoder) and \
            os.path.exists(args.weights_decoder), 'checkpoint does not exist!'
    
        if not os.path.isdir(args.result):
            os.makedirs(args.result)
    
        main(args)
    
    opened by visonpon 7
  • ADE20k 150 Challenge instance segmentation

    Hi,

    Is it possible to get the original image (RGB masks) for the gray scale masks in the challenge, like a mapping of file names? I want to use them to create bounding boxes.

    Thank you for your time.

    opened by darleybarreto 6
  • Training stucks with hrnet: problems while loading data?

    First of all, thank you for this very good repository!! I already launched successfully a training over a binary custom dataset and also got very good results in evaluation and visualization. I used the model with hrnet+c1. Now I am trying to train with another custom dataset, with 34 classes including "undefined" (which has been labeled as 0 to be ignored): everything starts as usual, but it seems to block while creating the iterator over the training dataset. Here is the output of my program until it stops and remains stuck:

    Training started on sáb abr  4 16:37:42 CEST 2020
    [2020-04-04 16:37:43,581 INFO train.py line 243 28431] Loaded configuration file ./config/customdataset-hrnetv2-c1.yaml
    [2020-04-04 16:37:43,582 INFO train.py line 244 28431] Running with config:
    DATASET:
      imgMaxSize: 4000
      imgSizes: (254, 267, 300, 350, 363, 372, 396, 400, 410, 420, 421, 425, 426, 429, 436, 440, 441, 456, 466, 467, 480, 496, 498, 500, 506, 525, 531, 538, 549, 559, 600, 605, 639, 640, 654, 662, 664, 680, 702, 714, 720, 750, 751, 768, 800, 808, 843, 860, 873, 900, 938, 954, 957, 960, 1000, 1015, 1024, 1025, 1080, 1087, 1102, 1118, 1200, 1283, 1333, 1390, 1600, 1789, 2000, 2247, 2332, 2400, 3000, 3079, 3264)
      list_train: data/customdataset/training.odgt
      list_val: data/customdataset/validation.odgt
      num_class: 34
      padding_constant: 32
      random_flip: True
      root_dataset: data/customdataset/
      segm_downsampling_rate: 4
    DIR: ckpt/customdataset-hrnetv2-c1
    MODEL:
      arch_decoder: c1
      arch_encoder: hrnetv2
      fc_dim: 720
      weights_decoder: 
      weights_encoder: 
    TEST:
      batch_size: 1
      checkpoint: epoch_4.pth
      result: ./result/customdataset/exp01
    TRAIN:
      batch_size_per_gpu: 2
      beta1: 0.9
      deep_sup_scale: 0.4
      disp_iter: 1
      epoch_iters: 5000
      fix_bn: False
      lr_decoder: 0.02
      lr_encoder: 0.02
      lr_pow: 0.9
      num_epoch: 4
      optim: SGD
      seed: 304
      start_epoch: 0
      weight_decay: 0.0001
      workers: 16
    VAL:
      batch_size: 1
      checkpoint: epoch_4.pth
      visualize: True
    [2020-04-04 16:37:43,582 INFO train.py line 249 28431] Outputing checkpoints to: ckpt/customdataset-hrnetv2-c1
    # samples: 80
    1 Epoch = 5000 iters
    

    The pretrained model I am using is ade20k-hrnetv2-c1 and the same was for the previous experiments. I put some trivial print() in train.py, after every step that follows the last information actually printed. It seems that there are problems in creating the iterator of the training data:

    [178] print('1 Epoch = {} iters'.format(cfg.TRAIN.epoch_iters)) # this appears
    [179]
    [180] # create loader iterator
    [181] iterator_train = iter(loader_train)
    [182] print('Iterator train created') # this does not appear
    

    The GPU memory usage seems to confirm this guess: I am monitoring 2 Nvidia GPUs with nvidia-smi, a Pascal Quadro P6000 and a Titan RTX, both with 24GB memory (I understand that maybe it is not correct to use different architectures?). In the previous trainings, everything worked with both memories about 75% occupied, equally distributed as I expected. Now, instead, after an initial increase in memory usage, it gets stuck with very unbalanced and low memory usage.

    What am I doing wrong? Any help is appreciated.

    opened by Fiordarancio 6
  • ImportError: cannot import name '_set_worker_signal_handlers'

    Hi, I found that PyTorch requires Python>=3.6, so I installed PyTorch 0.4 with Python 3.6. When I run ./demo_test.sh, it gives me this error:

      File "test.py", line 13, in <module>
        from dataset import TestDataset
      File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/dataset.py", line 4, in <module>
        import lib.utils.data as torchdata
      File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/__init__.py", line 3, in <module>
        from .dataloader import DataLoader
      File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/dataloader.py", line 3, in <module>
        from torch._C import _set_worker_signal_handlers, _update_worker_pids,
    ImportError: cannot import name '_set_worker_signal_handlers'

    Is there any solution? Thanks!

    invalid 
    opened by dongdonghy 6
  • Can't use cuda.

    Hi, I use conda install -c prigoyal pytorch=0.4.0 to install pytorch and run ./demo_test.sh. Everything goes well until line 105 of test.py:

    segmentation_module.cuda()
    

    The output is:

     ./demo_test.sh: line 30: 27040 Segmentation fault      (core dumped) python3 -u test.py --model_path $MODEL_PATH --test_img $TEST_IMG --arch_encoder resnet50_dilated8 --arch_decoder ppm_bilinear_deepsup --fc_dim 2048 --result $RESULT_PATH
    

    I guess it's the problem of pytorch, and test the code:

    import torch
    x = torch.Tensor(3,3)
    x.cuda()
    

    Similar error : Segmentation fault (core dumped).

    Any suggestions?

    BTW, my other projects of pytorch 0.2.0 and 0.3.0 on this machine work well.

    Thanks.

    bug 
    opened by zhiweifang 6
  • Could I use the pretrained model of ADE20K dataset for commercial purpose

    Hi, the pretrained ADE20K model is very helpful for one of my commercial projects. Can I use the pretrained model for commercial purposes? I read the license terms at https://groups.csail.mit.edu/vision/datasets/ADE20K/terms/ and they say that the dataset can't be used for commercial purposes. But are there any restrictions on using the pretrained model?

    opened by robi56 0
  • Trained with custom dataset model results

    I prepared a custom dataset of 7000 images that contain only the floor class and added it to the ADE20K dataset. Now I have a dataset with 28000 images. I changed the floor annotation color in my dataset to #040404, because the floor class has that color in the ADE20K dataset. So my floor annotation looks like this: [image: floor annotation for 0101] (#040404 is very close to black; if you look carefully you will see the floor annotation)

    and the original image: [image: 0101]

    Everything seems normal in my dataset. Then I changed the start_epoch and num_epoch values in the config YAML file (num_epoch: 23, start_epoch: 20, epoch_iters: 5000). The training process finished successfully, and I have encoder and decoder models named encoder_epoch_23.pth and decoder_epoch_23.pth. Everything seems normal here as well.

    I got results using these models, but they were not as expected. This is the result I got with the model downloaded from here (decoder_epoch_20.pth): [image: result with downloaded model]

    And this is the result I got using the model I trained:

    [image: result with my trained model]

    Results seem to be getting worse. What could i be doing wrong?

    opened by Muratoter 0
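No answer was posted. One thing worth ruling out is how the new masks were encoded. This is an assumption, not a confirmed cause. To my understanding, the annotation PNGs in ADEChallengeData2016 are single-channel images whose pixel value is the class index (0 for unlabeled), so #040404 encodes index 4, which is floor in the 150-class list. If the new masks were saved as RGB, antialiased, or lossily compressed, some floor pixels may carry other values. The hypothetical helpers below do three things: collapse a gray-coded RGB mask to one channel, refuse masks with non-gray pixels, and count pixels per index so the labels can be checked before training.

```python
import numpy as np


def rgb_mask_to_index(mask_rgb: np.ndarray) -> np.ndarray:
    """Collapse an H x W x 3 gray-coded mask (e.g. #040404) into an H x W uint8 index map."""
    if mask_rgb.ndim != 3 or mask_rgb.shape[2] != 3:
        raise ValueError("expected an H x W x 3 array")
    r, g, b = mask_rgb[..., 0], mask_rgb[..., 1], mask_rgb[..., 2]
    if not (np.array_equal(r, g) and np.array_equal(g, b)):
        # Antialiasing or JPEG compression creates non-gray pixels, which would
        # otherwise turn silently into wrong class indices.
        raise ValueError("mask has non-gray pixels; was it saved lossily or antialiased?")
    return r.astype(np.uint8)


def label_histogram(index_map: np.ndarray) -> dict:
    """Count pixels per class index, e.g. to confirm floor pixels carry value 4."""
    values, counts = np.unique(index_map, return_counts=True)
    return {int(v): int(c) for v, c in zip(values, counts)}
```

Running label_histogram over a few converted masks should show only 0 and 4 for the floor-only images. Any other value points to an encoding problem rather than a training problem.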
  • unable to save the model in torchscript format


    Hi team,

    First, thanks for your repository. I would like to save the model as a TorchScript module rather than in the traditional way. When I try to save the model using

    torch.jit.script(model)

    I get the following error:

    ---------------------------------------------------------------------------
    NotSupportedError                         Traceback (most recent call last)
    /tmp/ipykernel_127656/979748029.py in <module>
          7 enabled_precisions = {torch.float, torch.half} # Run with fp16
          8 
    ----> 9 trt_ts_module = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions=enabled_precisions)
         10 
         11 input_data = input_data.to('cuda').half()
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch_tensorrt/_compile.py in compile(module, ir, inputs, enabled_precisions, **kwargs)
        112                 "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
        113             )
    --> 114             ts_mod = torch.jit.script(module)
        115         return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
        116     elif target_ir == _IRType.fx:
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_script.py in script(obj, optimize, _frames_up, _rcb, example_inputs)
       1264         obj = call_prepare_scriptable_func(obj)
       1265         return torch.jit._recursive.create_script_module(
    -> 1266             obj, torch.jit._recursive.infer_methods_to_compile
       1267         )
       1268 
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in create_script_module(nn_module, stubs_fn, share_types, is_tracing)
        452     if not is_tracing:
        453         AttributeTypeIsSupportedChecker().check(nn_module)
    --> 454     return create_script_module_impl(nn_module, concrete_type, stubs_fn)
        455 
        456 def create_script_module_impl(nn_module, concrete_type, stubs_fn):
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in create_script_module_impl(nn_module, concrete_type, stubs_fn)
        464     """
        465     cpp_module = torch._C._create_module_with_type(concrete_type.jit_type)
    --> 466     method_stubs = stubs_fn(nn_module)
        467     property_stubs = get_property_stubs(nn_module)
        468     hook_stubs, pre_hook_stubs = get_hook_stubs(nn_module)
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in infer_methods_to_compile(nn_module)
        733     stubs = []
        734     for method in uniqued_methods:
    --> 735         stubs.append(make_stub_from_method(nn_module, method))
        736     return overload_stubs + stubs
        737 
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in make_stub_from_method(nn_module, method_name)
         64     # In this case, the actual function object will have the name `_forward`,
         65     # even though we requested a stub for `forward`.
    ---> 66     return make_stub(func, method_name)
         67 
         68 
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in make_stub(func, name)
         49 def make_stub(func, name):
         50     rcb = _jit_internal.createResolutionCallbackFromClosure(func)
    ---> 51     ast = get_jit_def(func, name, self_name="RecursiveScriptModule")
         52     return ScriptMethodStub(rcb, ast, func)
         53 
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in get_jit_def(fn, def_name, self_name, is_classmethod)
        262         pdt_arg_types = type_trace_db.get_args_types(qualname)
        263 
    --> 264     return build_def(parsed_def.ctx, fn_def, type_line, def_name, self_name=self_name, pdt_arg_types=pdt_arg_types)
        265 
        266 # TODO: more robust handling of recognizing ignore context manager
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in build_def(ctx, py_def, type_line, def_name, self_name, pdt_arg_types)
        300                        py_def.col_offset + len("def"))
        301 
    --> 302     param_list = build_param_list(ctx, py_def.args, self_name, pdt_arg_types)
        303     return_type = None
        304     if getattr(py_def, 'returns', None) is not None:
    
    ~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in build_param_list(ctx, py_args, self_name, pdt_arg_types)
        335             if arg is not None:
        336                 ctx_range = build_expr(ctx, arg).range()
    --> 337                 raise NotSupportedError(ctx_range, _vararg_kwarg_err)
        338 
        339     # List of Tuple of args and type as inferred by profile directed typing
    
    NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
      File "/home/iamalien/Desktop/my_files/semantic_segmentation_example/semantic-segmentation-pytorch/sage_example/code/mit_semseg/models/models.py", line 29
        def forward(self, feed_dict, *, segSize=None):
                                                ~~~~ <--- HERE
            # training
            if segSize is None:
    
    

    @hangzhaomit @Tete-Xiao @davidbau @devinaconley @eugenelawrence @MarcoForte @zhoubolei @yagi-3 @arjo129 @jeremyfix

    Could you please help me save the model in TorchScript format?

    opened by IamExperimenting 2
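The error points at def forward(self, feed_dict, *, segSize=None) in mit_semseg/models/models.py. There, segSize is a keyword-only argument with a default, and TorchScript's frontend rejects such arguments along with *args and **kwargs. The traceback also shows that torch_tensorrt.compile calls torch.jit.script on the module itself. One possible workaround, not confirmed by the maintainers, is to script a thin wrapper whose forward takes positional, type-annotated arguments instead of the dict. The helper below is a hypothetical pre-check built on Python's standard inspect module. It mirrors only this one check, so passing it does not guarantee that a module scripts: dict inputs and Optional arguments still need TorchScript-compatible annotations.

```python
import inspect
from typing import Callable, List


def torchscript_signature_problems(fn: Callable) -> List[str]:
    """List parameters of fn that TorchScript rejects with "Compiled functions
    can't take variable number of arguments or use keyword-only arguments
    with defaults"."""
    problems = []
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            problems.append("variadic parameter: " + name)
        elif (param.kind == inspect.Parameter.KEYWORD_ONLY
              and param.default is not inspect.Parameter.empty):
            problems.append("keyword-only parameter with default: " + name)
    return problems
```

For example, torchscript_signature_problems(type(model).forward) inspects the forward of whatever module is passed to torch.jit.script. For the forward shown in the traceback it reports segSize.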
  • Using TorchMetrics for code simplification


    Proposed Refactor

    We have been developing TorchMetrics as a general-purpose metrics library that also covers some domain-specific use cases. In many cases it maps exactly to scikit-learn, with tests verifying correctness against the reference metric. TorchMetrics includes both functional and nn.Module versions, and for most standard metrics the only dependency is PyTorch (domain-specific metrics require installing the related extras).

    Suggest a potential alternative/fix

    With TorchMetrics you can rely on widely tested correctness (tested against gold standards on multiple operating systems and all PyTorch versions above v1.4), and later you can use the nn.Module versions to leverage update and compute. Moreover, the recent v0.8 release allows reusing/sharing common computation across related metrics, similar to how you leverage the confusion matrix.

    If you are fine with it, we are happy to draft a PR with the suggested change so the impact can be verified in place. If you have any questions, feel free to follow up with me or @aniketmaurya.

    From a quick check, all of this mit_semseg.utils code could be simplified with TorchMetrics.

    opened by Borda 0
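The proposal's point about shared computation can be illustrated without TorchMetrics itself: pixel accuracy and per-class IoU can both be derived from one confusion matrix. The sketch below uses plain NumPy. It is not the TorchMetrics API and not the current mit_semseg.utils code. The function names, the rows-are-ground-truth layout, and the ignore_index=-1 convention are assumptions.

```python
import numpy as np


def confusion_matrix(pred: np.ndarray, label: np.ndarray, num_classes: int,
                     ignore_index: int = -1) -> np.ndarray:
    """Rows are ground-truth classes, columns are predicted classes.

    Assumes every non-ignored value in pred and label lies in [0, num_classes).
    """
    valid = label != ignore_index
    flat = num_classes * label[valid].astype(np.int64) + pred[valid].astype(np.int64)
    return np.bincount(flat, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def metrics_from_confusion(cm: np.ndarray):
    """Derive pixel accuracy, per-class IoU and mean IoU from one confusion matrix."""
    tp = np.diag(cm).astype(np.float64)
    pixel_acc = tp.sum() / max(cm.sum(), 1)
    # union = predicted pixels + ground-truth pixels - intersection, per class
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = np.where(union > 0, tp / union, np.nan)  # NaN for classes never seen
    return pixel_acc, iou, np.nanmean(iou)
```

Accumulating cm over a whole validation set and calling metrics_from_confusion once at the end gives dataset-level scores from a single pass. This is the kind of sharing the proposal attributes to TorchMetrics.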
  • Indoor Split


    Since the ADE20K dataset contains both indoor and outdoor scenes, is it possible to train on ADE20K using only indoor scenes? Is there any way to filter out only the indoor scenes, or is there an existing split that covers only indoor scenes?

    opened by debasmitdas 0
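The thread has no answer. One approach rests on two assumptions: that the ADEChallengeData2016 release includes a sceneCategories.txt file with one "<image id> <scene name>" pair per line, and that the training lists are the repo's .odgt files (one JSON record per line with an fpath_img field). Under those assumptions, one can keep only the records whose scene is in a hand-picked set of indoor categories. Nothing in this thread indicates an official indoor/outdoor flag, so the scene set below is a hypothetical example that would need curating. The helper names are also hypothetical.

```python
import json
import os
from typing import Dict, Iterable, Iterator, Set


def load_scene_labels(lines: Iterable[str]) -> Dict[str, str]:
    """Parse "<image_id> <scene_name>" lines into {image_id: scene_name}."""
    scenes = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            scenes[parts[0]] = parts[1]
    return scenes


def filter_odgt(odgt_lines: Iterable[str], scene_of: Dict[str, str],
                keep: Set[str]) -> Iterator[str]:
    """Yield .odgt records (one JSON object per line) whose image's scene is in keep."""
    for line in odgt_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        image_id = os.path.splitext(os.path.basename(record["fpath_img"]))[0]
        if scene_of.get(image_id) in keep:
            yield line.rstrip("\n")


# Hypothetical, hand-picked indoor scene names; extend to suit the task.
INDOOR_SCENES = {"bedroom", "bathroom", "kitchen", "living_room", "dining_room"}
```

Writing the filtered lines to a new .odgt file and pointing the config's training list at it would restrict training to those scenes. The same filter can be applied to the validation list.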
Owner
MIT CSAIL Computer Vision