PyTorch implementation for Semantic Segmentation/Scene Parsing on MIT ADE20K dataset

MIT CSAIL Computer Vision

Last update: Jan 8, 2023

Overview

Semantic Segmentation on MIT ADE20K dataset in PyTorch

This is a PyTorch implementation of semantic segmentation models on MIT ADE20K scene parsing dataset (http://sceneparsing.csail.mit.edu/).

ADE20K is the largest open source dataset for semantic segmentation and scene parsing, released by the MIT Computer Vision team. Follow the link below to find the repository for our dataset and implementations in Caffe and Torch7: https://github.com/CSAILVision/sceneparsing

If you simply want to play with our demo, please try this link: http://scenesegmentation.csail.mit.edu You can upload your own photo and parse it!

You can also use the Colab notebook playground to tinker with the code for segmenting an image.

All pretrained models can be found at: http://sceneparsing.csail.mit.edu/model/pytorch

[From left to right: Test Image, Ground Truth, Predicted Result]

Color encoding of semantic categories can be found here: https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit?usp=sharing

Updates

HRNet model is now supported.
We use configuration files to store most options which were in argument parser. The definitions of options are detailed in config/defaults.py.
We conform to PyTorch practice in data preprocessing (RGB [0, 1], subtract mean, divide by std).
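
The preprocessing convention in the last item can be illustrated with a minimal, framework-free sketch. The mean and std values are the ImageNet statistics commonly used with torchvision models; the README does not state which values the repository uses, so treat them, and the helper names, as assumptions.

```python
# ImageNet channel statistics commonly used with torchvision models.
# Assumption: the README does not say which mean/std the repository uses.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8, mean=MEAN, std=STD):
    """Scale one RGB pixel from [0, 255] to [0, 1], then standardize per channel."""
    scaled = [c / 255.0 for c in rgb_uint8]
    return [(c - m) / s for c, m, s in zip(scaled, mean, std)]


def normalize_image(image):
    """Apply normalize_pixel to every pixel of an H x W list-of-lists image."""
    return [[normalize_pixel(px) for px in row] for row in image]
```

In practice the same arithmetic would be applied to a whole tensor at once; the per-pixel form is only meant to make the order of operations explicit.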

Highlights

Synchronized Batch Normalization on PyTorch

This module computes the mean and standard deviation across all devices during training. We empirically find that a reasonably large batch size is important for segmentation. We thank Jiayuan Mao for his kind contributions; please refer to Synchronized-BatchNorm-PyTorch for details.

The implementation is easy to use as:

It is pure-python, no C++ extra extension libs.
It is completely compatible with PyTorch's implementation. Specifically, it uses unbiased variance to update the moving average, and uses sqrt(max(var, eps)) instead of sqrt(var + eps).
It is efficient, only 20% to 30% slower than UnsyncBN.
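
To make the statistics described above concrete, here is a small framework-free sketch of how per-device sums can be combined into global batch statistics. It is not the module's actual code: the function names are hypothetical, and a single scalar channel is assumed for brevity.

```python
import math


def sync_batch_stats(per_device_values):
    """Combine per-device sums into statistics over the whole global batch."""
    count = sum(len(v) for v in per_device_values)
    total = sum(sum(v) for v in per_device_values)
    total_sq = sum(sum(x * x for x in v) for v in per_device_values)
    mean = total / count
    biased_var = total_sq / count - mean * mean      # used to normalize activations
    unbiased_var = biased_var * count / (count - 1)  # used to update the moving average
    return mean, biased_var, unbiased_var


def normalize(x, mean, var, eps=1e-5):
    # Denominator as stated in the list above: sqrt(max(var, eps)), not sqrt(var + eps).
    return (x - mean) / math.sqrt(max(var, eps))
```

Only the element count, the sum, and the sum of squares have to be exchanged between devices, which is why synchronization adds a modest overhead.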

Dynamic scales of input for training with multiple GPUs

For the task of semantic segmentation, it is good to keep the aspect ratio of images during training. So we re-implement the DataParallel module and make it support distributing data to multiple GPUs as Python dicts, so that each GPU can process images of different sizes. At the same time, the dataloader also operates differently.
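
As an illustration of aspect-ratio-preserving resizing, the sketch below picks a target size from a desired short-edge length, a cap on the long edge, and a padding constant. The names imgMaxSize and padding_constant appear in the configuration elsewhere on this page; the exact rule the repository applies is an assumption here, and the function names are hypothetical.

```python
def round_up_to_multiple(x, p):
    """Round x up to the nearest multiple of p."""
    return ((x - 1) // p + 1) * p


def target_size(height, width, short_size, max_size, padding_constant):
    """Pick an output size that keeps the aspect ratio of the input image.

    The short edge is scaled towards short_size, the long edge is capped at
    max_size, and both edges are rounded up to a multiple of padding_constant.
    """
    scale = min(short_size / min(height, width), max_size / max(height, width))
    new_h = round_up_to_multiple(int(height * scale), padding_constant)
    new_w = round_up_to_multiple(int(width * scale), padding_constant)
    return new_h, new_w
```

Because each image ends up with its own size, images cannot be stacked into one tensor, which is the motivation for sending one dict per GPU.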

Now the batch size of a dataloader always equals the number of GPUs, and each element is sent to one GPU. It is also compatible with multi-processing. Note that the file index for the multi-processing dataloader is stored on the master process, which contradicts our goal that each worker maintains its own file list. So we use a trick: although the master process still gives the dataloader an index for the __getitem__ function, we ignore the request and send a random batch dict. Also, the multiple workers forked by the dataloader all have the same seed, so if we used the above-mentioned trick directly, the workers would yield exactly the same data. Therefore, we add one line of code that sets the default seed for numpy.random before activating multiple workers in the dataloader.
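
The seeding problem can be reproduced without PyTorch. The sketch below simulates workers as plain objects and uses the standard library random module in place of numpy.random; the class and helper names are hypothetical and only illustrate the idea.

```python
import random


class RandomBatchDataset:
    """Toy dataset whose __getitem__ ignores the requested index."""

    def __init__(self, file_list, batch_per_gpu, seed):
        self.file_list = list(file_list)
        self.batch_per_gpu = batch_per_gpu
        # Each worker needs its own seed; identical seeds give identical batches.
        self.rng = random.Random(seed)

    def __getitem__(self, index):
        # The index sent by the master process is deliberately unused.
        return self.rng.sample(self.file_list, self.batch_per_gpu)


def make_workers(file_list, num_workers, base_seed, distinct_seeds, batch_per_gpu=2):
    """Hypothetical helper that stands in for the workers forked by a dataloader."""
    return [
        RandomBatchDataset(
            file_list, batch_per_gpu, base_seed + (w if distinct_seeds else 0)
        )
        for w in range(num_workers)
    ]
```

With distinct_seeds=False every simulated worker returns the same sequence of batches; offsetting the seed per worker removes the duplication.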

State-of-the-Art models

PSPNet is a scene parsing network that aggregates global representation with the Pyramid Pooling Module (PPM). It is the winning model of the ILSVRC'16 MIT Scene Parsing Challenge. Please refer to https://arxiv.org/abs/1612.01105 for details.
UPerNet is a model based on Feature Pyramid Network (FPN) and Pyramid Pooling Module (PPM). It doesn't need dilated convolution, an operator that is time-and-memory consuming. Without bells and whistles, it is comparable or even better compared with PSPNet, while requiring much shorter training time and less GPU memory. Please refer to https://arxiv.org/abs/1807.10221 for details.
HRNet is a recently proposed model that retains high resolution representations throughout the model, without the traditional bottleneck design. It achieves the SOTA performance on a series of pixel labeling tasks. Please refer to https://arxiv.org/abs/1904.04514 for details.

Supported models

We split our models into encoder and decoder, where encoders are usually modified directly from classification networks, and decoders consist of final convolutions and upsampling. We have provided some pre-configured models in the config folder.

Encoder:

MobileNetV2dilated
ResNet18/ResNet18dilated
ResNet50/ResNet50dilated
ResNet101/ResNet101dilated
HRNetV2 (W48)

Decoder:

C1 (one convolution module)
C1_deepsup (C1 + deep supervision trick)
PPM (Pyramid Pooling Module, see PSPNet paper for details.)
PPM_deepsup (PPM + deep supervision trick)
UPerNet (Pyramid Pooling + FPN head, see the UPerNet paper for details.)

Performance:

IMPORTANT: The base ResNet in our repository is customized (different from the one in torchvision). The base models will be automatically downloaded when needed.

Architecture	MultiScale Testing	Mean IoU	Pixel Accuracy(%)	Overall Score	Inference Speed(fps)
MobileNetV2dilated + C1_deepsup	No	34.84	75.75	54.07	17.2
MobileNetV2dilated + C1_deepsup	Yes	33.84	76.80	55.32	10.3
MobileNetV2dilated + PPM_deepsup	No	35.76	77.77	56.27	14.9
MobileNetV2dilated + PPM_deepsup	Yes	36.28	78.26	57.27	6.7
ResNet18dilated + C1_deepsup	No	33.82	76.05	54.94	13.9
ResNet18dilated + C1_deepsup	Yes	35.34	77.41	56.38	5.8
ResNet18dilated + PPM_deepsup	No	38.00	78.64	58.32	11.7
ResNet18dilated + PPM_deepsup	Yes	38.81	79.29	59.05	4.2
ResNet50dilated + PPM_deepsup	No	41.26	79.73	60.50	8.3
ResNet50dilated + PPM_deepsup	Yes	42.14	80.13	61.14	2.6
ResNet101dilated + PPM_deepsup	No	42.19	80.59	61.39	6.8
ResNet101dilated + PPM_deepsup	Yes	42.53	80.91	61.72	2.0
UperNet50	No	40.44	79.80	60.12	8.4
UperNet50	Yes	41.55	80.23	60.89	2.9
UperNet101	No	42.00	80.79	61.40	7.8
UperNet101	Yes	42.66	81.01	61.84	2.3
HRNetV2	No	42.03	80.77	61.40	5.8
HRNetV2	Yes	43.20	81.47	62.34	1.9

The training is benchmarked on a server with 8 NVIDIA Pascal Titan Xp GPUs (12GB GPU memory). The inference speed is benchmarked on a single NVIDIA Pascal Titan Xp GPU, without visualization.
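
The score columns of the table can be related with a few lines of code. The sketch below assumes the usual definitions (pixel accuracy as the fraction of correctly labeled pixels, mean IoU averaged over the classes that occur) and assumes that Overall Score is the average of the two, which matches most rows of the table, e.g. (38.00 + 78.64) / 2 = 58.32 for ResNet18dilated + PPM_deepsup. The function names are illustrative and not taken from the repository, and ignored labels are not handled.

```python
def pixel_accuracy(pred, label):
    """Fraction of pixels whose predicted class equals the ground truth."""
    correct = sum(p == l for p, l in zip(pred, label))
    return correct / len(label)


def mean_iou(pred, label, num_class):
    """Intersection over union, averaged over classes present in pred or label."""
    ious = []
    for c in range(num_class):
        inter = sum(p == c and l == c for p, l in zip(pred, label))
        union = sum(p == c or l == c for p, l in zip(pred, label))
        if union > 0:  # skip classes absent from both prediction and label
            ious.append(inter / union)
    return sum(ious) / len(ious)


def overall_score(miou, pix_acc):
    """Assumed definition: the average of mean IoU and pixel accuracy."""
    return (miou + pix_acc) / 2
```

Here pred and label are flat lists of class indices, one entry per pixel.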

Environment

The code is developed under the following configurations.

Hardware: >=4 GPUs for training, >=1 GPU for testing (set [--gpus GPUS] accordingly)
Software: Ubuntu 16.04.3 LTS, CUDA>=8.0, Python>=3.5, PyTorch>=0.4.0
Dependencies: numpy, scipy, opencv, yacs, tqdm

Quick start: Test on an image using our trained model

Here is a simple demo to do inference on a single image:

chmod +x demo_test.sh
./demo_test.sh

This script downloads a trained model (ResNet50dilated + PPM_deepsup) and a test image, runs the test script, and saves predicted segmentation (.png) to the working directory.

To test on an image or a folder of images ($PATH_IMG), you can simply do the following:

python3 -u test.py --imgs $PATH_IMG --gpu $GPU --cfg $CFG

Training

Download the ADE20K scene parsing dataset:

chmod +x download_ADE20K.sh
./download_ADE20K.sh

Train a model by selecting the GPUs ($GPUS) and configuration file ($CFG) to use. During training, checkpoints by default are saved in folder ckpt.

python3 train.py --gpus $GPUS --cfg $CFG

To choose which gpus to use, you can either do --gpus 0-7, or --gpus 0,2,4,6.
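
For illustration, the two accepted forms can be expanded into a list of GPU ids as follows. This is a minimal sketch with a hypothetical function name, not the repository's own parser.

```python
def parse_gpus(spec):
    """Expand a string such as '0-7' or '0,2,4,6' into a list of GPU ids."""
    gpus = []
    for part in spec.split(','):
        part = part.strip()
        if '-' in part:
            # A range such as '0-7' includes both endpoints.
            start, end = part.split('-')
            gpus.extend(range(int(start), int(end) + 1))
        else:
            gpus.append(int(part))
    return gpus
```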

For example, you can start with our provided configurations:

Train MobileNetV2dilated + C1_deepsup

python3 train.py --gpus $GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml

Train ResNet50dilated + PPM_deepsup

python3 train.py --gpus $GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml

Train UPerNet101

python3 train.py --gpus $GPUS --cfg config/ade20k-resnet101-upernet.yaml

You can also override options on the command line, for example python3 train.py TRAIN.num_epoch 10.
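
The KEY VALUE override style can be sketched with a nested dict standing in for the configuration object. The repository lists yacs as a dependency; the function below is only a simplified stand-in written for illustration, not the library's implementation.

```python
import ast


def merge_from_list(cfg, opts):
    """Apply ['TRAIN.num_epoch', '10', ...] style overrides to a nested dict."""
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = key.split('.')
        node = cfg
        for name in parents:
            node = node[name]
        if leaf not in node:
            raise KeyError("unknown option: " + key)
        try:
            value = ast.literal_eval(raw)  # '10' -> 10, 'True' -> True
        except (ValueError, SyntaxError):
            value = raw  # keep plain strings such as file names as-is
        node[leaf] = value
    return cfg
```

Rejecting unknown keys catches misspelled option names early instead of silently adding a new entry.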

Evaluation

Evaluate a trained model on the validation set. Add VAL.visualize True to the arguments to output visualizations as shown in the teaser.

For example:

Evaluate MobileNetV2dilated + C1_deepsup

python3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-mobilenetv2dilated-c1_deepsup.yaml

Evaluate ResNet50dilated + PPM_deepsup

python3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-resnet50dilated-ppm_deepsup.yaml

Evaluate UPerNet101

python3 eval_multipro.py --gpus $GPUS --cfg config/ade20k-resnet101-upernet.yaml

Integration with other projects

This library can be installed via pip to easily integrate with another codebase:

pip install git+https://github.com/CSAILVision/semantic-segmentation-pytorch.git@master

Now this library can easily be consumed programmatically. For example:

from mit_semseg.config import cfg
from mit_semseg.dataset import TestDataset
from mit_semseg.models import ModelBuilder, SegmentationModule

Reference

If you find the code or pre-trained models useful, please cite the following papers:

Semantic Understanding of Scenes through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, T. Xiao, S. Fidler, A. Barriuso and A. Torralba. International Journal on Computer Vision (IJCV), 2018. (https://arxiv.org/pdf/1608.05442.pdf)

@article{zhou2018semantic,
  title={Semantic understanding of scenes through the ade20k dataset},
  author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Xiao, Tete and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
  journal={International Journal on Computer Vision},
  year={2018}
}

Scene Parsing through ADE20K Dataset. B. Zhou, H. Zhao, X. Puig, S. Fidler, A. Barriuso and A. Torralba. Computer Vision and Pattern Recognition (CVPR), 2017. (http://people.csail.mit.edu/bzhou/publication/scene-parse-camera-ready.pdf)

@inproceedings{zhou2017scene,
    title={Scene Parsing through ADE20K Dataset},
    author={Zhou, Bolei and Zhao, Hang and Puig, Xavier and Fidler, Sanja and Barriuso, Adela and Torralba, Antonio},
    booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
    year={2017}
}

Comments

input and target size don't match for loss function

It looks like every combination but the default resnet50_dilated8/ppm_bilinear_deepsup leads to a mismatch in size between the input and the target of the loss function. I'm a bit mystified, I did not change any of the models. What I adapted was the number of labels (to 8 as one can see below).

Encoder: resnet50_dilated8. Decoder: upernet
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

Encoder: Resnet101. Decoder: ppm_bilinear_deepsup
return torch._C._nn.nll_loss2d(input, target, weight, size_average, ignore_index, reduce)
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 75 x 94], input [1 x 8 x 19 x 24] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

Encoder: Resnet101. Decoder: Upernet
RuntimeError: input and target batch or spatial sizes don't match: target [1 x 85 x 106], input [1 x 8 x 170 x 212] at /opt/conda/conda-bld/pytorch_1524582441669/work/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:24

In cases where the program runs, the last two dimensions seem to be consistent: torch.Size([1, 8, 75, 94]) and torch.Size([1, 75, 94]).

opened by heinzermch 14

Error during training for custom dataset

When I tried to train the model with the command below, a RuntimeError occurred. It seems to be a problem with the GPUs (four GPUs).

The command I ran:

python train.py --gpus 0,1,2,3 --cfg $cfg

Error:

[2019-10-06 08:56:13,423 INFO train.py line 246 3390] Outputing checkpoints to: ckpt/test-resnet50dilated-ppm_deepsup
# samples: 7296
1 Epoch = 5000 iters
Traceback (most recent call last):
  File "train.py", line 273, in <module>
    main(cfg, gpus)
  File "train.py", line 200, in main
    train(segmentation_module, iterator_train, optimizers, history, epoch+1, cfg)
  File "train.py", line 32, in train
    batch_data = next(iterator)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 637, in __next__
    return self._process_next_batch(batch)
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 658, in _process_next_batch
    raise batch.exc_type(batch.exc_msg)
AssertionError: Traceback (most recent call last):
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/apps/intelpython3/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
    samples = collate_fn([dataset[i] for i in batch_indices])
  File "/home/bruno/xView2/semantic-segmentation-pytorch/dataset.py", line 162, in __getitem__
    assert(segm.mode == "L")
AssertionError

opened by bao18 9

dimension mismatch

I got this error

RuntimeError: Given groups=1, weight of size [512, 320, 1, 1], expected input[1, 2048, 1, 1] to have 320 channels, but got 2048 channels instead

I saw the other thread and tried changing it to 512, but it still didn't work.

opened by johnathanchiu 9

Class that corresponds to each color

Hello,

@hangzhaomit thanks for the Pytorch implementation on ADE20K dataset! I wonder if there is a way to know which classes correspond to each color in the .png image we get as a result. Would you be so kind to provide me with any information about it?

Thank you in advance

opened by nefelipk 8

Jobs stop after random iterations without any error information.

I have implemented my method based on the code base.

I try to train my model from scratch and find that the jobs always stop outputting any log information after a random number of iterations, and it is no more than one epoch over all the data.

For example, the job ended without updating the log information and without releasing the GPU memory.

Epoch: [1][4517/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019349, lr_decoder: 0.019349, Accuracy: 62.36, Loss: 2.131847
Epoch: [1][4518/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131737
Epoch: [1][4519/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131553
Epoch: [1][4520/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131408
Epoch: [1][4521/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131315
Epoch: [1][4522/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131346
Epoch: [1][4523/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019348, lr_decoder: 0.019348, Accuracy: 62.37, Loss: 2.131252
Epoch: [1][4524/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131484
Epoch: [1][4525/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131410
Epoch: [1][4526/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131326
Epoch: [1][4527/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131221
Epoch: [1][4528/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131181
Epoch: [1][4529/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.37, Loss: 2.131092
Epoch: [1][4530/5000], Time: 2.27, Data: 0.01, lr_encoder: 0.019347, lr_decoder: 0.019347, Accuracy: 62.38, Loss: 2.130914

opened by PkuRainBow 8

It executes multiple times when imported from test.py

On my command line, 'test.py' works well.

I want to use this file from another Python file, so I added a function to 'test.py':

    def run(self, imgs=None, model="config/ade20k-resnet50dilated-ppm_deepsup.yaml", gpu=0):
        if imgs == None:
            return False
        cfg.merge_from_file(model)

        cfg.MODEL.arch_encoder = cfg.MODEL.arch_encoder.lower()
        cfg.MODEL.arch_decoder = cfg.MODEL.arch_decoder.lower()

        # absolute paths of model weights
        cfg.MODEL.weights_encoder = os.path.join(
            cfg.DIR, 'encoder' + cfg.TEST.suffix)
        cfg.MODEL.weights_decoder = os.path.join(
            cfg.DIR, 'decoder' + cfg.TEST.suffix)

        if not os.path.exists(cfg.MODEL.weights_encoder):
            return (False, "checkpoint does not exist!")

        # generate testing image list
        if os.path.isdir(imgs[0]):
            imgs = find_recursive(imgs[0])
        else:
            imgs = [imgs]
        assert len(imgs), "imgs should be a path to image (.jpg) or directory."
        cfg.list_test = [{'fpath_img': x} for x in imgs]
        if not os.path.isdir(cfg.TEST.result):
            os.makedirs(cfg.TEST.result)
        
        main(cfg, gpu)

However, in another file, when I use this function like this:

from test import run
run("test.jpg")

It outputs error:

Loading weights for net_encoder
Loading weights for net_decoder
# samples: 1
Loading weights for net_encoder
Loading weights for net_encoder
Loading weights for net_encoder
Loading weights for net_encoder
Loading weights for net_encoder
Loading weights for net_decoder
Loading weights for net_decoder
Loading weights for net_decoder
# samples: 1
Loading weights for net_decoder
Loading weights for net_decoder
# samples: 1
# samples: 1
# samples: 1
......

Then my command line crashes.

However, using this function inside test.py works well. It only fails when called from another file.

This function is the same as the code under if __name__ == '__main__':, so what is wrong?

Thanks.

opened by acdzh 7

Test script for processing a list of images

Hi @hangzhaomit, I have written a script (based on your test.py) that reads a list of images for batch conversion. Hope it helps~

import os
import datetime
import argparse
from distutils.version import LooseVersion
# Numerical libs
import numpy as np
import torch
import torch.nn as nn
from scipy.io import loadmat
# Our libs
from dataset import TestDataset
from models import ModelBuilder, SegmentationModule
from utils import colorEncode
from lib.nn import user_scattered_collate, async_copy_to
from lib.utils import as_numpy, mark_volatile
import lib.utils.data as torchdata
import cv2

def visualize_result(data, preds, args):
    colors = loadmat('data/color150.mat')['colors']
    (img, info) = data
    pred_color = colorEncode(preds, colors)

    im_vis = np.concatenate((img, pred_color),
                            axis=1).astype(np.uint8)

    img_name = info.split('/')[-1]
    cv2.imwrite(os.path.join(args.result,
                img_name.replace('.jpg', '.png')), im_vis)

def test(segmentation_module, loader, args):
    segmentation_module.eval()

    for i, batch_data in enumerate(loader):
        # process data
        batch_data = batch_data[0]
        segSize = (batch_data['img_ori'].shape[0],
                   batch_data['img_ori'].shape[1])

        img_resized_list = batch_data['img_data']

        with torch.no_grad():
            pred = torch.zeros(1, args.num_class, segSize[0], segSize[1])

            for img in img_resized_list:
                feed_dict = batch_data.copy()
                feed_dict['img_data'] = img
                del feed_dict['img_ori']
                del feed_dict['info']
                feed_dict = async_copy_to(feed_dict, args.gpu_id)

                # forward pass
                pred_tmp = segmentation_module(feed_dict, segSize=segSize)
                pred = pred + pred_tmp.cpu() / len(args.imgSize)

            _, preds = torch.max(pred, dim=1)
            preds = as_numpy(preds.squeeze(0))

        visualize_result(
            (batch_data['img_ori'], batch_data['info']),
            preds, args)

        print('[{}] iter {}'
              .format(datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"), i))


def main(args):
    torch.cuda.set_device(args.gpu_id)
    builder = ModelBuilder()
    net_encoder = builder.build_encoder(
        arch=args.arch_encoder,
        fc_dim=args.fc_dim,
        weights=args.weights_encoder)
    net_decoder = builder.build_decoder(
        arch=args.arch_decoder,
        fc_dim=args.fc_dim,
        num_class=args.num_class,
        weights=args.weights_decoder,
        use_softmax=True)

    crit = nn.NLLLoss(ignore_index=-1)

    segmentation_module = SegmentationModule(net_encoder, net_decoder, crit)

    ################# modified #################
    # move the model to the GPU once, before looping over the images
    segmentation_module.cuda()

    # args.test_list is a text file with one image path per line
    with open(args.test_list, 'r') as f:
        files = f.read().splitlines()

    for file_ in files:
        test_img = [{'fpath_img': file_}]
        dataset_val = TestDataset(test_img, args, max_sample=args.num_val)
        loader_val = torchdata.DataLoader(
            dataset_val,
            batch_size=args.batch_size,
            shuffle=False,
            collate_fn=user_scattered_collate,
            num_workers=5,
            drop_last=True)
        # run inference for every listed image, not only the last one
        test(segmentation_module, loader_val, args)

    print('Inference done!')
    ################# modified #################

if __name__ == '__main__':
    assert LooseVersion(torch.__version__) >= LooseVersion('0.4.0'), \
        'PyTorch>=0.4.0 is required'

    parser = argparse.ArgumentParser()
    # Path related arguments
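    # main() reads args.test_list, so the flag must be named accordingly
    parser.add_argument('--test_list', required=True,
                        help='text file with one image path per line')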
    parser.add_argument('--model_path', required=True,
                        help='folder to model path')
    parser.add_argument('--suffix', default='_epoch_20.pth',
                        help="which snapshot to load")

    # Model related arguments
    parser.add_argument('--arch_encoder', default='resnet50_dilated8',
                        help="architecture of net_encoder")
    parser.add_argument('--arch_decoder', default='ppm_bilinear_deepsup',
                        help="architecture of net_decoder")
    parser.add_argument('--fc_dim', default=2048, type=int,
                        help='number of features between encoder and decoder')

    # Data related arguments
    parser.add_argument('--num_val', default=-1, type=int,
                        help='number of images to evaluate')
    parser.add_argument('--num_class', default=150, type=int,
                        help='number of classes')
    parser.add_argument('--batch_size', default=1, type=int,
                        help='batchsize. current only supports 1')
    parser.add_argument('--imgSize', default=[300, 400, 500, 600],
                        nargs='+', type=int,
                        help='list of input image sizes.'
                             'for multiscale testing, e.g. 300 400 500')
    parser.add_argument('--imgMaxSize', default=1000, type=int,
                        help='maximum input image size of long edge')
    parser.add_argument('--padding_constant', default=8, type=int,
                        help='maximum downsampling rate of the network')
    parser.add_argument('--segm_downsampling_rate', default=8, type=int,
                        help='downsampling rate of the segmentation label')

   
    parser.add_argument('--result', default='.',
                        help='folder to output visualization results')
    parser.add_argument('--gpu_id', default=0, type=int,
                        help='gpu_id for evaluation')

    args = parser.parse_args()
    print(args)

   
    args.weights_encoder = os.path.join(args.model_path,
                                        'encoder' + args.suffix)
    args.weights_decoder = os.path.join(args.model_path,
                                        'decoder' + args.suffix)

    # check both checkpoints (the original tested the encoder path twice)
    assert os.path.exists(args.weights_encoder) and \
        os.path.exists(args.weights_decoder), 'checkpoint does not exist!'

    if not os.path.isdir(args.result):
        os.makedirs(args.result)

    main(args)

opened by visonpon 7

ADE20k 150 Challenge instance segmentation

Hi,

Is it possible to get the original image (RGB masks) for the gray scale masks in the challenge, like a mapping of file names? I want to use them to create bounding boxes.

Thank you for your time.

opened by darleybarreto 6

Training gets stuck with HRNet: problems while loading data?

First of all, thank you for this very good repository!! I already launched successfully a training over a binary custom dataset and also got very good results in evaluation and visualization. I used the model with hrnet+c1. Now I am trying to train with another custom dataset, with 34 classes including "undefined" (which has been labeled as 0 to be ignored): everything starts as usual, but it seems to block while creating the iterator over the training dataset. Here is the output of my program until it stops and remains stuck:

Training started on sáb abr  4 16:37:42 CEST 2020
[2020-04-04 16:37:43,581 INFO train.py line 243 28431] Loaded configuration file ./config/customdataset-hrnetv2-c1.yaml
[2020-04-04 16:37:43,582 INFO train.py line 244 28431] Running with config:
DATASET:
  imgMaxSize: 4000
  imgSizes: (254, 267, 300, 350, 363, 372, 396, 400, 410, 420, 421, 425, 426, 429, 436, 440, 441, 456, 466, 467, 480, 496, 498, 500, 506, 525, 531, 538, 549, 559, 600, 605, 639, 640, 654, 662, 664, 680, 702, 714, 720, 750, 751, 768, 800, 808, 843, 860, 873, 900, 938, 954, 957, 960, 1000, 1015, 1024, 1025, 1080, 1087, 1102, 1118, 1200, 1283, 1333, 1390, 1600, 1789, 2000, 2247, 2332, 2400, 3000, 3079, 3264)
  list_train: data/customdataset/training.odgt
  list_val: data/customdataset/validation.odgt
  num_class: 34
  padding_constant: 32
  random_flip: True
  root_dataset: data/customdataset/
  segm_downsampling_rate: 4
DIR: ckpt/customdataset-hrnetv2-c1
MODEL:
  arch_decoder: c1
  arch_encoder: hrnetv2
  fc_dim: 720
  weights_decoder: 
  weights_encoder: 
TEST:
  batch_size: 1
  checkpoint: epoch_4.pth
  result: ./result/customdataset/exp01
TRAIN:
  batch_size_per_gpu: 2
  beta1: 0.9
  deep_sup_scale: 0.4
  disp_iter: 1
  epoch_iters: 5000
  fix_bn: False
  lr_decoder: 0.02
  lr_encoder: 0.02
  lr_pow: 0.9
  num_epoch: 4
  optim: SGD
  seed: 304
  start_epoch: 0
  weight_decay: 0.0001
  workers: 16
VAL:
  batch_size: 1
  checkpoint: epoch_4.pth
  visualize: True
[2020-04-04 16:37:43,582 INFO train.py line 249 28431] Outputing checkpoints to: ckpt/customdataset-hrnetv2-c1
# samples: 80
1 Epoch = 5000 iters

The pretrained model I am using is ade20k-hrnetv2-c1 and the same was for the previous experiments. I put some trivial print() in train.py, after every step that follows the last information actually printed. It seems that there are problems in creating the iterator of the training data:

print('1 Epoch = {} iters'.format(cfg.TRAIN.epoch_iters)) # this appears

# create loader iterator
iterator_train = iter(loader_train)
print('Iterator train created') # this does not appear

The GPU memory usage seems to confirm this guess: I am monitoring with nvidia-smi 2 Nvidia GPUs, a Pascal Quadro P6000 and a Titan RTX, both with 24GB memory (I understand that maybe it is not correct to use different architectures?). In the previous trainings, everything worked with both memories about 75% occupied, equally distributed as I expected. Now, instead, after an initial increase in memory usage, it gets stuck with very unbalanced and low memory usage.

What am I doing wrong? Any help is appreciated.

opened by Fiordarancio 6

ImportError: cannot import name '_set_worker_signal_handlers'

Hi, I found that PyTorch requires python>=3.6, so I installed pytorch0.4 with python3.6. When I run ./demo_test.sh, it gives me this error:

  File "test.py", line 13, in <module>
    from dataset import TestDataset
  File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/dataset.py", line 4, in <module>
    import lib.utils.data as torchdata
  File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/__init__.py", line 3, in <module>
    from .dataloader import DataLoader
  File "/home/aizz/Documents/kaggle/semantic-segmentation-pytorch/lib/utils/data/dataloader.py", line 3, in <module>
    from torch._C import _set_worker_signal_handlers, _update_worker_pids,
ImportError: cannot import name '_set_worker_signal_handlers'

Is there any solution? Thanks!

invalid

opened by dongdonghy 6

Can't use cuda.

Hi, I used conda install -c prigoyal pytorch=0.4.0 to install PyTorch and ran ./demo_test.sh. Everything goes well until line 105 of test.py:

segmentation_module.cuda()

The output is:

./demo_test.sh: line 30: 27040 Segmentation fault (core dumped) python3 -u test.py --model_path $MODEL_PATH --test_img $TEST_IMG --arch_encoder resnet50_dilated8 --arch_decoder ppm_bilinear_deepsup --fc_dim 2048 --result $RESULT_PATH

I guess it's a problem with PyTorch itself, so I tested this code:

import torch
x = torch.Tensor(3,3)
x.cuda()

Similar error : Segmentation fault (core dumped).

Any suggestions?

BTW, my other projects of pytorch 0.2.0 and 0.3.0 on this machine work well.

Thanks.
bug

opened by zhiweifang 6

Could I use the pretrained model of the ADE20K dataset for commercial purposes?

Hi, the model pretrained on ADE20K data is very helpful for one of my commercial projects. Could I use the pretrained model for commercial purposes? I read the license terms at https://groups.csail.mit.edu/vision/datasets/ADE20K/terms/ and they say that the dataset can't be used for commercial purposes. But is there any restriction on using the pretrained model?

opened by robi56 0

Trained with custom dataset model results

I prepared a custom dataset of 7000 images that contain only the floor class and added it to the ADE20K dataset, so I now have a dataset of 28000 images. I changed the floor annotation color to #040404 in my dataset, because the floor class has that color in the ADE20K dataset. My floor annotation looks like this (#040404 is very close to black; if you look carefully you will see the floor annotation):

and the original images:

Everything seems normal in my dataset. Then I changed the start_epoch and num_epoch values in the config yaml file: num_epoch: 23 start_epoch: 20 epoch_iters: 5000. Training finished successfully and I have encoder and decoder models named encoder_epoch_23.pth and decoder_epoch_23.pth. Everything seems normal here as well.

I got results using these models, but they were not as expected. I got this result with the model downloaded from here (decoder_epoch_20.pth):

And I got this result with the model I trained:

The results seem to be getting worse. What could I be doing wrong?

opened by Muratoter 0

unable to save the model in torchscript format

Hi team,

Firstly, thanks for your repository. I would like to save the model in TorchScript module format rather than the traditional way. When I try to save the model using

torch.jit.script(model)

I'm getting the below error,

---------------------------------------------------------------------------
NotSupportedError                         Traceback (most recent call last)
/tmp/ipykernel_127656/979748029.py in <module>
      7 enabled_precisions = {torch.float, torch.half} # Run with fp16
      8 
----> 9 trt_ts_module = torch_tensorrt.compile(model, inputs=inputs, enabled_precisions=enabled_precisions)
     10 
     11 input_data = input_data.to('cuda').half()

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch_tensorrt/_compile.py in compile(module, ir, inputs, enabled_precisions, **kwargs)
    112                 "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript"
    113             )
--> 114             ts_mod = torch.jit.script(module)
    115         return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
    116     elif target_ir == _IRType.fx:

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_script.py in script(obj, optimize, _frames_up, _rcb, example_inputs)
   1264         obj = call_prepare_scriptable_func(obj)
   1265         return torch.jit._recursive.create_script_module(
-> 1266             obj, torch.jit._recursive.infer_methods_to_compile
   1267         )
   1268 

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in create_script_module(nn_module, stubs_fn, share_types, is_tracing)
    452     if not is_tracing:
    453         AttributeTypeIsSupportedChecker().check(nn_module)
--> 454     return create_script_module_impl(nn_module, concrete_type, stubs_fn)
    455 
    456 def create_script_module_impl(nn_module, concrete_type, stubs_fn):

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in create_script_module_impl(nn_module, concrete_type, stubs_fn)
    464     """
    465     cpp_module = torch._C._create_module_with_type(concrete_type.jit_type)
--> 466     method_stubs = stubs_fn(nn_module)
    467     property_stubs = get_property_stubs(nn_module)
    468     hook_stubs, pre_hook_stubs = get_hook_stubs(nn_module)

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in infer_methods_to_compile(nn_module)
    733     stubs = []
    734     for method in uniqued_methods:
--> 735         stubs.append(make_stub_from_method(nn_module, method))
    736     return overload_stubs + stubs
    737 

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in make_stub_from_method(nn_module, method_name)
     64     # In this case, the actual function object will have the name `_forward`,
     65     # even though we requested a stub for `forward`.
---> 66     return make_stub(func, method_name)
     67 
     68 

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/_recursive.py in make_stub(func, name)
     49 def make_stub(func, name):
     50     rcb = _jit_internal.createResolutionCallbackFromClosure(func)
---> 51     ast = get_jit_def(func, name, self_name="RecursiveScriptModule")
     52     return ScriptMethodStub(rcb, ast, func)
     53 

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in get_jit_def(fn, def_name, self_name, is_classmethod)
    262         pdt_arg_types = type_trace_db.get_args_types(qualname)
    263 
--> 264     return build_def(parsed_def.ctx, fn_def, type_line, def_name, self_name=self_name, pdt_arg_types=pdt_arg_types)
    265 
    266 # TODO: more robust handling of recognizing ignore context manager

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in build_def(ctx, py_def, type_line, def_name, self_name, pdt_arg_types)
    300                        py_def.col_offset + len("def"))
    301 
--> 302     param_list = build_param_list(ctx, py_def.args, self_name, pdt_arg_types)
    303     return_type = None
    304     if getattr(py_def, 'returns', None) is not None:

~/miniconda3/envs/tensorrt/lib/python3.7/site-packages/torch/jit/frontend.py in build_param_list(ctx, py_args, self_name, pdt_arg_types)
    335             if arg is not None:
    336                 ctx_range = build_expr(ctx, arg).range()
--> 337                 raise NotSupportedError(ctx_range, _vararg_kwarg_err)
    338 
    339     # List of Tuple of args and type as inferred by profile directed typing

NotSupportedError: Compiled functions can't take variable number of arguments or use keyword-only arguments with defaults:
  File "/home/iamalien/Desktop/my_files/semantic_segmentation_example/semantic-segmentation-pytorch/sage_example/code/mit_semseg/models/models.py", line 29
    def forward(self, feed_dict, *, segSize=None):
                                            ~~~~ <--- HERE
        # training
        if segSize is None:

@hangzhaomit @Tete-Xiao @davidbau @devinaconley @eugenelawrence @MarcoForte @zhoubolei @yagi-3 @arjo129 @jeremyfix

Could you please help me save the model in TorchScript format?
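
The error message above names the cause: the compiled function has a keyword-only argument with a default (`segSize=None` after the bare `*`). As a minimal sketch using only the standard library, the helper below lists the parameters of a function that fall into the categories the message mentions. `unscriptable_params`, `OriginalStyle`, and `PositionalStyle` are hypothetical names for illustration; the two classes are plain stand-ins that only copy the `forward` signature, not the real model.

```python
import inspect


def unscriptable_params(fn):
    """Return parameter names of the kinds the error message complains about:
    *args, **kwargs, and keyword-only parameters that have a default."""
    flagged = []
    for name, param in inspect.signature(fn).parameters.items():
        if param.kind in (inspect.Parameter.VAR_POSITIONAL,
                          inspect.Parameter.VAR_KEYWORD):
            flagged.append(name)
        elif (param.kind is inspect.Parameter.KEYWORD_ONLY
              and param.default is not inspect.Parameter.empty):
            flagged.append(name)
    return flagged


class OriginalStyle:
    # Same signature as the forward() shown in the traceback.
    def forward(self, feed_dict, *, segSize=None):
        return feed_dict, segSize


class PositionalStyle:
    # Same parameters without the bare "*", so segSize is no longer keyword-only.
    def forward(self, feed_dict, segSize=None):
        return feed_dict, segSize


original_flags = unscriptable_params(OriginalStyle.forward)
positional_flags = unscriptable_params(PositionalStyle.forward)
print(original_flags)
print(positional_flags)
```

Removing the bare `*` addresses only this particular check. Whether the rest of the model then scripts cleanly is not established here, since other constructs in `forward` may raise further errors.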

opened by IamExperimenting 2

Using TorchMetrics for code simplification

:hammer_and_wrench: Proposed Refactor

We have been developing TorchMetrics as a general-purpose metrics package that also covers some domain-specific use cases. In many cases, we have an exact mapping to scikit-learn, with verification/testing against the reference metric for correctness. TM includes functional as well as nn.Module versions, and for most standard metrics the only dependency is PyTorch (for the domain-specific metrics you need to install the related extras).

Suggest a potential alternative/fix

By using TM metrics you can rely on widely tested correctness (testing against gold standards in multiple OS environments and all PyTorch versions above v1.4), and later you can use the nn.Module versions to leverage update and compute. Moreover, the recent v0.8 release allows re-using/sharing common computation among similar metrics, which is relevant here since you rely on a confusion matrix.

Overall, if you are fine with it, we are happy to draft a PR with a suggested change so the impact can be verified in place. If you have any questions, feel free to follow up with me or @aniketmaurya.

From what I have quickly checked, all of this mit_semseg.utils code can be simplified with TM.
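
To illustrate the kind of shared computation mentioned above, here is a minimal plain-Python sketch in which pixel accuracy and per-class IoU are both derived from a single confusion matrix. It is neither the code in mit_semseg.utils nor the TorchMetrics implementation; the function names and the toy labels are made up for the example.

```python
def confusion_matrix(preds, targets, num_classes):
    """cm[t][p] counts pixels whose true class is t and predicted class is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(preds, targets):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


def per_class_iou(cm):
    """IoU per class: TP / (TP + FP + FN); NaN for classes that never occur."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true other
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float("nan"))
    return ious


# Toy flattened label maps with three classes.
preds = [0, 0, 1, 1, 2, 2]
targets = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(preds, targets, num_classes=3)
acc = pixel_accuracy(cm)
ious = per_class_iou(cm)
print(cm, acc, ious)
```

Both metrics read the same matrix, so the pass over the pixels happens once.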

opened by Borda 0

Indoor Split

Since the ADE20K dataset contains both indoor and outdoor scenes, is it possible to train on only the indoor scenes of ADE20K? Is there a way to select only the indoor scenes, or is there an existing split that covers only indoor scenes?

opened by debasmitdas 0

