Code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling"

Unseen Object Amodal Instance Segmentation (UOAIS)

Seunghyeok Back, Joosoon Lee, Taewon Kim, Sangjun Noh, Raeyoung Kang, Seongho Bak, Kyoobin Lee

This repository contains the source code for the paper "Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling."

[Paper] [Project Website] [Video]

Updates & TODO Lists

  • (2021.09.26) UOAIS-Net has been released
  • Add train and evaluation code
  • Release synthetic dataset (UOAIS-Sim) and amodal annotation (OSD-Amodal)
  • Add ROS inference node

Getting Started

Environment Setup

Tested on a Titan RTX with Python 3.7, PyTorch 1.8.0, torchvision 0.9.0, and CUDA 10.2.

  1. Download
git clone
cd uoais
mkdir output

Download the checkpoint from GDrive and move the downloaded folders into the output folder.

  2. Set up a Python environment
conda create -n uoais python=3.7
conda activate uoais
pip install torch torchvision 
pip install shapely torchfile opencv-python pyfastnoisesimd rapidfuzz
  3. Install detectron2
  4. Build and install custom AdelaiDet
python build develop 
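
Because `pip install torch torchvision` installs whichever release is current, the resulting environment may differ from the tested one. The following is a minimal sketch for comparing installed package versions against the versions listed above. The helper names `installed_version` and `version_mismatches` are illustrative and not part of this repository.

```python
# Versions taken from the "Tested on" line of this README.
TESTED = {"torch": "1.8.0", "torchvision": "0.9.0"}


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7 has no importlib.metadata; fall back to setuptools.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_mismatches(expected, installed):
    """List (package, wanted, found) for every package that differs from `expected`."""
    problems = []
    for package, wanted in expected.items():
        found = installed.get(package)
        # Drop local build tags such as "+cu102" before comparing.
        if found is None or found.split("+")[0] != wanted:
            problems.append((package, wanted, found))
    return problems


if __name__ == "__main__":
    found = {name: installed_version(name) for name in TESTED}
    for package, wanted, got in version_mismatches(TESTED, found):
        print(f"{package}: tested with {wanted}, found {got}")
```

An empty result means the two packages match the tested versions; it says nothing about the CUDA toolkit or the GPU.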

Run with Sample Data


python tools/


This repository is released under the MIT license.


If you use our work in a research project, please cite our work:

      title={Unseen Object Amodal Instance Segmentation via Hierarchical Occlusion Modeling}, 
      author={Seunghyeok Back and Joosoon Lee and Taewon Kim and Sangjun Noh and Raeyoung Kang and Seongho Bak and Kyoobin Lee},
  • rgb and depth features fusion

    I am taking inspiration from your project to extend the Detectron2 defaults for my own project. The feature I want to add is a new backbone with two parallel ResNets: one extracts RGB feature maps, the other extracts depth feature maps. Their features are fused before being fed to the common FPN, exactly as you do. Then, instead of your architecture, I use a standard Mask R-CNN. The problem is that with your convolutional fusion the network does not learn well, and I get a very strange result in the end, as you can see in the picture.

    Can you explain how I can fuse the tensors representing the RGB and depth feature maps using your other method (sum) instead of a convolution layer? I understand that I should change MODEL.FUSE_TYPE to "add" in the config file, but I am not sure how I should also modify this part. Should I just remove the references to the fusion layers in the class and do the summation in the forward pass, or something else? Thanks for your time.

    opened by andreaceruti 9
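
    For the fusion question above, a minimal sketch of sum-based fusion, assuming that "add" means plain element-wise summation and that each backbone returns a dict mapping stage names to feature maps (as Detectron2 backbones do). The function name `fuse_add` is illustrative and not taken from this repository. It relies only on `+`, so it accepts torch tensors or NumPy arrays, and it has no learnable parameters, which means no fusion layers need to be constructed for this mode.

```python
def fuse_add(rgb_feats, depth_feats):
    """Fuse two per-stage feature dicts by element-wise addition.

    Values may be anything that supports `+` element-wise,
    e.g. torch tensors or NumPy arrays of identical shape.
    """
    if rgb_feats.keys() != depth_feats.keys():
        raise ValueError("both backbones must expose the same stages")
    fused = {}
    for stage, rgb in rgb_feats.items():
        depth = depth_feats[stage]
        # Addition is only meaningful when both maps have the same shape.
        rgb_shape = getattr(rgb, "shape", None)
        depth_shape = getattr(depth, "shape", None)
        if rgb_shape != depth_shape:
            raise ValueError(f"shape mismatch at {stage}: {rgb_shape} vs {depth_shape}")
        fused[stage] = rgb + depth
    return fused
```

    In a forward pass this would be called on the two backbone outputs before the FPN, for example `fused = fuse_add(rgb_backbone(rgb), depth_backbone(depth))`, where both backbone names are hypothetical.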
  • Passing RGB and Depth image to network

    Hi, I want to combine two backbones as you do, one RGB backbone and one depth backbone, and fuse their features at the end (in my case before feeding the fused features to the RPN and ROI stages of the classical Mask R-CNN). The problem is that I can't understand how, through the dataset mapper, I can pass the first 3 channels to the RGB backbone and the last 3 channels to the depth backbone. I can see that you concatenate the channels, but I am missing the point where this numpy array is split and the tensors are passed to the corresponding backbones. Can you point me to the code where this happens?

    opened by andreaceruti 5
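
    For the channel question above, a minimal sketch of the split itself, assuming a channels-first image (C, H, W) with the RGB channels stacked before the depth channels. It does not show where this repository performs the split. The function name `split_rgb_depth` is illustrative.

```python
def split_rgb_depth(image, rgb_channels=3):
    """Split a channels-first image (C, H, W) into its RGB and depth parts.

    Works on nested lists, NumPy arrays and torch tensors, because it only
    uses len() and slicing along the first axis.
    """
    if len(image) <= rgb_channels:
        raise ValueError("expected RGB and depth channels stacked together")
    return image[:rgb_channels], image[rgb_channels:]


# For a batched tensor of shape (N, 6, H, W) the equivalent slicing would be
# batch[:, :3] and batch[:, 3:].
```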
  • module and key errors

    Running any of the sample code in the uoais dir, e.g. python tools/ --use-cgnet --dataset-path ./sample_data --config-file configs/R50_depth_mlc_occatmask_hom_concat.yaml, gives:

    ModuleNotFoundError: No module named 'utils'
    after copying the file to the tools folder -> ModuleNotFoundError: No module named 'foreground_segmentation'
    after copying that folder to the tools folder ->

    File "tools/", line 59, in cfg.merge_from_file(args.config_file) ... KeyError: 'Non-existent config key: MODEL.ROI_VISIBLE_MASK_HEAD'

    Most of the flags you used in the .yaml file do not seem to exist... How could I solve it?

    opened by maxiuw 2
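
    On the ModuleNotFoundError above: when a script is started as `python tools/<script>`, Python places the `tools` directory, not the current directory, at the front of the module search path. A minimal sketch of making another directory importable without copying files, assuming the missing modules live in the repository root (the source does not say where they live). The helper name `add_to_module_path` is illustrative.

```python
import os
import sys


def add_to_module_path(directory):
    """Put `directory` at the front of sys.path (once) and return its absolute path."""
    directory = os.path.abspath(directory)
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return directory


# Hypothetical usage at the top of a script inside tools/:
# add_to_module_path(os.path.join(os.path.dirname(__file__), ".."))
```

    Setting the `PYTHONPATH` environment variable to the repository root before launching the script has the same effect. The KeyError is a separate matter and is not addressed by this sketch.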
  • Camera Model

    First, thanks for your contribution, this paper and method look great.

    I was just curious what camera you used for this project and what method you use to align the depth images with RGB?

    opened by tteresi7 2
  • Labelling the dataset

    Hi. I'm amazed by your work and I want to apply it to my own datasets. I want to know what tool you used for labelling to get all of these things:

    "visible_mask": RLE, # visible mask
    "visible_bbox": [x,y,width,height], # bounding box of visible mask
    "occluded_mask": RLE # occluded mask
    "occluded_rate": float # ratio between occluded mask and amodal mask

    opened by jayes97 1
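
    The question above is about the labelling tool, which this sketch does not answer. It only illustrates how the listed fields relate to each other once a visible mask and an amodal mask exist, assuming binary masks given as rows of 0/1 and assuming the occluded mask is the part of the amodal mask that is not visible. RLE encoding is left out, and the function names are illustrative.

```python
def mask_area(mask):
    """Number of foreground pixels in a binary mask given as rows of 0/1."""
    return sum(1 for row in mask for value in row if value)


def bbox_xywh(mask):
    """[x, y, width, height] of the foreground pixels, or None for an empty mask."""
    xs = [x for row in mask for x, value in enumerate(row) if value]
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return [min(xs), min(ys), max(xs) - min(xs) + 1, max(ys) - min(ys) + 1]


def occlusion_fields(visible, amodal):
    """Derive visible_bbox, occluded_mask and occluded_rate from two binary masks."""
    # A pixel is occluded when it belongs to the amodal mask but is not visible.
    occluded = [[int(bool(a) and not v) for v, a in zip(v_row, a_row)]
                for v_row, a_row in zip(visible, amodal)]
    amodal_area = mask_area(amodal)
    rate = mask_area(occluded) / amodal_area if amodal_area else 0.0
    return {
        "visible_bbox": bbox_xywh(visible),
        "occluded_mask": occluded,  # binary here; the dataset format stores RLE
        "occluded_rate": rate,
    }
```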
  • I think "occlude_rate" is a mistake for "occluded_rate".

    What is "occlude_rate"? I think "occlude_rate" is a mistake for "occluded_rate".

    adet/data/amodal_datasets/pycocotools/ line:98-99


    opened by siva-shiba 1
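
    Until the key name is settled, a reader of the annotations could accept both spellings. This is a hypothetical workaround sketch, not code from this repository; `get_occluded_rate` and its `default` parameter are illustrative.

```python
def get_occluded_rate(annotation, default=0.0):
    """Read the occlusion ratio from an annotation dict, accepting either key spelling."""
    for key in ("occluded_rate", "occlude_rate"):
        if key in annotation:
            return float(annotation[key])
    return default
```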
  • FloatingPointError: Loss became infinite or NaN at iteration=0!

    The loss_occ_cls of the first iteration is 0

    [11/15 02:06:37 adet.trainer]: Starting training from iteration 0
    Traceback (most recent call last):
      File "", line 303, in
        args=(args,),
      File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/", line 82, in launch
        main_func(*args)
      File "", line 286, in main
        return trainer.train()
      File "", line 83, in train
        self.train_loop(self.start_iter, self.max_iter)
      File "", line 73, in train_loop
        self.run_step()
      File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/", line 494, in run_step
        self._trainer.run_step()
      File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/", line 287, in run_step
        self._write_metrics(loss_dict, data_time)
      File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/", line 302, in _write_metrics
        SimpleTrainer.write_metrics(loss_dict, data_time, prefix)
      File "/root/anaconda3/envs/uoais/lib/python3.7/site-packages/detectron2/engine/", line 339, in write_metrics
        f"Loss became infinite or NaN at iteration={storage.iter}!\n"
    FloatingPointError: Loss became infinite or NaN at iteration=0!
    loss_dict = {'loss_cls': 157.09115600585938, 'loss_box_reg': 5.162332534790039, 'loss_visible_mask': 3.1945271492004395, 'loss_amodal_mask': 2.944978952407837, 'loss_occ_cls': nan, 'loss_rpn_cls': 9.696294784545898, 'loss_rpn_loc': 12.890896797180176}

    opened by niushou 0
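
    In the loss_dict above, loss_occ_cls is the only NaN entry. One possible cause, which the source does not confirm, is the class weight n_noocc/n_occ used for the occlusion loss in the snippet quoted in the ORCNNROIHeads issue further down: if every instance in a batch is occluded, the weight of the occluded class becomes 0 and a weighted mean over only that class is 0/0, and if no instance is occluded the ratio itself divides by zero. A minimal sketch of a guarded weight computation follows; `occlusion_class_weights` is an illustrative name and the fallback to equal weights is an assumption.

```python
def occlusion_class_weights(gt_occluded):
    """Return [weight_not_occluded, weight_occluded] for a list of 0/1 labels.

    Falls back to equal weights when either class is missing, so the ratio
    is never computed with a zero numerator or denominator.
    """
    n_gt = len(gt_occluded)
    n_occ = sum(gt_occluded)
    n_noocc = n_gt - n_occ
    if n_occ == 0 or n_noocc == 0:
        return [1.0, 1.0]
    return [1.0, n_noocc / n_occ]
```

    The returned list could then be turned into a tensor and passed as the `weight` argument of the cross-entropy loss.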
  • make my own dataset

    Thank you for your work and code! I wonder how to use my own data to make a dataset in the uoais format, and which tool should be used to label the images. Thank you very much.

    opened by trugle 0
  • ORCNNROIHeads raises error if input "proposes" is empty. (when batch size changed to 1)

    Issue: When I changed the batch_size from 2 to 1 to reduce memory usage, an error occurred in the ORCNNROIHeads class at adet/modeling/rcnn/ line 508: gt_occludeds = cat(gt_occludeds, dim=0).to(torch.int64). I suspect that the error is caused by there being no ground truth proposals.

    NotImplementedError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors, or that you (the operator writer) forgot to register a fallback function. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Python, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradLazy, AutogradXPU, AutogradMLC, AutogradHPU, AutogradNestedTensor, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, AutocastCPU, Autocast, Batched, VmapMode, Functionalize].

    CPU: registered at aten/src/ATen/RegisterCPU.cpp:21063 [kernel]
    CUDA: registered at aten/src/ATen/RegisterCUDA.cpp:29726 [kernel]
    QuantizedCPU: registered at aten/src/ATen/RegisterQuantizedCPU.cpp:1258 [kernel]
    BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
    Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:47 [backend fallback]
    Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
    Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:18 [backend fallback]
    Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:18 [backend fallback]
    ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
    ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:64 [backend fallback]
    AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradMLC: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_3.cpp:11380 [autograd kernel]
    Tracer: registered at ../torch/csrc/autograd/generated/TraceType_3.cpp:11220 [kernel]
    AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:461 [backend fallback]
    Autocast: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:305 [backend fallback]
    Batched: registered at ../aten/src/ATen/BatchingRegistrations.cpp:1059 [backend fallback]
    VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
    Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:52 [backend fallback]

    Possible solution?: Change lines 508-513 by adding an if statement:

    if len(gt_occludeds) == 0:
        # No ground-truth entries: skip cat() on an empty list and use a constant zero loss.
        losses["loss_occ_cls"] = torch.tensor(0)
    else:
        gt_occludeds = cat(gt_occludeds, dim=0).to(torch.int64)
        n_occ, n_gt = torch.sum(gt_occludeds), gt_occludeds.shape[0]
        n_noocc = n_gt - n_occ
        loss = F.cross_entropy(occ_cls_logits, gt_occludeds, reduction="mean", 
                            weight=torch.Tensor([1, n_noocc/n_occ]).to(device=gt_occludeds.device))
        losses["loss_occ_cls"] = loss

    This way, the constant losses["loss_occ_cls"] = torch.tensor(0) does not contribute to the gradient.

    opened by YueBro 0
Artificial Intelligence Lab, Institute of Integrated Technology, Gwangju Institute of Science and Technology (GIST)
