UVO_Challenge

Team Alpes_runner Solutions

Overview

This is the official repo for our UVO Challenge solutions for image- and video-based open-world segmentation. Our team "Alpes_runner" achieved the best performance on both the image- and video-based benchmarks. More details about the workshop can be found here.

Technical Reports

Our technical reports for the image-based (arXiv:2110.10239) and video-based (arXiv:2110.11661) tracks are cited below under Citation.

Models

Detection

Model        | Pretrained datasets | Finetuned datasets | Links
UVO_Detector | COCO                | -                  | config/weights
UVO_Detector | COCO                | UVO                | config/weights
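
These configs and weights use the OpenMMLab format, so they should load with mmdet's standard inference API. Below is a minimal sketch, assuming the mmdet 2.x API; the file names are placeholders for the config/weights linked above:

    # Hedged sketch: load a UVO detector checkpoint with mmdet's standard API.
    # 'det_config.py' and 'uvo_detector.pth' stand in for the linked files.
    from mmdet.apis import init_detector, inference_detector

    model = init_detector('det_config.py', 'uvo_detector.pth', device='cuda:0')
    result = inference_detector(model, 'demo.jpg')  # boxes/masks for one image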

Segmentation

Model         | Pretrained datasets     | Finetuned datasets | Links
UVO_Segmentor | COCO                    | -                  | weights
UVO_Segmentor | COCO, PASCAL, OpenImage | -                  | config/weights
UVO_Segmentor | COCO, PASCAL, OpenImage | UVO                | config/weights
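
Likewise, the segmentor should load with mmseg's standard API (mmseg 0.x assumed). The config and checkpoint names below reuse paths that appear in the issues further down, but treat them as placeholders:

    # Hedged sketch: run the finetuned segmentor with mmseg's inference helpers.
    from mmseg.apis import init_segmentor, inference_segmentor

    config_file = 'segmentation/configs/swin/swin_l_upper_w_jitter_inference.py'
    ckpt_file = 'seg_swin_l_uvo_finetuned.pth'
    model = init_segmentor(config_file, ckpt_file, device='cuda:0')
    result = inference_segmentor(model, 'demo.jpg')  # per-pixel predictions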

Citation

If you find this project useful in your research, please consider citing:

@article{du20211st_image,
  title={1st Place Solution for the UVO Challenge on Image-based Open-World Segmentation 2021},
  author={Du, Yuming and Guo, Wen and Xiao, Yang and Lepetit, Vincent},
  journal={arXiv preprint arXiv:2110.10239},
  year={2021}
}

@article{du20211st_video,
  title={1st Place Solution for the UVO Challenge on Video-based Open-World Segmentation 2021},
  author={Du, Yuming and Guo, Wen and Xiao, Yang and Lepetit, Vincent},
  journal={arXiv preprint arXiv:2110.11661},
  year={2021}
}

Contact

Feel free to contact me or open a new issue if you have any questions.

Comments
  • Init segmentor config file problem

    Hi! Thanks for the code!

    I tried to use your config file in the segmentation dir, but got an error: it seems there is no "type" key in the resolved config. I'm not familiar with OpenMMLab. Could you help me figure it out? Thanks a lot!

    Script:

    from mmseg.apis import init_segmentor  # mmseg inference API

    config_file = "./segmentation/configs/swin/swin_l_upper_w_jitter_inference.py"
    ckpt_file = "../../models/seg_swin_l_uvo_finetuned.pth"
    model = init_segmentor(config_file, ckpt_file, device="cuda:0")
    

    Log:

    Traceback (most recent call last):
      File "infer.py", line 20, in <module>
        model = init_segmentor(config_file, ckpt_file, device="cuda:0")
      File "/home/xin/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmseg/apis/inference.py", line 32, in init_segmentor
        model = build_segmentor(config.model, test_cfg=config.get('test_cfg'))
      File "/home/xin/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmseg/models/builder.py", line 49, in build_segmentor
        cfg, default_args=dict(train_cfg=train_cfg, test_cfg=test_cfg))
      File "/home/xin/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 212, in build
        return self.build_func(*args, **kwargs, registry=self)
      File "/home/xin/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/cnn/builder.py", line 27, in build_model_from_cfg
        return build_from_cfg(cfg, registry, default_args)
      File "/home/xin/anaconda3/envs/openmmlab/lib/python3.7/site-packages/mmcv/utils/registry.py", line 25, in build_from_cfg
        '`cfg` or `default_args` must contain the key "type", '
    KeyError: '`cfg` or `default_args` must contain the key "type", but got {\'pretrained\': None, \'backbone\': {\'pretrain_img_size\': 384, \'embed_dims\': 192, \'depths\': [2, 2, 18, 2], \'num_heads\': [6, 12, 24, 48], \'drop_path_rate\': 0.2, \'window_size\': 12}, \'decode_head\': {\'in_channels\': [192, 384, 768, 1536], \'num_classes\': 2, \'loss_decode\': {\'type\': \'CrossEntropyLoss\', \'use_sigmoid\': False, \'loss_weight\': 1.0}}, \'auxiliary_head\': {\'in_channels\': 768, \'num_classes\': 2, \'loss_decode\': {\'type\': \'CrossEntropyLoss\', \'use_sigmoid\': False, \'loss_weight\': 1.0}}, \'train_cfg\': None}\n{\'train_cfg\': None, \'test_cfg\': None}'
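
    For context (not a confirmed fix): mmseg's registry instantiates the segmentor from a top-level 'type' key in cfg.model, which is normally inherited from a _base_ config, and the traceback shows that key is missing after the config is resolved. A minimal sketch of the structure build_segmentor expects, with illustrative values only:

    # Sketch of the model dict mmseg's build_segmentor expects. Values here are
    # illustrative assumptions, not this repo's actual configuration.
    model = dict(
        type='EncoderDecoder',  # the registry key the traceback reports as missing
        backbone=dict(type='SwinTransformer', pretrain_img_size=384, embed_dims=192),
        decode_head=dict(
            type='UPerHead',
            in_channels=[192, 384, 768, 1536],
            channels=512,
            num_classes=2))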
    
    opened by kxhit 13
  • wgts links failed

    Hi! Sorry to bother you! The links to download the weights are broken. Could you provide the model weights? I'm looking for the object detection model. Thank you! BR, George.

    opened by lluo-Desktop 10
  • Question regarding the UVO Dataset

    First of all, thank you for your amazing work.

    There's a question I would like to ask regarding the UVO Dataset. I would like to work on the dense dataset. Through the download link on the homepage of the challenge, I could access 'UVO_video_train_dense.json', 'UVO_video_val_dense.json' and 'UVO_video_test_dense.json', which specify the video ids of the train, validation and test datasets. However, I'm unsure how to obtain the videos themselves. Could you please guide me on how to obtain the original and annotated videos?

    Many thanks

    opened by YScheung 4
  • About the mask annotations used in 'box2seg.py' and '../_base_/datasets/uvo_finetune.py'

    Hi @dulucas, in these two config files, many mask annotations are used, for example:

    oid_train = dict(
        type='RepeatDataset',
        times=1,
        dataset=dict(
            type=dataset_type,
            data_root='data/oid/',
            img_dir='images/',
            ann_dir='masks/',
            split=['train_clean_v2.txt'],
            pipeline=train_pipeline))

    and:

    uvo_dense_val = dict(
        type='RepeatDataset',
        times=1,
        dataset=dict(
            type=dataset_type,
            data_root='data/uvo/',
            img_dir='images/dense_val/',
            ann_dir='masks/dense_val/',
            split=['dense_val_list.txt'],
            pipeline=train_pipeline))

    I tried code like this (https://github.com/alicranck/coco2voc) to generate the mask PNGs, but there may be something wrong with the generated masks: the training losses are unusual. Could you please provide the code or scripts you used to generate the masks, or a link to what you referred to?
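
    For reference, a common way to rasterize COCO-style instance annotations into binary mask PNGs is via pycocotools. This is only a sketch under that assumption, not the authors' script; 'annotations.json' and the masks/ output dir are placeholders:

    # Sketch: convert COCO-style annotations into binary (object vs. background)
    # mask PNGs, matching the num_classes=2 setup used in the configs.
    import os
    import numpy as np
    from PIL import Image
    from pycocotools.coco import COCO

    coco = COCO('annotations.json')  # placeholder path
    os.makedirs('masks', exist_ok=True)
    for img_id in coco.getImgIds():
        info = coco.loadImgs(img_id)[0]
        mask = np.zeros((info['height'], info['width']), dtype=np.uint8)
        for ann in coco.loadAnns(coco.getAnnIds(imgIds=img_id)):
            mask = np.maximum(mask, coco.annToMask(ann))  # 1 where any object lies
        name = os.path.splitext(info['file_name'])[0]
        Image.fromarray(mask).save(os.path.join('masks', name + '.png'))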

    opened by Cecilia-xue 2
  • Which data split did the final result come from?

    Hello! Many thanks for this nice GitHub repo. I am wondering which data split you evaluated your model on: the UVO sparse test set or the UVO dense test set? In the paper I saw the statement of challenge final results on the UVO-Sparse test dataset, but the testing script in the GitHub code loads the test annotations from the dense split. I guess the dense split's test set was what the challenge targeted?

    opened by zitongzhan 4
  • Checkpoint seems broken

    This checkpoint (seg_swin_l_mixed_pretrained.pth) seems broken:

    >>> import torch
    >>> torch.load('seg_swin_l_mixed_pretrained.pth')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/chen/miniconda3/envs/sota/lib/python3.9/site-packages/torch/serialization.py", line 600, in load
        with _open_zipfile_reader(opened_file) as opened_zipfile:
      File "/home/chen/miniconda3/envs/sota/lib/python3.9/site-packages/torch/serialization.py", line 242, in __init__
        super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
    RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
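
    That PytorchStreamReader error usually indicates a truncated or corrupted download: torch >= 1.6 saves checkpoints as zip archives. A quick sanity check, reusing the file name above:

    # A valid zip-format checkpoint prints True; False usually means the
    # download was truncated and should be re-fetched.
    import zipfile
    print(zipfile.is_zipfile('seg_swin_l_mixed_pretrained.pth'))
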
    opened by blackmagicianZ 2
  • Loop call in get_targets?

    First of all, thanks to the author for sharing such excellent work! While browsing the code, I found that there is a loop call in the get_targets function. Is this not a problem?

    opened by zhaoxin111 28