Real-Time High-Resolution Background Matting

Overview

Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires capturing an additional background image and produces state-of-the-art matting results at 4K 30fps and HD 60fps on an Nvidia RTX 2080 TI GPU.

Disclaimer: The video conversion script in this repo is not meant to be real-time. Our research's main contribution is the neural architecture for high-resolution refinement and the new matting datasets. The inference_speed_test.py script allows you to measure the tensor throughput of our model, which should achieve real-time. The inference_video.py script allows you to test your video on our model, but the video encoding and decoding is done without hardware acceleration and parallelization. For production use, you are expected to do additional engineering for hardware encoding/decoding and loading frames to GPU in parallel. For more architectural details, please refer to our paper.
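
The parallel frame loading mentioned in the disclaimer could look roughly like the sketch below. It is not code from this repository: decode_frames and run_model are hypothetical stand-ins for a real video decoder and the matting model, and prefetch is an illustrative helper. A bounded queue lets a background thread decode upcoming frames while the main thread processes the current one.

```python
import queue
import threading

_SENTINEL = object()  # marks the end of the stream


def decode_frames(n):
    """Hypothetical decoder: yields n dummy 'frames' (here just integers)."""
    for i in range(n):
        yield i


def run_model(frame):
    """Hypothetical stand-in for model inference on one frame."""
    return frame * 2


def prefetch(iterable, max_prefetch=4):
    """Yield items from `iterable` while a background thread produces them."""
    buf = queue.Queue(maxsize=max_prefetch)  # bounded, so memory use stays flat

    def worker():
        for item in iterable:
            buf.put(item)  # blocks when the consumer falls behind
        buf.put(_SENTINEL)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = buf.get()
        if item is _SENTINEL:
            return
        yield item


def process_video(n_frames):
    """Run the stand-in model over prefetched frames, preserving order."""
    return [run_model(f) for f in prefetch(decode_frames(n_frames))]


results = process_video(5)
print(results)
```

Threads only help here if the decoder releases the interpreter lock while it works, which native decoders typically do; a pure-Python decoder would need processes instead.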

 

New Paper is Out!

Check out Robust Video Matting! Our new method does not require pre-captured backgrounds and can run inference even faster!

 

 

Updates

  • [Jun 21 2021] Paper received CVPR 2021 Best Student Paper Honorable Mention.
  • [Apr 21 2021] VideoMatte240K dataset is now published.
  • [Mar 06 2021] Training script is published.
  • [Feb 28 2021] Paper is accepted to CVPR 2021.
  • [Jan 09 2021] PhotoMatte85 dataset is now published.
  • [Dec 21 2020] We updated our project to MIT License, which permits commercial use.

 

Download

Model / Weights

Video / Image Examples

Datasets

 

Demo

Scripts

We provide several scripts in this repo for you to experiment with our model. More detailed instructions are included in the files.

  • inference_images.py: Perform matting on a directory of images.
  • inference_video.py: Perform matting on a video.
  • inference_webcam.py: An interactive matting demo using your webcam.

Notebooks

Additionally, you can try our notebooks in Google Colab for performing matting on images and videos.

Virtual Camera

We provide a demo application that pipes webcam video through our model and outputs to a virtual camera. The script works only on Linux and can be used in Zoom meetings. For more information, check out:

 

Usage / Documentation

You can run our model using PyTorch, TorchScript, TensorFlow, and ONNX. For details on using our model, please check out the Usage / Documentation page.

 

Training

Configure data_path.py to point to your dataset. The original paper uses train_base.py to train only the base model until convergence, then uses train_refine.py to train the entire network end-to-end. More details are specified in the paper.

 

Project members

* Equal contribution.

 

License

This work is licensed under the MIT License. If you use our work in your project, we would love for you to include an acknowledgement and fill out our survey.

Community Projects

Projects developed by third-party developers.

Comments
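  •  Poor results on my own video, looking for advice

    Hi, it's me again.

    Here is the problem I ran into. The results I get on videos I shot myself are far worse than the results I get on the videos from your paper, which I downloaded from the official site. I would like to know why. Are my parameters set incorrectly, are there requirements for how the video is shot, or was the video preprocessed in some way? Our own results are really terrible.

    My environment: CUDA 11.0, PyTorch 1.7, and several 3090 cards, although for now I am running on a single card. I have attached the video and background photo shot by my colleague, along with the output.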

    python /home/ubuntu/BM/BackgroundMattingV2/inference_video_only.py
    --model-type mattingrefine
    --model-backbone resnet50
    --model-backbone-scale 0.25
    --model-refine-mode sampling
    --model-refine-sample-pixels 80000
    --model-checkpoint "/home/ubuntu/BM/pytorch_resnet50.pth"
    --video-src "/home/ubuntu/BM/huangfupeng/720/720.mp4"
    --video-bgr "/home/ubuntu/BM/huangfupeng/img/新建文件夹/3_720.bmp"
    --output-dir "/home/ubuntu/BM/content/output/2021051910_720/"
    --output-type com

    image

    Fig. 1. Result produced with the res50 model at 1080 resolution (video and background photo).

    opened by zhanghonglishanzai 20
  • Not working

    So I followed this tutorial on YouTube, https://www.youtube.com/watch?v=HlOUKj6WP-s&list=PLmo1GBItOimXfKR5t4D3f0doSflEgUo9j&index=3&t=474s, installed everything I needed to install, activated everything, and made sure the picture and video were the same size and named properly, but I cannot get the program to green-screen me out. I have an NVIDIA graphics card. I used a sample image and video from this website and it worked, but mine won't work. It green-screens random sections of the background but not everything. It's not a complicated scene, and it is on a tripod. Just me walking away for a few seconds and turning around. It is a 4K video. I cannot upload the original as it is too big, so I am converting it to a smaller size and uploading it for you to look at. Help me please.

    https://user-images.githubusercontent.com/76640989/103165057-31404f80-47d0-11eb-9892-52d7993febda.mp4

    opened by cioccolata12345 14
  • how to run it realtime with 2080ti

    I ran the PyTorch version on a 2080 Ti, but it runs very slowly. The ONNX version is faster, but it still cannot run in real time. What should I do to achieve the 4K and HD performance shown?

    opened by luoww1992 12
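One way to separate model speed from video I/O overhead, in the spirit of the disclaimer's distinction between tensor throughput and the video script, is to time only the repeated forward pass. The sketch below is an illustration, not the repository's inference_speed_test.py: measure_fps is a hypothetical helper and fake_step stands in for one model call.

```python
import time


def measure_fps(step, n_warmup=3, n_iters=20):
    """Return calls per second of `step`, ignoring the first warm-up calls."""
    for _ in range(n_warmup):
        step()  # warm-up: caches, lazy initialisation, autotuning
    start = time.perf_counter()
    for _ in range(n_iters):
        step()
    elapsed = time.perf_counter() - start
    return n_iters / elapsed


def fake_step():
    """Hypothetical stand-in for one forward pass of the model."""
    time.sleep(0.001)


fps = measure_fps(fake_step)
print(f"{fps:.1f} iterations/s")
```

With a real PyTorch model on a GPU, kernel launches are asynchronous, so torch.cuda.synchronize() would be needed before each clock reading for the number to be meaningful.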
  • Error loading model in libTorch

    I receive this error when trying to load the model:

    error loading the model : ■   open file failed, file path: Exception raised from FileAdapter at ..\..\caffe2\serialize\file_adapter.cc:11 (most recent call first): 00007FFE9633A7B200007FFE9633A750 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>] 00007FFE69BA5A3D00007FFE69BA56D0 torch_cpu.dll!caffe2::serialize::FileAdapter::FileAdapter [<unknown file> @ <unknown line number>] 00007FFE6AAB408C00007FFE6AAB4050 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]

    My Configuration:

    Win 10, libTorch 1.7.1, CUDA 11.0, Visual Studio 2017

    The example compiles correctly and creates the CUDA device correctly. I'm trying to load the torchscript_resnet50_fp16.pth model.

    Any suggestions or ideas on how to solve this?

    opened by brinoausrino 11
  • How to get result image on C++

    Thanks for your great contributions. I referred to model_usage for C++, but I don't know how to transform the results and display them.

    I also referred to inference_webcam.py and took inspiration from this code:

    pha, fgr = model(src, bgr)[:2]
    res = pha * fgr + (1 - pha) * torch.ones_like(fgr)
    res = res.mul(255).byte().cpu().permute(0, 2, 3, 1).numpy()[0]
    res = cv2.cvtColor(res, cv2.COLOR_RGB2BGR)
    key = dsp.step(res)
    

    I need to translate it to C++, but I still have some questions.

        auto outputs = model.forward({src, bgr}).toTuple()->elements();
        auto pha = outputs[0].toTensor();
        auto fgr = outputs[1].toTensor();
        
        // The following code fails, but I don't know why.
        auto res_tensor = (pha * fgr + (1-pha) * torch::ones_like(fgr)).mul(255).cpu();
        Mat res(res_tensor.size(2), res_tensor.size(3), CV_8UC3, (void*) res_tensor.data_ptr<uint8_t>());
        cvtColor(res, res, COLOR_RGB2BGR);
        imshow("matting", res);
    

    Would you please show me the code so I can study it? Thanks.

    opened by MolianWH 11
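Comparing the two snippets in the question, the Python version converts the result to bytes and permutes it to height x width x channel before handing it to OpenCV, while the C++ attempt does neither: the tensor is still floating point and channel-first, and data_ptr<uint8_t>() on a float tensor is rejected by libtorch. That is a likely cause rather than a confirmed diagnosis. The plain-Python sketch below, with hypothetical helper names and nested lists instead of tensors, shows the two steps that an 8-bit, three-channel image buffer needs.

```python
def composite_on_white(pha, fgr):
    """Blend a channel-first foreground over white using one alpha plane.

    pha: H x W nested list of floats in [0, 1]
    fgr: C x H x W nested list of floats in [0, 1]
    Returns a C x H x W nested list of floats.
    """
    return [
        [
            [a * f + (1.0 - a) * 1.0 for a, f in zip(pha_row, fgr_row)]
            for pha_row, fgr_row in zip(pha, channel)
        ]
        for channel in fgr
    ]


def chw_float_to_hwc_bytes(img):
    """Convert C x H x W floats in [0, 1] to interleaved H*W*C bytes."""
    channels, height, width = len(img), len(img[0]), len(img[0][0])
    out = bytearray()
    for y in range(height):
        for x in range(width):
            for c in range(channels):  # channel varies fastest: interleaved
                value = min(max(img[c][y][x], 0.0), 1.0)
                out.append(int(value * 255))  # truncates toward zero
    return bytes(out)


# A 1 x 2 RGB image: an opaque red pixel and a fully transparent grey pixel.
pha = [[1.0, 0.0]]
fgr = [[[1.0, 0.2]], [[0.0, 0.2]], [[0.0, 0.2]]]
buffer = chw_float_to_hwc_bytes(composite_on_white(pha, fgr))
print(list(buffer))
```

In C++ the equivalent steps would be a dtype conversion and a permute followed by making the tensor contiguous, all before the buffer is wrapped in a Mat.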
  • Dataset release schedule

    Hi, thanks for your awesome work! I have recently been doing some research on human body segmentation. Do you have a schedule for releasing the dataset? I am really interested in it.

    opened by AmberCheng 11
  • ZeroDivisionError: integer division or modulo by zero

    I have successfully converted a 440x440 video using Colab. Now I'm trying with an HD video and received the following error:

    !python inference_video.py
    --model-type mattingrefine
    --model-backbone resnet50
    --model-backbone-scale 0.25
    --model-refine-mode sampling
    --model-refine-sample-pixels 80000
    --model-checkpoint "/content/model.pth"
    --video-src "/content/balconay_test.mp4"
    --video-bgr "/content/balcony_bg.jpg"
    --output-dir "/content/output/"
    --output-type com fgr pha err ref

    0% 0/1 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "inference_video.py", line 178, in <module>
        for src, bgr in tqdm(DataLoader(dataset, batch_size=1, pin_memory=True)):
      File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1104, in __iter__
        for obj in iterable:
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 435, in __next__
        data = self._next_data()
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 475, in _next_data
        data = self._dataset_fetcher.fetch(index) # may raise StopIteration
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
        data = [self.dataset[idx] for idx in possibly_batched_index]
      File "/content/BackgroundMattingV2/dataset/zip.py", line 17, in __getitem__
        x = tuple(d[idx % len(d)] for d in self.datasets)
      File "/content/BackgroundMattingV2/dataset/zip.py", line 17, in <genexpr>
        x = tuple(d[idx % len(d)] for d in self.datasets)
    ZeroDivisionError: integer division or modulo by zero
    0% 0/1 [00:00<?, ?it/s]

    opened by bigboss97 10
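The traceback ends in idx % len(d), which can only divide by zero when one of the zipped datasets has length 0. The progress bar total of 1 suggests the background image loaded and the video yielded no frames, though that is an inference from the log rather than a confirmed cause. The sketch below shows the same cycling logic with an explicit check that fails early with a clearer message. CheckedZip is a hypothetical class, not the repository's ZipDataset, and taking the length as the longest input is an assumption.

```python
class CheckedZip:
    """Zip several sequences, cycling the shorter ones (hypothetical sketch)."""

    def __init__(self, datasets, names=None):
        names = names or [f"dataset {i}" for i in range(len(datasets))]
        for name, d in zip(names, datasets):
            if len(d) == 0:
                # Fail here instead of with ZeroDivisionError during iteration.
                raise ValueError(
                    f"{name} is empty; check that the file exists and can be decoded"
                )
        self.datasets = datasets

    def __len__(self):
        # Assumed behaviour: as long as the longest input.
        return max(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Shorter inputs wrap around, e.g. one background for every frame.
        return tuple(d[idx % len(d)] for d in self.datasets)


frames = [10, 11, 12]   # stand-in for decoded video frames
background = ["bg"]     # stand-in for a single background image
pairs = CheckedZip([frames, background], names=["video", "background"])
print(list(pairs))
```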
  • Is distributed (multi-GPU) computation supported?

    Hello, I have eight 3090 GPU cards and I changed the following code:

    model = model.to(device).eval()
    model.load_state_dict(torch.load(args.model_checkpoint, map_location=device), strict=False)

    to:

    model = model.to(device).eval()
    BM = model.load_state_dict(torch.load(args.model_checkpoint, map_location=device), strict=False)
    BM = nn.DataParallel(BM)

    I am not sure whether this change is correct.

    While the program runs I monitor GPU usage and see that only one card is in use. Any pointers would be appreciated.

    opened by zhanghonglishanzai 8
  • Doesn't work on any of my videos

    When I use your src.mp4 i.e.

    !gdown https://drive.google.com/uc?id=1tCEk8FE3WGrr49cdL8qMCqHptMCAtHRU -O /content/src.mp4 -q
    

    It works great

    image

    However when I use one of my videos (h264 1080) it doesn't work at all

    This is the alpha:

    image

    From this input:

    image

    From running this command (notice I also get an error message but video still produced)

    image

    opened by ecsplendid 7
  • Explain Unfolding step

    Had a few questions about what is going on here in the sampling stage in the Refiner:

    if self.patch_crop_method == 'unfold':
        return x.permute(0, 2, 3, 1) \
                .unfold(1, size + 2 * padding, size) \
                .unfold(2, size + 2 * padding, size)[idx[0], idx[1], idx[2]]
    
    • https://github.com/PeterL1n/BackgroundMattingV2/blob/4a56223a1cd9b2c2678582513c573debbfc12cae/model/refiner.py#L205
    1. x is (bs,c,h,w). Why is it being permuted to (bs,h,w,c) before the unfold?

    2. Generally one unfold across the channel dimension should be able to extract the patches. Why are there two unfolds here?

    3. What is the logic behind size + 2 * padding?

    opened by bluesky314 6
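One reading of the quoted snippet, offered as an interpretation rather than the authors' explanation: unfold slides a window along a single dimension, so one call is needed for height and one for width. Moving channels last puts batch, patch row and patch column first, so idx can select patches directly. The window size + 2 * padding with stride size means each patch has a size x size core plus a border of context. The plain-Python sketch below mimics this on a 2-D list; both function names are hypothetical.

```python
def window_starts(length, window, step):
    """Start offsets of all full windows, matching torch.Tensor.unfold."""
    return list(range(0, length - window + 1, step))


def extract_patches(image, size, padding):
    """Cut an H x W image into overlapping patches.

    `image` is assumed to be already padded by `padding` on every side.
    Each patch is (size + 2 * padding) wide and consecutive patches start
    `size` pixels apart, so neighbouring patches share their borders.
    Returns a dict mapping (patch_row, patch_col) to a 2-D patch.
    """
    window = size + 2 * padding
    rows = window_starts(len(image), window, size)     # first unfold: height
    cols = window_starts(len(image[0]), window, size)  # second unfold: width
    return {
        (i, j): [row[c:c + window] for row in image[r:r + window]]
        for i, r in enumerate(rows)
        for j, c in enumerate(cols)
    }


# A 4 x 4 core padded by 1 on each side gives a 6 x 6 input.
image = [[y * 6 + x for x in range(6)] for y in range(6)]
patches = extract_patches(image, size=2, padding=1)
print(sorted(patches))   # patch grid coordinates
print(patches[(0, 1)])   # one 4 x 4 patch
```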
  • Real-time background replacement in a web browser

    Is it possible to use it in a browser for real-time video background replacement? Are there instructions? Something like: https://ai.googleblog.com/2020/10/background-features-in-google-meet.html

    opened by benbro 6
  • Details of the training process

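    Hello, I tested BackgroundMattingV2 and the results are very good, thank you. However, I still don't fully understand some details of training (mainly the data augmentation part).

    1. In train_base, for the Affine augmentation of dataset_train, why do fgr-pha and bgr use different random ranges for the scale parameter? Or rather, why were these particular ranges chosen? image
    2. In train_base, ZipDataset packs pha_fgr_dataset and bgr_dataset into a single dataset (first image below), but the pairing appears to be fixed. Doesn't a fixed pairing make the number of composited samples far smaller than len(fgr)*len(bgr) (second image below)? Could this cause problems? image image Thanks again!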
    opened by fenneishi 0
  • how to understand the "err_map"

    I'm interested in the "err_map". However, I only found that "err_map" is one channel of the "output" after the decoder:

    err_sm = x[:, 4:5].clamp_(0., 1.)

    Could you please tell me more about "err_map" and how it is calculated?

    opened by Pros-yanghaozhe 0
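The line quoted in the question shows only where the predicted error map comes from: channel 4 of the decoder output, clamped to [0, 1]. As far as the paper describes it, the training target is the absolute difference between the predicted and ground-truth alpha, and refinement is applied where the predicted error is largest. The sketch below illustrates that idea on nested lists. The function names are hypothetical, and the top-k selection is an assumption about how sampling mode picks pixels, not code from the repository.

```python
def error_map(pha_pred, pha_true):
    """Per-pixel absolute difference between predicted and true alpha."""
    return [
        [abs(p - t) for p, t in zip(pred_row, true_row)]
        for pred_row, true_row in zip(pha_pred, pha_true)
    ]


def top_k_locations(err, k):
    """Return the (row, col) positions of the k largest error values."""
    flat = [(e, y, x) for y, row in enumerate(err) for x, e in enumerate(row)]
    flat.sort(key=lambda item: item[0], reverse=True)
    return [(y, x) for _, y, x in flat[:k]]


pha_pred = [[0.0, 0.5], [1.0, 0.75]]
pha_true = [[0.0, 1.0], [1.0, 1.0]]
err = error_map(pha_pred, pha_true)
print(err)
print(top_k_locations(err, 2))
```

At inference time no ground truth exists, so only the network's predicted error map would be available for choosing which regions to refine.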
Owner
Peter Lin
U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

Dennis Bappert 104 Nov 25, 2022
MODNet: Trimap-Free Portrait Matting in Real Time

MODNet is a model for real-time portrait matting with only RGB image input.

Zhanghan Ke 2.8k Dec 30, 2022
Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging This repository contains an implementation

Computational Photography Lab @ SFU 1.1k Jan 2, 2023
Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

Rounak Das 1 Feb 15, 2022
Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Real-ESRGAN Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Ported from https://github.com/xinntao/Real-ESRGAN Depend

Holy Wu 44 Dec 27, 2022
Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Real Cascade U-Nets for Anime Image Super Resolution 中文 | English Real-CUGAN

tarsin 111 Dec 28, 2022
Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

Syed Waqas Zamir 906 Dec 30, 2022
Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

xindong zhang 236 Dec 26, 2022
Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

null 69 Oct 19, 2022
PyMatting: A Python Library for Alpha Matting

Given an input image and a hand-drawn trimap (top row), alpha matting estimates the alpha channel of a foreground object which can then be composed onto a different background (bottom row).

PyMatting 1.4k Dec 30, 2022
Github project for Attention-guided Temporal Coherent Video Object Matting.

Attention-guided Temporal Coherent Video Object Matting This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matti

null 71 Dec 19, 2022
[IJCAI'21] Deep Automatic Natural Image Matting

Deep Automatic Natural Image Matting [IJCAI-21] This is the official repository of the paper Deep Automatic Natural Image Matting. Introduction | Netw

Jizhizi_Li 316 Jan 6, 2023
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Peter Lin 6.5k Jan 4, 2023
Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting (RVM) English | 中文 Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specific

flow-dev 2 Aug 21, 2022
Video Matting Refinement For Python

Video-matting refinement Library (use pip to install) scikit-image numpy av matplotlib Run Static background python path_to_video.mp4 Moving backgroun

null 3 Jan 11, 2022
Official implementation of the paper "Lightweight Deep CNN for Natural Image Matting via Similarity Preserving Knowledge Distillation"

Lightweight-Deep-CNN-for-Natural-Image-Matting-via-Similarity-Preserving-Knowledge-Distillation Introduction Accepted at IEEE Signal Processing Letter

DongGeun-Yoon 19 Jun 7, 2022
Rethinking Portrait Matting with Privacy Preserving

Rethinking Portrait Matting with Privacy Preserving This is the official repository of the paper Rethinking Portrait Matting with Privacy Preserving.

null 184 Jan 3, 2023
TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

凌逆战 16 Dec 30, 2022