Real-Time High-Resolution Background Matting

Peter Lin

Last update: Jan 3, 2023

Overview

Official repository for the paper Real-Time High-Resolution Background Matting. Our model requires capturing an additional background image and produces state-of-the-art matting results at 4K 30fps and HD 60fps on an NVIDIA RTX 2080 Ti GPU.

Disclaimer: The video conversion script in this repo is not meant to be real-time. Our research's main contributions are the neural architecture for high-resolution refinement and the new matting datasets. The inference_speed_test.py script allows you to measure the tensor throughput of our model, which should achieve real-time. The inference_video.py script allows you to test your video on our model, but the video encoding and decoding is done without hardware acceleration and parallelization. For production use, you are expected to do additional engineering for hardware encoding/decoding and loading frames to GPU in parallel. For more architecture details, please refer to our paper.

New Paper is Out!

Check out Robust Video Matting! Our new method does not require pre-captured backgrounds and runs inference even faster!

Updates

[Jun 21 2021] Paper received CVPR 2021 Best Student Paper Honorable Mention.
[Apr 21 2021] VideoMatte240K dataset is now published.
[Mar 06 2021] Training script is published.
[Feb 28 2021] Paper is accepted to CVPR 2021.
[Jan 09 2021] PhotoMatte85 dataset is now published.
[Dec 21 2020] We updated our project to MIT License, which permits commercial use.

Download

Model / Weights

Download model / weights

Video / Image Examples

HD videos (by Sengupta et al.) (Our model is more robust on HD footage)
4K videos and images

Datasets

Download datasets

Demo

Scripts

We provide several scripts in this repo for you to experiment with our model. More detailed instructions are included in the files.

inference_images.py: Perform matting on a directory of images.
inference_video.py: Perform matting on a video.
inference_webcam.py: An interactive matting demo using your webcam.

Notebooks

Additionally, you can try our notebooks in Google Colab for performing matting on images and videos.

Virtual Camera

We provide a demo application that pipes webcam video through our model and outputs to a virtual camera. The script only works on Linux and can be used in Zoom meetings. For more information, check out:

Webcam plugin

Usage / Documentation

You can run our model using PyTorch, TorchScript, TensorFlow, and ONNX. For details about using our model, please check out the Usage / Documentation page.

Training

Configure data_path.py to point to your dataset. The original paper uses train_base.py to train only the base model until convergence, then uses train_refine.py to train the entire network end-to-end. More details are specified in the paper.

Project members

Shanchuan Lin*, University of Washington
Andrey Ryabtsev*, University of Washington
Soumyadip Sengupta, University of Washington
Brian Curless, University of Washington
Steve Seitz, University of Washington
Ira Kemelmacher-Shlizerman, University of Washington

* Equal contribution.

License

This work is licensed under the MIT License. If you use our work in your project, we would love you to include an acknowledgement and fill out our survey.

Community Projects

Projects developed by third-party developers.

After Effects Plug-In

Comments

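Poor video results, looking for advice

Hi, it's me again.

Let me first describe the problem. I ran experiments on videos I shot myself, and the results are far worse than what I get with the videos from your paper that I downloaded from the official site. I'd like to know why: are my parameters wrong, are there requirements for how the video is shot, or did you preprocess the videos in some way? The videos we produce really look terrible.

My environment: CUDA 11.0, PyTorch 1.7, and several 3090 cards, though for now I'm still running on a single card. I'm attaching the video and background photo my colleague shot, along with the output.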

python /home/ubuntu/BM/BackgroundMattingV2/inference_video_only.py \
--model-type mattingrefine \
--model-backbone resnet50 \
--model-backbone-scale 0.25 \
--model-refine-mode sampling \
--model-refine-sample-pixels 80000 \
--model-checkpoint "/home/ubuntu/BM/pytorch_resnet50.pth" \
--video-src "/home/ubuntu/BM/huangfupeng/720/720.mp4" \
--video-bgr "/home/ubuntu/BM/huangfupeng/img/新建文件夹/3_720.bmp" \
--output-dir "/home/ubuntu/BM/content/output/2021051910_720/" \
--output-type com

fig1. This is the result of running the res50 model at 1080 resolution (both the video and the background photo).

opened by zhanghonglishanzai 20
Not working

So I followed this tutorial on YouTube, https://www.youtube.com/watch?v=HlOUKj6WP-s&list=PLmo1GBItOimXfKR5t4D3f0doSflEgUo9j&index=3&t=474s, installed everything I needed to install, activated everything, and made sure the picture and video were the same size and named properly, but I cannot get the program to green-screen me out. I have an NVIDIA graphics card. I used a sample image and video from this website and it worked, but mine won't work. It green-screens random sections of the background but not everything. It's not a complicated scene, and the camera is on a tripod: just me walking away for a few seconds and turning around. It is a 4K video. I cannot upload the original because it is too big, so I am converting it to a smaller size and uploading that for you to look at. Please help.

https://user-images.githubusercontent.com/76640989/103165057-31404f80-47d0-11eb-9892-52d7993febda.mp4

opened by cioccolata12345 14
how to run it realtime with 2080ti

I ran the PyTorch version on a 2080 Ti, but it runs very slowly. The ONNX version is faster, but it still cannot run in real time. What should I do to achieve the 4K and HD performance shown?

opened by luoww1992 12
Error loading model in libTorch

I receive this error when trying to load the model:

error loading the model : ■ open file failed, file path:
Exception raised from FileAdapter at ..\..\caffe2\serialize\file_adapter.cc:11 (most recent call first):
00007FFE9633A7B200007FFE9633A750 c10.dll!c10::Error::Error [<unknown file> @ <unknown line number>]
00007FFE69BA5A3D00007FFE69BA56D0 torch_cpu.dll!caffe2::serialize::FileAdapter::FileAdapter [<unknown file> @ <unknown line number>]
00007FFE6AAB408C00007FFE6AAB4050 torch_cpu.dll!torch::jit::load [<unknown file> @ <unknown line number>]

My Configuration:

Windows 10, LibTorch 1.7.1, CUDA 11.0, Visual Studio 2017

The example compiles correctly and creates the CUDA device correctly. I'm trying to load the torchscript_resnet50_fp16.pth model.

Any suggestions or ideas on how to solve this?

opened by brinoausrino 11

How to get result image on C++

Thanks for your great contribution. I referred to model_usage for C++, but I don't know how to convert the results and display them.

I also referred to inference_webcam.py and took inspiration from this code:

pha, fgr = model(src, bgr)[:2]
res = pha * fgr + (1 - pha) * torch.ones_like(fgr)
res = res.mul(255).byte().cpu().permute(0, 2, 3, 1).numpy()[0]
res = cv2.cvtColor(res, cv2.COLOR_RGB2BGR)
key = dsp.step(res)

I need to translate it to C++, but I still have some questions.

    auto outputs = model.forward({src, bgr}).toTuple()->elements();
    auto pha = outputs[0].toTensor();
    auto fgr = outputs[1].toTensor();
    
    // The following code is wrong, but I don't know why.
    auto res_tensor = (pha * fgr + (1-pha) * torch::ones_like(fgr)).mul(255).cpu();
    Mat res(res_tensor.size(2), res_tensor.size(3), CV_8UC3, (void*) res_tensor.data_ptr<uint8_t>());
    cvtColor(res, res, COLOR_RGB2BGR);
    imshow("matting", res);

Would you please show me code I can study? Thanks.
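
One possible reading of the problem above, offered as a sketch rather than a confirmed answer: the composite is still a float tensor in (batch, channel, height, width) order, while an 8-bit, 3-channel image buffer expects interleaved (height, width, channel) bytes. The Python snippet handles both with .byte() and .permute(0, 2, 3, 1). The plain-Python example below mirrors those two steps on nested lists, without PyTorch or OpenCV. The function names are hypothetical, and the clamp is an added safety step that is not in the original snippet.

```python
def composite_on_white(pha, fgr):
    """Blend a channel-first image over white: pha * fgr + (1 - pha) * 1.

    pha: H x W nested list of alpha values in [0, 1].
    fgr: 3 x H x W nested list of color values in [0, 1].
    Returns a 3 x H x W nested list of floats.
    """
    return [
        [[a * f + (1.0 - a) * 1.0 for a, f in zip(pha_row, fgr_row)]
         for pha_row, fgr_row in zip(pha, channel)]
        for channel in fgr
    ]


def to_uint8_hwc(chw):
    """Scale to 0..255, truncate to integers, and reorder C x H x W to H x W x C."""
    channels, height, width = len(chw), len(chw[0]), len(chw[0][0])
    return [
        [[min(255, max(0, int(chw[c][y][x] * 255))) for c in range(channels)]
         for x in range(width)]
        for y in range(height)
    ]


# A 1 x 2 image: an opaque red pixel next to a fully transparent gray pixel.
pha = [[1.0, 0.0]]
fgr = [[[1.0, 0.2]], [[0.0, 0.2]], [[0.0, 0.2]]]
res = to_uint8_hwc(composite_on_white(pha, fgr))
```

In LibTorch, the corresponding steps would likely be a conversion to an 8-bit dtype, a permute to channels-last, and a contiguous() call before the data pointer is wrapped in a Mat. This has not been checked against the code in the question.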

opened by MolianWH 11

Dataset release schedule

Hi, thanks for your awesome work! I have recently been doing research on human body segmentation. Do you have a schedule for releasing the dataset? I am really interested in it.

opened by AmberCheng 11
ZeroDivisionError: integer division or modulo by zero

I have successfully converted a 440x440 video using Colab. Now I'm trying an HD video and received the following error:

!python inference_video.py \
--model-type mattingrefine \
--model-backbone resnet50 \
--model-backbone-scale 0.25 \
--model-refine-mode sampling \
--model-refine-sample-pixels 80000 \
--model-checkpoint "/content/model.pth" \
--video-src "/content/balconay_test.mp4" \
--video-bgr "/content/balcony_bg.jpg" \
--output-dir "/content/output/" \
--output-type com fgr pha err ref

0% 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "inference_video.py", line 178, in <module>
    for src, bgr in tqdm(DataLoader(dataset, batch_size=1, pin_memory=True)):
  File "/usr/local/lib/python3.6/dist-packages/tqdm/std.py", line 1104, in __iter__
    for obj in iterable:
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 435, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/dataloader.py", line 475, in _next_data
    data = self._dataset_fetcher.fetch(index) # may raise StopIteration
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/usr/local/lib/python3.6/dist-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/content/BackgroundMattingV2/dataset/zip.py", line 17, in __getitem__
    x = tuple(d[idx % len(d)] for d in self.datasets)
  File "/content/BackgroundMattingV2/dataset/zip.py", line 17, in <genexpr>
    x = tuple(d[idx % len(d)] for d in self.datasets)
ZeroDivisionError: integer division or modulo by zero
0% 0/1 [00:00<?, ?it/s]
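
The failing expression in the traceback is idx % len(d), which raises ZeroDivisionError whenever one of the zipped datasets has length zero. That could happen, for example, if the video file could not be opened or decoded, although the traceback alone does not confirm the cause. The sketch below reproduces the indexing with a hypothetical ZipDatasetSketch class (not the repository's own class) and adds an up-front check that reports which dataset is empty. The __len__ rule is an assumption.

```python
class ZipDatasetSketch:
    """Pairs items from several datasets, wrapping shorter ones with modulo."""

    def __init__(self, datasets):
        # Fail early with a readable message instead of a ZeroDivisionError later.
        for i, d in enumerate(datasets):
            if len(d) == 0:
                raise ValueError(
                    f"dataset {i} is empty; check that its source path "
                    "exists and can be read"
                )
        self.datasets = datasets

    def __len__(self):
        # Assumption: the zipped length follows the longest dataset.
        return max(len(d) for d in self.datasets)

    def __getitem__(self, idx):
        # Same indexing as the line shown in the traceback.
        return tuple(d[idx % len(d)] for d in self.datasets)


frames = ["frame0", "frame1", "frame2"]   # stand-in for a video dataset
background = ["bgr"]                      # stand-in for a single background image
pairs = ZipDatasetSketch([frames, background])
```

With these stand-ins, every frame is paired with the same background, and an empty frame list is rejected at construction time.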

opened by bigboss97 10
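Is distributed computing supported?

Hello, I have eight 3090 GPU cards, and I changed the following code:

model = model.to(device).eval()
model.load_state_dict(torch.load(args.model_checkpoint, map_location=device), strict=False)

to:

model = model.to(device).eval()
BM = model.load_state_dict(torch.load(args.model_checkpoint, map_location=device), strict=False)
BM = nn.DataParallel(BM)

I'm not sure whether this change is correct.

When I monitor GPU usage while the program runs, I see that only one card is in use. Any pointers would be appreciated.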

opened by zhanghonglishanzai 8
Doesn't work on any of my videos
When I use your src.mp4 i.e.

!gdown https://drive.google.com/uc?id=1tCEk8FE3WGrr49cdL8qMCqHptMCAtHRU -O /content/src.mp4 -q

It works great

However when I use one of my videos (h264 1080) it doesn't work at all

This is the alpha:

From this input:

From running this command (notice I also get an error message but video still produced)
opened by ecsplendid 7
Explain Unfolding step
Had a few questions about what is going on here in the sampling stage in the Refiner:

if self.patch_crop_method == 'unfold':
    return x.permute(0, 2, 3, 1) \
            .unfold(1, size + 2 * padding, size) \
            .unfold(2, size + 2 * padding, size)[idx[0], idx[1], idx[2]]

https://github.com/PeterL1n/BackgroundMattingV2/blob/4a56223a1cd9b2c2678582513c573debbfc12cae/model/refiner.py#L205

x is (bs,c,h,w). Why is it being permuted to (bs,h,w,c) before the unfold?

Generally one unfold across the channel dimension should be able to extract the patches. Why are there two unfolds here?

What is the logic behind size + 2 * padding?
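
One way to read the snippet above, sketched in plain Python rather than PyTorch: each unfold slides a window along a single axis, so two calls are needed to cut 2-D patches, one along the height and one along the width. The window is size + 2 * padding long and advances by size, so neighboring patches overlap by 2 * padding while their size-by-size cores tile the image. Moving the channels last puts batch, row, and column first, which would let idx pick whole patches by (batch, row, column). The helper names below are hypothetical, and the sketch assumes the input has already been padded by `padding` on every side.

```python
def unfold_1d(length, window, step):
    """(start, stop) index pairs of every sliding window along one axis."""
    return [(s, s + window) for s in range(0, length - window + 1, step)]


def crop_patches(image, size, padding):
    """Cut an H x W nested list into overlapping square patches.

    Each patch is (size + 2 * padding) wide and the window advances by `size`,
    so every patch carries `padding` pixels of context around its core.
    Returns a dict keyed by (patch_row, patch_col).
    """
    window = size + 2 * padding
    rows = unfold_1d(len(image), window, size)      # first unfold: height
    cols = unfold_1d(len(image[0]), window, size)   # second unfold: width
    return {
        (i, j): [row[c0:c1] for row in image[r0:r1]]
        for i, (r0, r1) in enumerate(rows)
        for j, (c0, c1) in enumerate(cols)
    }


# A 6 x 6 image (a 4 x 4 image padded by 1) whose pixel value encodes its position.
image = [[y * 6 + x for x in range(6)] for y in range(6)]
patches = crop_patches(image, size=2, padding=1)
```

Here the 6 x 6 input yields a 2 x 2 grid of 4 x 4 patches, and a single key such as (1, 1) selects one of them, much as the index tuple does in the original code.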
opened by bluesky314 6
Real-time background replacement in a web browser

Is it possible to use it in a browser for real-time video background replacement? Are there instructions? Something like: https://ai.googleblog.com/2020/10/background-features-in-google-meet.html

opened by benbro 6
Details of the training process
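Hello, I tested BackgroundMattingV2 and the results are very good, thank you. However, I still don't fully understand some of the training details (PS: mainly the data augmentation part).

In train_base, for the Affine augmentation of dataset_train, why do fgr-pha and bgr use different random ranges for the scale parameter? In other words, why were these particular ranges chosen?

In train_base, ZipDataset packs pha_fgr_dataset and bgr_dataset into a single dataset (figure 1 below), but the pairing appears to be fixed. Doesn't a fixed pairing make the amount of composited data far smaller than len(fgr)*len(bgr) (figure 2 below)? Could that cause problems? Thanks again!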
opened by fenneishi 0
how to understand the "err_map"

I'm interested in the "err_map". However, I only found that "err_map" is one channel of the "output" after the decoder:

err_sm = x[:, 4:5].clamp_(0., 1.)

Could you please tell me more about "err_map" (how it is calculated)?

opened by Pros-yanghaozhe 0

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

104 Nov 25, 2022

MODNet: Trimap-Free Portrait Matting in Real Time

MODNet is a model for real-time portrait matting with only RGB image input.

2.8k Dec 30, 2022

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging

Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging This repository contains an implementation

1.1k Jan 2, 2023

Real-Time-Student-Attendence-System - Real Time Student Attendence System

Real-Time-Student-Attendence-System The Student Attendance Management System Pro

1 Feb 15, 2022

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

Real-ESRGAN Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data Ported from https://github.com/xinntao/Real-ESRGAN Depend

44 Dec 27, 2022

Real-CUGAN - Real Cascade U-Nets for Anime Image Super Resolution

Real Cascade U-Nets for Anime Image Super Resolution 中文 | English Real-CUGAN

111 Dec 28, 2022

Official repository for "Restormer: Efficient Transformer for High-Resolution Image Restoration". SOTA for motion deblurring, image deraining, denoising (Gaussian/real data), and defocus deblurring.

Restormer: Efficient Transformer for High-Resolution Image Restoration Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan,

906 Dec 30, 2022

This is the unofficial code of Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes. which achieve state-of-the-art trade-off between accuracy and speed on cityscapes and camvid, without using inference acceleration and extra data

Deep Dual-resolution Networks for Real-time and Accurate Semantic Segmentation of Road Scenes Introduction This is the unofficial code of Deep Dual-re

113 Dec 23, 2022

Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices, ACM Multimedia 2021

Codes for ECBSR Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices Xindong Zhang, Hui Zeng, Lei Zhang ACM Multimedia 202

236 Dec 26, 2022

Implementation for the paper 'YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs'

YOLO-ReT This is the original implementation of the paper: YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs. Prakhar Ganesh, Ya

69 Oct 19, 2022

PyMatting: A Python Library for Alpha Matting

Given an input image and a hand-drawn trimap (top row), alpha matting estimates the alpha channel of a foreground object which can then be composed onto a different background (bottom row).

1.4k Dec 30, 2022

Github project for Attention-guided Temporal Coherent Video Object Matting.

Attention-guided Temporal Coherent Video Object Matting This is the Github project for our paper Attention-guided Temporal Coherent Video Object Matti

71 Dec 19, 2022

[IJCAI'21] Deep Automatic Natural Image Matting

Deep Automatic Natural Image Matting [IJCAI-21] This is the official repository of the paper Deep Automatic Natural Image Matting. Introduction | Netw

316 Jan 6, 2023

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

6.5k Jan 4, 2023

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting (RVM) English | 中文 Official repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is specific

2 Aug 21, 2022

Video Matting Refinement For Python

Video-matting refinement Library (use pip to install) scikit-image numpy av matplotlib Run Static background python path_to_video.mp4 Moving backgroun

3 Jan 11, 2022

Official implementation of the paper "Lightweight Deep CNN for Natural Image Matting via Similarity Preserving Knowledge Distillation"

Lightweight-Deep-CNN-for-Natural-Image-Matting-via-Similarity-Preserving-Knowledge-Distillation Introduction Accepted at IEEE Signal Processing Letter

19 Jun 7, 2022

Rethinking Portrait Matting with Privacy Preserving

Rethinking Portrait Matting with Privacy Preserving This is the official repository of the paper Rethinking Portrait Matting with Privacy Preserving.

184 Jan 3, 2023

TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

TCNN Pandey A, Wang D L. TCNN: Temporal convolutional neural network for real-time speech enhancement in the time domain[C]//ICASSP 2019-2019 IEEE Int

16 Dec 30, 2022