Automatically erase objects in videos, such as logos and text.

seeprettyface.com

Last update: Dec 26, 2022

Overview

Video-Auto-Wipe

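From time to time I build fun algorithm models based on generative techniques. This project is an application model for video erasing: it automatically detects the parts of a video we do not want to see (for example ads, watermarks, subtitles, and logos) and erases them. Because the logo removal model could potentially be misused for copyright infringement, I am only sharing the subtitle removal model for now. I hope it is helpful.
I will keep exploring and building new work in generative techniques. There is still a lot that can be done with generative models, and this project shows just one example of a practical application. The models in this project are copyrighted by www.seeprettyface.com; please do not use them directly for commercial purposes without authorization. For details of the algorithm, see my research notes.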

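Results preview

1. Logo removal

The logo removal model automatically detects where logos are in the video and erases them. Logos are detected as small blocks of pixels that stay static over time.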
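
To make the "static over time" cue concrete, here is a minimal, dependency-free sketch. It is not the released model, which learns this perception from data; it is only a hand-written threshold heuristic. The function name `static_pixel_mask`, the `tolerance` parameter, and the tiny sample clip are all assumptions made for illustration.

```python
from typing import List

Frame = List[List[int]]  # one grayscale frame as rows of pixel intensities (0-255)


def static_pixel_mask(frames: List[Frame], tolerance: int = 2) -> List[List[bool]]:
    """Mark pixels whose intensity stays within `tolerance` across all frames."""
    if not frames:
        raise ValueError("at least one frame is required")
    height, width = len(frames[0]), len(frames[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect this pixel's value in every frame and measure its range.
            values = [frame[y][x] for frame in frames]
            row.append(max(values) - min(values) <= tolerance)
        mask.append(row)
    return mask


# Hypothetical 2x3 clip: the top-left pixel is a nearly constant "logo" pixel,
# every other pixel changes from frame to frame.
clip = [
    [[255, 10, 20], [30, 40, 50]],
    [[255, 60, 70], [80, 90, 100]],
    [[254, 15, 25], [35, 45, 55]],
]
mask = static_pixel_mask(clip)
```

A heuristic like this would also flag a static background, so a practical version would additionally keep only small connected regions, in line with the "small blocks" condition described above.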

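2. Dynamic logo removal

The dynamic logo removal model automatically detects where dynamic logos are in the video and erases them. Dynamic logos are detected as fixed blocks of pixels that flicker in and out or move over time. This is fairly difficult to build, so the model has not been released yet.

2.1 Test 1 - Removing flashing special-effect text

Watch video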

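3. Subtitle removal

The subtitle removal model automatically detects where subtitles are in the video and erases them. Subtitles are detected as text regions that share a uniform style.

3.1 Test 1 - Movie subtitle removal

Watch video

3.2 Test 2 - TV series subtitle removal

Watch video

3.3 Test 3 - Variety show subtitle removal

Watch video

3.4 Test 4 - Variety show special subtitle removal

Watch video

3.5 Test 5 - Online video subtitle removal

Watch video

3.6 Test 6 - Less common language subtitle removal

Watch video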

Usage

1. Environment setup

torch>1.0
Install any other missing dependency with pip install xxx; not much else is needed.
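
A quick way to check the torch requirement before running anything is sketched below. The helper name `torch_version_ok` is hypothetical, and comparing only the major and minor numbers against 1.0 is one possible reading of "torch>1.0", not something the project specifies.

```python
import re


def torch_version_ok(version: str) -> bool:
    """Return True if a version string such as '1.13.1+cu117' is newer than 1.0."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    # Assumption: "torch>1.0" is read as (major, minor) strictly above (1, 0).
    return (major, minor) > (1, 0)


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; try: pip install torch")
    else:
        status = "OK" if torch_version_ok(torch.__version__) else "too old"
        print(f"torch {torch.__version__}: {status}")
```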

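2. Running

1. Download the pretrained models and put them in the pretrained-weight folder;
Pretrained model download: https://pan.baidu.com/s/1ubZHkgkcskS7Bpg8ZbtoRQ (extraction code: ricn)

2. Put the video file and the mask file in the input folder, then edit demo.py (or use command-line arguments) to point to the corresponding files;
Sample input download: https://pan.baidu.com/s/1rfdAwxomCVjTJ1zwl7hu3g (extraction code: qk64)

3. To run the logo removal task: python demo.py delogo
To run the subtitle removal task: python demo.py detext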
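
If you want to launch these tasks from another Python script, a small wrapper along the following lines may help. It is only a sketch: `build_command` and `run_task` are hypothetical helpers, not part of the repository, and they assume the script is started from the repository root where demo.py lives. The two task names are the ones used in the commands above.

```python
import subprocess
import sys
from typing import List

# Task names taken from the two commands above.
TASKS = ("delogo", "detext")


def build_command(task: str, script: str = "demo.py") -> List[str]:
    """Build the argument list for one task, rejecting unknown task names."""
    if task not in TASKS:
        raise ValueError(f"unknown task {task!r}; expected one of {TASKS}")
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, script, task]


def run_task(task: str) -> int:
    """Run demo.py for one task and return its exit code."""
    return subprocess.run(build_command(task)).returncode
```

Calling `run_task("detext")` would then be equivalent to typing `python demo.py detext` in the same environment.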

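Training

Training data

1. The YoutubeVOS2018 dataset;

2. A dataset of 2,709 movie clips made from more than 300 HD movies that I collected;
Download: https://pan.baidu.com/s/1CIgJmFmx5iR2JfgAyjVaeg (extraction code: xb7o)

3. A dataset of 864 variety show clips made from more than 40 variety shows that I collected;
Download: https://pan.baidu.com/s/1lJk6IIWlwxknAie0LlGYOg (extraction code: 9rd4)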

Training process

Step 1. Task-specific temporal perception training;
Step 2. Fine-tuning with the erasing model fused in.

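Training configuration

I recently found a very simple way to build and train the models:
the 'logo removal' model trains for 3 days on a single 3090;
the 'subtitle removal' model trains for 2 days on a single 3090.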

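Learn more

My research focuses on applied generative models. Generative techniques solve the problem of pixel prediction: filling in or predicting pixels on an image canvas that is partly or entirely missing, so that the completed image follows the regularities of real images. Many applications can be built on this pattern; beyond my earlier work on digital human generation and video content generation, many more parallel directions can be developed.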
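
As a toy illustration of "filling in missing pixels", the sketch below fills a masked region by repeatedly averaging neighbouring pixels. This is a classical diffusion-style baseline chosen only because it fits in a few lines; it is not the generative model used in this project, and the function name `fill_missing`, its parameters, and the sample patch are assumptions made for the example.

```python
from typing import List


def fill_missing(image: List[List[float]], missing: List[List[bool]],
                 iterations: int = 50) -> List[List[float]]:
    """Fill pixels flagged in `missing` by repeatedly averaging their neighbours."""
    height, width = len(image), len(image[0])
    # Missing pixels start at 0.0; known pixels keep their original value.
    result = [[0.0 if missing[y][x] else float(image[y][x]) for x in range(width)]
              for y in range(height)]
    for _ in range(iterations):
        updated = [row[:] for row in result]
        for y in range(height):
            for x in range(width):
                if not missing[y][x]:
                    continue  # known pixels are never modified
                neighbours = [result[ny][nx]
                              for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                              if 0 <= ny < height and 0 <= nx < width]
                if neighbours:
                    updated[y][x] = sum(neighbours) / len(neighbours)
        result = updated
    return result


# Hypothetical 3x3 patch of uniform brightness with the centre pixel missing.
patch = [[100, 100, 100], [100, 0, 100], [100, 100, 100]]
hole = [[False, False, False], [False, True, False], [False, False, False]]
filled = fill_missing(patch, hole)
```

Averaging like this can only produce smooth fills; a generative model is needed when the missing region should contain plausible texture or structure.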
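Although most deployed CV projects today focus on perception and recognition tasks, and comparatively little R&D goes into reconstruction and generation tasks, this should not affect our judgment of the value of generative techniques. After all, generation is a relatively new research direction with fewer participants but broad application prospects. I will continue working on deployable algorithms for generation; visit my website for the latest research progress: www.seeprettyface.com.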
