Translate .sbv subtitle files

Overview

deepl4subtitle

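Translates subtitle files (.sbv) with DeepL. The output keeps the timestamps, but during translation they are separated from the sentence text, so the result should be more accurate than feeding the .sbv file to the translator as-is.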

Usage

As preprocessing, add a period (.) at the end of each sentence in the input .sbv file. This lets DeepL recognize sentence boundaries correctly.

# install deepl 
# https://pypi.org/project/deepl/
pip3 install deepl
python3 deepl4subtitle.py -i sample.sbv -o output.sbv -k YOUR_DEEPL_API_KEY

Samples

sample video: https://www.youtube.com/watch?v=CL7HuMLIPO0

  • sample.sbv: subtitles auto-generated by YouTube, lightly corrected by hand
  • sample_deepl4subtitle.sbv: translated with deepl4subtitle
  • sample_raw_deepl.sbv: the contents of sample.sbv pasted directly into DeepL and translated

In sample_raw_deepl, the timestamps were treated as part of the text, which caused questionable translations in many places. In sample_deepl4subtitle these problems are mostly resolved.

How it works

original

(The sentence-final periods have to be added by hand.)

0:00:01.340,0:00:04.780
クラウドコンピューティングという言葉を
知っているだろうか.

0:00:04.780,0:00:08.110
クラウドコンピューティングとは
インターネットの先にあるデータセンター

0:00:08.110,0:00:12.420
のサーバーに処理してもらうシステム形態
を指す言葉である.

↓ move timestamps into XML tags, remove newlines

<timestamp ts="0:00:01.340,0:00:04.780"/>クラウドコンピューティングという言葉を知っているだろうか.<timestamp ts="0:00:04.780,0:00:08.110"/>クラウドコンピューティングとはインターネットの先にあるデータセンター<timestamp ts="0:00:08.110,0:00:12.420"/>のサーバーに処理してもらうシステム形態を指す言葉である.
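
The step above can be sketched in Python roughly as follows. This is a minimal illustration and not the script's actual code: the function name `sbv_to_tagged` and the timestamp regex are assumptions made for this example. Caption lines are joined without a separator, which suits Japanese input like the sample; a space-delimited source language would need a space inserted between joined lines.

```python
import re

# An .sbv timestamp line looks like "0:00:01.340,0:00:04.780".
# This pattern is an assumption based on the sample above.
TIMESTAMP_LINE = re.compile(r"^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$")


def sbv_to_tagged(sbv_text):
    """Turn .sbv text into one line with <timestamp ts="..."/> tags."""
    parts = []
    for line in sbv_text.splitlines():
        line = line.strip()
        if not line:
            continue  # blank lines only separate cues
        if TIMESTAMP_LINE.match(line):
            # wrap the timestamp so the translator can treat it as markup
            parts.append('<timestamp ts="{}"/>'.format(line))
        else:
            parts.append(line)  # caption text; its line breaks are dropped
    return "".join(parts)
```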

↓ translate with DeepL through the API, ignoring XML tags

<timestamp ts="0:00:01.340,0:00:04.780"/>Do you know the term "cloud computing"? <timestamp ts="0:00:04.780,0:00:08.110"/> Cloud computing is a term that refers to a form of system that is processed by servers in a data center <timestamp ts="0:00:08.110,0:00:12.420"/>located beyond the Internet. 
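
A hedged sketch of the translation step, assuming the `deepl` package installed in the Usage section. The helper name `translate_tagged` and the default target language are assumptions for illustration. The `tag_handling="xml"` option of `translate_text` asks DeepL to treat the `<timestamp .../>` elements as markup rather than as text to translate. The translator is passed in as an argument so the helper can also be exercised with a stand-in object.

```python
def translate_tagged(tagged_text, translator, target_lang="EN-US"):
    """Translate tagged subtitle text, asking DeepL to parse it as XML.

    `translator` is expected to behave like deepl.Translator, i.e. to
    provide translate_text(text, target_lang=..., tag_handling=...).
    """
    result = translator.translate_text(
        tagged_text,
        target_lang=target_lang,
        tag_handling="xml",  # keep <timestamp .../> tags out of the translation
    )
    return result.text  # translated string with the tags still in place
```

With the real library this could be called as `translate_tagged(text, deepl.Translator("YOUR_DEEPL_API_KEY"))` after `import deepl`.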

↓ put back timestamps and newlines

0:00:01.340,0:00:04.780
Do you know the term "cloud computing"? 

0:00:04.780,0:00:08.110
 Cloud computing is a term that refers to a form of system that is processed by servers in a data center 

0:00:08.110,0:00:12.420
located beyond the Internet. 
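
The last step can be sketched as splitting the translated string on the timestamp tags and writing each piece under its timestamp. Again this is only an illustration: `tagged_to_sbv` and the tag regex are assumptions, and it presumes the tags come back from the translator in the same `<timestamp ts="..."/>` form. Unlike the output shown above, this sketch trims the whitespace around each caption.

```python
import re

# Matches the self-closing tag produced in the earlier step.
TIMESTAMP_TAG = re.compile(r'<timestamp ts="([^"]*)"\s*/>')


def tagged_to_sbv(tagged_text):
    """Rebuild .sbv text from a string containing <timestamp ts="..."/> tags."""
    # With one capture group, re.split returns
    # [text_before_first_tag, ts1, text1, ts2, text2, ...].
    pieces = TIMESTAMP_TAG.split(tagged_text)
    cues = []
    for i in range(1, len(pieces) - 1, 2):
        timestamp = pieces[i]
        caption = pieces[i + 1].strip()
        cues.append(timestamp + "\n" + caption + "\n")
    # A blank line separates consecutive cues.
    return "\n".join(cues)
```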

Owner
Yasunori Toshimitsu