HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Rishikesh (ऋषिकेश)

Last update: Dec 29, 2022

Related tags

Deep Learning HiFiplusplus-pytorch

Overview

HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

This is the unofficial implementation of Vocoder part of HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement.

Currently, this repo is WIP but you can start your training without any error.

Training:

python train.py --config config_v2.json

Citations:

@misc{https://doi.org/10.48550/arxiv.2203.13086,
  doi = {10.48550/ARXIV.2203.13086},
  
  url = {https://arxiv.org/abs/2203.13086},
  
  author = {Andreev, Pavel and Alanov, Aibek and Ivanov, Oleg and Vetrov, Dmitry},
  
  keywords = {Sound (cs.SD), Machine Learning (cs.LG), Audio and Speech Processing (eess.AS), FOS: Computer and information sciences, FOS: Computer and information sciences, FOS: Electrical engineering, electronic engineering, information engineering, FOS: Electrical engineering, electronic engineering, information engineering},
  
  title = {HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement},
  
  publisher = {arXiv},
  
  year = {2022},
  
  copyright = {arXiv.org perpetual, non-exclusive license}
}

References:

https://github.com/jik876/hifi-gan

You might also like...

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech Keon Lee, Ky

114 Dec 12, 2022

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

LDNet Author: Wen-Chin Huang (Nagoya University) Email: [email protected] This is the official implementation of the paper "LDNet

40 Nov 20, 2022

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

ERISHA: Multilingual Multispeaker Expressive Text-to-Speech Library ERISHA is a multilingual multispeaker expressive speech synthesis framework. It ca

43 Nov 27, 2022

CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhancement

CBREN This is the Pytorch implementation for our IEEE TCSVT paper : CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhanceme

3 Nov 4, 2022

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

Unified Multi-modal Transformers This repository maintains the official implementation of the paper UMT: Unified Multi-modal Transformers for Joint Vi

Applied Research Center (ARC), Tencent PCG

84 Jan 4, 2023

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

159 Dec 30, 2022

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021) This repository is the official P

159 Dec 30, 2022

⚡ Fast • 🪶 Lightweight • 0️⃣ Dependency • 🔌 Pluggable • 😈 TLS interception • 🔒 DNS-over-HTTPS • 🔥 Poor Man's VPN • ⏪ Reverse & ⏩ Forward • 👮🏿 "Proxy Server" framework • 🌐 "Web Server" framework • ➵ ➶ ➷ ➠ "PubSub" framework • 👷 "Work" acceptor & executor framework

Table of Contents Features Install Using PIP Stable version Development version Using Docker Stable version Development version Using HomeBrew Stable

2.2k Jan 8, 2023

A unified framework for machine learning with time series

Welcome to sktime A unified framework for machine learning with time series We provide specialized time series algorithms and scikit-learn compatible

6k Jan 8, 2023

Comments

may be a mistake in MRF？

it seems the paper use a serial ResBlock(kernel size=3,7,11) but the coder use parallel ResBlock

https://github.com/rishikksh20/HiFiplusplus-pytorch/blob/8be0d0e0092d4f609c37bfbeede5a9ad9bd7470a/models.py#L120

I think this is the key of high quality, but the RTF will be much bigger than melgan and hifigan v2. MAC is not a performance indicator.

opened by azraelkuan 4
Any idea about the discriminator architecture?

Hi, amazing work! I'm also trying to implement HIFI++ but I can't find any info regarding the discriminator architecture. In the paper they claim to have designed a much smaller MSD discriminators. Do you have any idea?

Further, we demonstrate that we can substitude MSD discriminators that operate on different input resolutions for identical much smaller discriminators that process the waveform on the initial resolution.

opened by gnsrla12 3
Bump numpy from 1.17.4 to 1.21.0
Bumps numpy from 1.17.4 to 1.21.0.

Release notes

Sourced from numpy's releases.

v1.21.0

NumPy 1.21.0 Release Notes

The NumPy 1.21.0 release highlights are

continued SIMD work covering more functions and platforms,

initial work on the new dtype infrastructure and casting,

universal2 wheels for Python 3.8 and Python 3.9 on Mac,

improved documentation,

improved annotations,

new PCG64DXSM bitgenerator for random numbers.

In addition there are the usual large number of bug fixes and other improvements.

The Python versions supported for this release are 3.7-3.9. Official support for Python 3.10 will be added when it is released.

:warning: Warning: there are unresolved problems compiling NumPy 1.21.0 with gcc-11.1 .

Optimization level -O3 results in many wrong warnings when running the tests.

On some hardware NumPy will hang in an infinite loop.

New functions

Add PCG64DXSM BitGenerator

Uses of the PCG64 BitGenerator in a massively-parallel context have been shown to have statistical weaknesses that were not apparent at the first release in numpy 1.17. Most users will never observe this weakness and are safe to continue to use PCG64. We have introduced a new PCG64DXSM BitGenerator that will eventually become the new default BitGenerator implementation used by default_rng in future releases. PCG64DXSM solves the statistical weakness while preserving the performance and the features of PCG64.

See upgrading-pcg64 for more details.

(gh-18906)

Expired deprecations

The shape argument numpy.unravel_index cannot be passed as dims keyword argument anymore. (Was deprecated in NumPy 1.16.)

... (truncated)

Commits

b235f9e Merge pull request #19283 from charris/prepare-1.21.0-release

34aebc2 MAINT: Update 1.21.0-notes.rst

493b64b MAINT: Update 1.21.0-changelog.rst

07d7e72 MAINT: Remove accidentally created directory.

032fca5 Merge pull request #19280 from charris/backport-19277

7d25b81 BUG: Fix refcount leak in ResultType

fa5754e BUG: Add missing DECREF in new path

61127bb Merge pull request #19268 from charris/backport-19264

143d45f Merge pull request #19269 from charris/backport-19228

d80e473 BUG: Removed typing for == and != in dtypes

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

dependencies
opened by dependabot[bot] 0
Single frequency lines in Spectrogram of generated wavs

Hi, I traned a model for the Bandwidth expansion task using the code and VCTK dataset. But I got bad generated wavs, which contain Single frequency lines in Spectrogram , as shown in the figure

opened by williamsz 0

Owner

Rishikesh (ऋषिकेश)

GitHub

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

SF-Net for fullband SE This is the repo of the manuscript "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Ban

36 Dec 2, 2022

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Related tags

Overview

HiFi++ : a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Training:

Citations:

References:

You might also like...

STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech

Official implementation of the paper: "LDNet: Unified Listener Dependent Modeling in MOS Prediction for Synthetic Speech"

ERISHA is a mulitilingual multispeaker expressive speech synthesis framework. It can transfer the expressivity to the speaker's voice for which no expressive speech corpus is available.

CBREN: Convolutional Neural Networks for Constant Bit Rate Video Quality Enhancement

UMT is a unified and flexible framework which can handle different input modality combinations, and output video moment retrieval and/or highlight detection results.

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

Official PyTorch code for Hierarchical Conditional Flow: A Unified Framework for Image Super-Resolution and Image Rescaling (HCFlow, ICCV2021)

A unified framework for machine learning with time series

Comments

may be a mistake in MRF？

Any idea about the discriminator architecture?

Bump numpy from 1.17.4 to 1.21.0

v1.21.0

NumPy 1.21.0 Release Notes

New functions

Add PCG64DXSM BitGenerator

Expired deprecations

Single frequency lines in Spectrogram of generated wavs

Owner

Rishikesh (ऋषिकेश)

HiFi-GAN: High Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis

Combine Tacotron2 and Hifi GAN to generate speech from text

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

Python codes for Lite Audio-Visual Speech Enhancement.

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"