The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution.

Kexun Zhang

Last update: Jan 3, 2023

Overview

WSRGlow

The official implementation of the Interspeech 2021 paper WSRGlow: A Glow-based Waveform Generative Model for Audio Super-Resolution. Audio samples can be found here.

Feel free to create issues or send an email to [email protected] if you have problems running the code.

Before running the code, install the dependencies with pip install -r requirements.txt.

The configs for the model architecture and training scheme are saved in config.yaml. You can override some of the attributes by adding the --hparams flag when running a command. The general way to run a Python script is

python $SRC$ --config $CONFIG$ --hparams $KEY1$=$VALUE1$,$KEY2$=$VALUE2$,...

See hparams.py for more details.
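As an illustration of the KEY=VALUE,KEY=VALUE format, here is a minimal sketch of how such an override string could be parsed and merged into a config dict. It is not the code in hparams.py, which remains the reference; the function names and the example keys (work_dir, batch_size) are assumptions made for this sketch, and values that themselves contain commas are not handled.

```python
import ast


def parse_hparams_overrides(spec):
    """Parse 'key1=value1,key2=value2' into a dict of overrides."""
    overrides = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty input and trailing commas
        key, sep, raw_value = item.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=VALUE, got {item!r}")
        try:
            # Interpret numbers, booleans, and quoted strings as Python literals.
            value = ast.literal_eval(raw_value)
        except (ValueError, SyntaxError):
            value = raw_value  # anything else is kept as a plain string
        overrides[key.strip()] = value
    return overrides


def apply_overrides(config, spec):
    """Return a copy of config with the overrides from spec applied."""
    merged = dict(config)
    merged.update(parse_hparams_overrides(spec))
    return merged


# Hypothetical config keys, used only to show the merge.
config = {"work_dir": "", "batch_size": 8}
print(apply_overrides(config, "work_dir=checkpoints/wsrglow,batch_size=4"))
```

Returning a merged copy instead of mutating the loaded config keeps the values read from config.yaml available for comparison.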

To prepare data

Before training, you need to binarize the data. Put the raw wav files in hparams['raw_data_path']; the binarized data will be written to hparams['binary_data_path'].

Specifically, for the VCTK corpus, the file structure should look like

.
|--data
    |--raw
        |--VCTK-Corpus
            |--wav48
                |--$WAVS
|--checkpoints
    |--wsrglow

where the model checkpoints are in checkpoints/wsrglow.
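Before binarizing, it can help to confirm that the layout above is in place. The following is a small sketch, not part of the repository: the helper names are hypothetical, and it assumes the wav files may sit in subdirectories of wav48.

```python
from pathlib import Path


def find_raw_wavs(project_root):
    """Return the sorted .wav files under data/raw/VCTK-Corpus/wav48."""
    wav_root = Path(project_root) / "data" / "raw" / "VCTK-Corpus" / "wav48"
    if not wav_root.is_dir():
        raise FileNotFoundError(f"expected raw wav directory at {wav_root}")
    # rglob also picks up wavs stored in subdirectories of wav48.
    return sorted(wav_root.rglob("*.wav"))


def ensure_checkpoint_dir(project_root, exp_name="wsrglow"):
    """Create checkpoints/<exp_name> if it is missing and return its path."""
    ckpt_dir = Path(project_root) / "checkpoints" / exp_name
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    return ckpt_dir


if __name__ == "__main__":
    wavs = find_raw_wavs(".")
    print(f"found {len(wavs)} wav files")
    print(f"checkpoints go to {ensure_checkpoint_dir('.')}")
```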

The command to binarize is

python binarizer.py --config config.yaml

To modify the architecture of the model

The current WSRGlow model in model.py is designed for x4 super-resolution and takes waveform, spectrogram and phase information as input.
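To make the three kinds of input concrete, here is a NumPy sketch that derives a x4 low-resolution waveform from a high-resolution one and computes a magnitude spectrogram and phase from it. This is an illustration under stated assumptions, not the preprocessing in model.py: the function names, n_fft=256, hop=64, the Hann window, and the naive decimation without an anti-aliasing filter are all choices made for this example.

```python
import numpy as np


def make_low_res(wave, factor=4):
    """Naive decimation: keep every factor-th sample (no anti-aliasing filter)."""
    return np.asarray(wave)[::factor]


def stft_mag_phase(wave, n_fft=256, hop=64):
    """Return (magnitude, phase), each of shape (n_frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    # Zero-pad so that the last partial frame is kept instead of dropped.
    n_frames = max(1, 1 + int(np.ceil(max(len(wave) - n_fft, 0) / hop)))
    padded = np.zeros(n_fft + (n_frames - 1) * hop, dtype=np.float64)
    padded[:len(wave)] = wave
    frames = np.stack([padded[i * hop:i * hop + n_fft] for i in range(n_frames)])
    spec = np.fft.rfft(frames * window, axis=1)
    return np.abs(spec), np.angle(spec)


sr_hr = 48000
t = np.arange(sr_hr) / sr_hr
hr = np.sin(2 * np.pi * 440.0 * t)  # one second of a 440 Hz tone at 48 kHz
lr = make_low_res(hr, factor=4)      # the same tone at 12 kHz
mag, phase = stft_mag_phase(lr)
print(lr.shape, mag.shape, phase.shape)
```

A model for a different upsampling factor would change the ratio between the low-resolution and high-resolution lengths, which is why the factor is kept as an explicit parameter here.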

To train

Run python train.py --config config.yaml on a GPU.

To infer

Edit infer.py to specify the checkpoint you want to load and the sample inputs you want to use for inference, making sure the paths to the checkpoint and the wav files are correct. Then run python infer.py --config config.yaml on a GPU.

Comments

Add Cog config and web demo

Hi @zkx06111! 👋

This pull request adds an interactive web demo of the 2x upsampling checkpoint, based on your Colab notebook. You can try it out here: https://replicate.ai/zkx06111/wsrglow

Under the hood I've used an open source tool called Cog to build a Docker image that is run by the Replicate servers. The Docker image can also be downloaded from the website for people who want to use WSRGlow from the command line without installing any Python dependencies.

If you click the "Sign in with GitHub" link you can edit the page and add more examples, and we'll feature your model on the Explore page.

In case you wonder who I am, I used to be a PhD student working on music source separation, and now I'm working on Replicate. I used to struggle to get baseline models working in my research, so we're building Cog and Replicate to make it easier to package trained models in a reproducible way. As part of that we're going around the internet making demos for our favorite models.

opened by andreasjansson 1
Is it possible to implement reading any other files than WAV? (e.g. MKA (Matroska) files)

On Google Colab and replicate.com virtual machines, a 44kHz stereo WAV file has to be split into 53-second chunks; otherwise a CUDA out-of-memory error is thrown.

I find it very convenient to chunk WAVs using MKVToolnix, which losslessly places the WAV inside a Matroska container. I partly worked around the MKA issue by using MKVExtractGUI-2 with version 20 of MKVToolnix (the only compatible one). I also used Lossless-Cut for pure WAVs, but it's more cumbersome and cannot really chunk a file every 53 seconds automatically. At least it has a merge option.

opened by deton24 0
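The chunking described in the issue above can also be done directly on WAV files. The sketch below uses only Python's standard wave module; the function name and output naming scheme are hypothetical, the 53-second default is taken from the issue, and the wave module handles PCM WAV files only.

```python
import wave
from pathlib import Path


def split_wav(src_path, out_dir, chunk_seconds=53):
    """Split a PCM WAV file into consecutive chunks of at most chunk_seconds."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    stem = Path(src_path).stem
    chunk_paths = []
    with wave.open(str(src_path), "rb") as src:
        params = src.getparams()
        frames_per_chunk = int(chunk_seconds * src.getframerate())
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            chunk_path = out_dir / f"{stem}_{index:04d}.wav"
            with wave.open(str(chunk_path), "wb") as dst:
                # Copy channels, sample width, and rate; the frame count in
                # the header is corrected when the file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Because the chunks are consecutive and non-overlapping, the predictions for each chunk can be concatenated in index order afterwards, although this sketch does nothing to smooth the boundaries between chunks.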
FileNotFoundError: [Errno 2] No such file or directory: ''
Hi authors, thank you for open-sourcing this great repository.

I ran python train.py --config config.yaml, and got this error: FileNotFoundError: [Errno 2] No such file or directory: ''

Traceback (most recent call last):
  File "/home/wschoi/PycharmProjects/WSRGlow/train.py", line 345, in <module>
    WaveGlowTask4.start()
  File "/home/wschoi/PycharmProjects/WSRGlow/train.py", line 274, in start
    period=1 if hparams['save_ckpt'] else 100000
  File "/home/wschoi/PycharmProjects/WSRGlow/training_utils.py", line 23, in __init__
    os.makedirs(filepath, exist_ok=True)
  File "/home/wschoi/miniconda3/envs/wsrglow/lib/python3.7/os.py", line 223, in makedirs
    mkdir(name, mode)
FileNotFoundError: [Errno 2] No such file or directory: ''

Process finished with exit code 1

I guess this error occurred because args_work_dir is set to '' whenever args.exp_name is left at its default value.

https://github.com/zkx06111/WSRGlow/blob/1b8fc4939c72b319efdb520ba2868eacf468ca18/hparams.py#L39-L42

Then hparams_['work_dir'] is set to args_work_dir regardless of the work_dir in config.yaml.

https://github.com/zkx06111/WSRGlow/blob/1b8fc4939c72b319efdb520ba2868eacf468ca18/hparams.py#L84-L86

TL;DR

This error occurs only when args.exp_name == ''.

For those who want to quickly reproduce train.py, I would recommend a command like the one below.

python train.py --config config.yaml --exp_name WSRGlow
opened by ws-choi 0
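The failure reported in the issue above comes down to os.makedirs being called with an empty path. Below is a minimal sketch of a guard that avoids it. It is not the logic in hparams.py: the function names are hypothetical, and the checkpoints/<exp_name> convention is an assumption based on the checkpoints/wsrglow directory mentioned in the README.

```python
import os


def resolve_work_dir(exp_name, config_work_dir, root="checkpoints"):
    """Pick a non-empty working directory for checkpoints and logs."""
    if exp_name:
        # Assumed convention: one subdirectory of `root` per experiment.
        return os.path.join(root, exp_name)
    if config_work_dir:
        # Fall back to the value from the config file when no name is given.
        return config_work_dir
    raise ValueError("set --exp_name or a non-empty work_dir in the config")


def reproduce_empty_path_error():
    """Show that os.makedirs('') fails even with exist_ok=True."""
    try:
        os.makedirs("", exist_ok=True)
    except FileNotFoundError as err:
        return str(err)
    return None


print(resolve_work_dir("WSRGlow", ""))
print(reproduce_empty_path_error())
```

Raising early with a clear message, rather than passing '' on to os.makedirs, points the user at the missing --exp_name instead of at a confusing empty path.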
Real world application, upsampling historic recordings?

Hi, I've been testing your model for a side project I'm working on. I'd like to take early historic recordings (1890-1920s), denoise & upsample them. I've already denoised them (amazingly so!), I'm trying to upsample using your model but it doesn't seem to be doing much. I used the code from the Colab notebook and the config that's in the repo.

Is this not a good application of the model or did I do something incorrectly?

Here are the results I produced: example_and_prediction_wav_files.zip

Spectrogram - Top is the example wav (Thomas Edison speaking 1912), bottom is the prediction. I can't hear a discernible difference and I'm well versed in audio engineering.

from infer import *

set_hparams(config='config.yaml')

model = WaveGlowMelHF(**hparams['waveglow_config']).cuda()

load_ckpt(model, 'model_ckpt_best.pt')
model.eval()

fns = ['te_small.wav']

sigma = 1
for lr_fn in fns:
    lr, sr = load_wav(lr_fn)
    print(f'sampling rate (lr) = {sr}')
    print(f'lr.shape = {lr.shape}', flush=True)
    with torch.no_grad():
        pred = run(model, lr, sigma=sigma)
    print(lr.shape, pred.shape)
    pred_fn = f'pred_{lr_fn}'
    print(f'sampling rate = {sr * 2}')
    sf.write(open(pred_fn, 'wb'), pred, sr * 2)

opened by go-dustin 0
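When a prediction sounds the same as its input, as in the issue above, one way to check whether anything was added is to measure how much of the output's spectral energy lies above the input's Nyquist frequency. The sketch below is a generic NumPy check, not part of the repository; the function name is hypothetical and the signals are synthetic stand-ins for real audio. For a 2x prediction written at sr * 2, the cutoff would be sr / 2.

```python
import numpy as np


def band_energy_fraction(signal, sample_rate, cutoff_hz):
    """Fraction of spectral energy at or above cutoff_hz."""
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim > 1:
        signal = signal.mean(axis=1)  # mix (frames, channels) down to mono
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = power.sum()
    if total == 0.0:
        return 0.0
    return float(power[freqs >= cutoff_hz].sum() / total)


# Synthetic example: a 16 kHz input upsampled to 32 kHz has a cutoff of 8 kHz.
sr_out = 32000
t = np.arange(sr_out) / sr_out
low_only = np.sin(2 * np.pi * 1000.0 * t)
with_highs = low_only + 0.5 * np.sin(2 * np.pi * 12000.0 * t)
print(band_energy_fraction(low_only, sr_out, 8000.0))
print(band_energy_fraction(with_highs, sr_out, 8000.0))
```

A fraction near zero for a prediction means the output contains essentially no content in the newly available band, which would be consistent with hearing no difference.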
Having Trouble in training: utils.tensors_to_scalars

Hello, I'm trying to run your code. I ran train.py with the commands in the README (with the additional argument --hparams work_dir=ccc), but hit this error.

File "train.py", line 143, in training_step
    log_outputs = utils.tensors_to_scalars(log_outputs)
AttributeError: module 'utils' has no attribute 'tensors_to_scalars'

I looked over commit logs, but utils.py never had that function.

opened by jc5201 1
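The repository's intended tensors_to_scalars is not shown anywhere in this page, so the following is only a guess at what a helper with that name would do: walk a dict of logged metrics and replace tensor-like values with plain Python numbers. It is duck-typed on the .item() method, which torch.Tensor and NumPy scalars both provide, so the sketch itself does not import torch.

```python
def tensors_to_scalars(metrics):
    """Recursively replace tensor-like values in a dict with Python scalars."""
    scalars = {}
    for key, value in metrics.items():
        if isinstance(value, dict):
            value = tensors_to_scalars(value)
        elif hasattr(value, "item"):
            # .item() is only valid for single-element tensors or arrays.
            value = value.item()
        scalars[key] = value
    return scalars
```

Whether adding such a function to utils.py is enough to make train.py run depends on the rest of the training code, which this sketch does not cover.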
distorted spectrograms after model

Hi! I tried your pretrained checkpoint in colab and got some extra values at the spectrogram in the first case and broken harmonics in the second case. First audio is 44100Hz real speech (converted to 24k and then upscaled to 48k). Second audio is the output of text-to-speech system (22050, upscaled to 44100)

I don't hear any noticeable difference in both audios, is this expected?

This is the spectrogram view in Audacity (Mel scale). The upper one is before, the bottom one is after.

opened by thepowerfuldeez 4


