SelfRemaster: SSL Speech Restoration

Overview

Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling

Demo

Setup

  1. Clone this repository: git clone https://github.com/Takaaki-Saeki/ssl_speech_restoration.git
  2. Move into the repository: cd ssl_speech_restoration
  3. Install the Python packages and download some pretrained models: ./setup.sh

Getting started

  • If you use the default Japanese corpora
    • Download JSUT Basic5000 and the JVS Corpus
    • Downsample them to 22.05 kHz and place them under data/ as jsut_22k and jvs_22k (a minimal downsampling sketch follows this list)
    • Place simulated low-quality data under ./data as jsut_22k-low and jvs_22k-low
  • Alternatively, you can use arbitrary datasets by modifying the config files
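
A minimal downsampling sketch in Python is shown below; it is an illustration only (librosa and soundfile, and the jsut_ver1.1 input path, are assumptions, not something shipped with this repo).

import librosa
import soundfile as sf
from pathlib import Path

# Hypothetical source (extracted corpus) and target (repo data layout) paths.
src = Path("jsut_ver1.1")
dst = Path("data/jsut_22k")

# Resample every wav to 22.05 kHz mono, mirroring the directory tree.
for wav_path in src.rglob("*.wav"):
    audio, _ = librosa.load(wav_path, sr=22050, mono=True)
    out_path = dst / wav_path.relative_to(src)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    sf.write(out_path, audio, 22050)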

Training

You can choose the MelSpec or SourceFilter model with the --config_path option; set ${feature} in the commands below accordingly (e.g., melspec).
As shown in the paper, the MelSpec model gives higher quality.

First, you need to split the data into train/val/test sets and dump them with the following command.

python preprocess.py --config_path configs/train/${feature}/ssl_jsut.yaml

To perform self-supervised learning with dual learning, run the following command.

python train.py \
    --config_path configs/train/${feature}/ssl_jsut.yaml \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, refer to train.py.

Speech restoration

To perform speech restoration on the test data, run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

For other options, see eval.py.

Audio effect transfer

You can run a simple audio effect transfer demo using a model pretrained on real data.
Run the following command.

python aet_demo.py

Alternatively, you can customize the dataset or model.
Edit audio_effect_transfer.yaml and run the following command.

python aet.py \
    --config_path configs/test/melspec/audio_effect_transfer.yaml \
    --stage ssl-dual \
    --run_name aet_melspec_dual

For other options, see aet.py.

Pretrained models

See here.

Reproducing results

You can generate simulated low-quality data as in the paper with the following command.

python simulated_data.py \
    --in_dir ${input_directory (e.g., path to jsut_22k)} \
    --output_dir ${output_directory (e.g., path to jsut_22k-low)} \
    --corpus_type ${single-speaker corpus or multi-speaker corpus} \
    --deg_type lowpass
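
For intuition, a lowpass degradation of this kind can be sketched as follows; this is an illustration under assumed parameters (an 8th-order Butterworth filter with a 4 kHz cutoff, via scipy and soundfile), not the exact implementation in simulated_data.py.

import soundfile as sf
from scipy.signal import butter, sosfilt

def lowpass_degrade(in_wav, out_wav, cutoff_hz=4000.0):
    # Read the waveform, low-pass filter it along the time axis, and write
    # it back at the original sample rate (a stand-in for --deg_type lowpass).
    audio, sr = sf.read(in_wav)
    sos = butter(8, cutoff_hz, btype="low", fs=sr, output="sos")
    sf.write(out_wav, sosfilt(sos, audio, axis=0), sr)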

Then download the pretrained model corresponding to the deg_type and run the following command.

python eval.py \
    --config_path configs/test/${feature}/ssl_jsut.yaml \
    --ckpt_path ${path to checkpoint} \
    --stage ssl-dual \
    --run_name ssl_melspec_dual

Citation

@article{saeki22selfremaster,
  title={{SelfRemaster}: {S}elf-Supervised Speech Restoration with Analysis-by-Synthesis Approach Using Channel Modeling},
  author={T. Saeki and S. Takamichi and T. Nakamura and N. Tanji and H. Saruwatari},
  journal={arXiv preprint arXiv:2203.12937},
  year={2022}
}

Comments
  • running train.py complains about lack of data

    Thank you very much for the interesting paper and the code repo.

    I downloaded the jvs and jsut datasets, unpacked them, and renamed and degraded them accordingly, e.g.:

    #!/usr/bin/env bash
    
    set -ev
    
    dir=jsut_ver1.1
    
    [ -e "$dir" ] || {
      >&2 echo "error: invalid directory '$dir'"
      exit 1
    }
    
    lowdir="jsut_22k"
    degradedir="jsut_22k-low"
    
    replace_once() {
      s=$1; shift
      from=$1; shift
      to=$1; shift
      env python3 -c "print('$s'.replace('$from', '$to', 1))"
    }
    
    # create subdirs
    find "$dir" -type d | while IFS= read -r line; do
      mkdir -pv "$(replace_once "$line" "$dir" "$lowdir")"
    done
    
    # downsample to 22k
    find "$dir" -type f | sort -n | while IFS= read -r line; do
      [ -e "$line" ] || {
        echo "no such file $line"
        exit 1
      }
      output=$(replace_once "$line" "$dir" "$lowdir")
      [ -e "$output" ] &&  {
        continue
      }
      if [ -z "$(echo "$line" | grep -E ".wav$")" ]; then
        #cp -v "$line" "$output"
        continue
      fi
      echo "downsample '$line' -> '$output'"
      ffmpeg -nostdin -hide_banner -loglevel error -i "$line" -ac 1 -ar 22050 -q:a 0 -y "$output"
    done
    
    # create subdirs
    find "$dir" -type d | while IFS= read -r line; do
      mkdir -p "$(replace_once "$line" "$dir" "$degradedir")"
    done
    
    # degrade audio
    find "$lowdir" -type f | sort -n | while IFS= read -r line; do
      [ -e "$line" ] || {
        echo "no such file $line"
        exit 1
      }
      output=$(replace_once "$line" "$lowdir" "$degradedir")
      [ -e "$output" ] &&  {
        continue
      }
      if [ -z "$(echo "$line" | grep -E ".wav$")" ]; then
        #cp -v "$line" "$output"
        continue
      fi
      echo "degrade '$line' -> '$output'"
      tmp="/tmp/jsut_$(basename "$output")"
      ./degrade_audio.py "$line" "$tmp"
      mv "$tmp" "$output"
    done
    

    Then I do a similar thing with the jvs dataset, but restructure it so that the *.wav files are found under the */*.wav pattern (15k files).

    In configs/train/melspec/ssl_jsut.yaml I change:

      source_path: "./data/jsut_22k-low/basic5000/wav"
      aux_path: "./data/jsut_22k/basic5000/wav"
    

    Running this seems to generate a lot of pickles for the 5000+14997 files:

    python3 preprocess.py --config_path configs/train/melspec/ssl_jsut.yaml
    

    Then running

    env python3 train.py \
        --config_path configs/train/melspec/ssl_jsut.yaml \
        --stage ssl-dual \
        --run_name ssl_melspec_dual
    

    Produces "index 0 not found" errors in the dataset, e.g:

      File "./ssl_speech_restoration/dataset.py", line 205, in __getitem__
        d_batch["wavstask"] = torch.from_numpy(self.d_out["wavstask"][idx])
    IndexError: index 0 is out of bounds for axis 0 with size 0
    

    Changing ssl-dual to pretrain produces an "augment key not found" error.

    What would be the correct pipeline? Is there something I could try to make it train?

    Thanks

    opened by theoden8 5
  • RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False

    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/gradio/routes.py", line 275, in predict
    	output = await app.blocks.process_api(body, username, session_state)
      File "/usr/local/lib/python3.8/dist-packages/gradio/blocks.py", line 274, in process_api
    	predictions = await run_in_threadpool(block_fn.fn, *processed_input)
      File "/usr/local/lib/python3.8/dist-packages/starlette/concurrency.py", line 41, in run_in_threadpool
    	return await anyio.to_thread.run_sync(func, *args)
      File "/usr/local/lib/python3.8/dist-packages/anyio/to_thread.py", line 31, in run_sync
    	return await get_asynclib().run_sync_in_worker_thread(
      File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 937, in run_sync_in_worker_thread
    	return await future
      File "/usr/local/lib/python3.8/dist-packages/anyio/_backends/_asyncio.py", line 867, in run
    	result = context.run(func, *args)
      File "/usr/local/lib/python3.8/dist-packages/gradio/interface.py", line 500, in <lambda>
    	lambda *args: self.run_prediction(args)[0]
      File "/usr/local/lib/python3.8/dist-packages/gradio/interface.py", line 682, in run_prediction
    	prediction = predict_fn(*processed_input)
      File "aet_demo.py", line 60, in transfer
    	src_model = SSLDualLightningModule(config).load_from_checkpoint(
      File "/root/ssl_speech_restoration/lightning_module.py", line 623, in __init__
    	super().__init__(config)
      File "/root/ssl_speech_restoration/lightning_module.py", line 307, in __init__
    	self.vocoder = load_vocoder(config)
      File "/root/ssl_speech_restoration/utils.py", line 44, in load_vocoder
    	vocoder.load_state_dict(torch.load(config["general"]["hifigan_path"])["generator"])
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 608, in load
    	return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 787, in _legacy_load
    	result = unpickler.load()
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 743, in persistent_load
    	deserialized_objects[root_key] = restore_location(obj, location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 175, in default_restore_location
    	result = fn(storage, location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 151, in _cuda_deserialize
    	device = validate_cuda_device(location)
      File "/usr/local/lib/python3.8/dist-packages/torch/serialization.py", line 135, in validate_cuda_device
    	raise RuntimeError('Attempting to deserialize object on a CUDA '
    RuntimeError: Attempting to deserialize object on a CUDA device but torch.cuda.is_available() is False. If you are running on a CPU-only machine, please use torch.load with map_location=torch.device('cpu') to map your storages to the CPU.
    

    That happens when I upload a sample file in Spanish.
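
    As the error message itself suggests, a CPU-only workaround is to map the checkpoint onto the CPU when loading it; a sketch against the load_vocoder call from utils.py in the traceback above (config and vocoder are the objects already used there):

    import torch

    # Sketch: map the CUDA-saved HiFi-GAN checkpoint onto the CPU before
    # loading it; config and vocoder as in utils.py's load_vocoder.
    state = torch.load(
        config["general"]["hifigan_path"], map_location=torch.device("cpu")
    )
    vocoder.load_state_dict(state["generator"])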

    opened by johnfelipe 3
  • No versions in requirements.txt

    Hello. Thanks for publishing your code and checkpoints 😃

    I've come across the following warning:

    dataset.py:145: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray.
    

    Although this warning disappears when you add dtype=object, I came across another problem later on and was unable to get the system running.
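
    For reference, the dtype=object change amounts to something like this sketch (placeholder data, not the exact dataset.py code):

    import numpy as np

    # Placeholder for the ragged (variable-length) sequences built in dataset.py.
    ragged = [np.zeros(3), np.zeros(5)]
    # An explicit dtype=object silences the VisibleDeprecationWarning.
    arr = np.array(ragged, dtype=object)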

    My suggestion is to pin version numbers for each dependency in requirements.txt (e.g., captured with pip freeze). That way, we can know which versions of each library form a working solution, and the code will keep working after the libraries change.

    opened by chrisbaume 1
  • quality of restored speech not good

    Hi

    I tried the Hugging Face demo on my wav file, but the quality is not good. Is it because the vocoder is trained on a Japanese corpus? Is there a general speech restoration model?

    opened by sciai-ai 1