VoiceFixer aims to restore human speech regardless of how severely it is degraded.

Overview


VoiceFixer

VoiceFixer aims to restore human speech regardless of how severely it is degraded. It handles noise, reverberation, low resolution (2 kHz to 44.1 kHz), and clipping (threshold 0.1 to 1.0) within a single model.


Demo

Please visit the demo page to see what VoiceFixer can do.

Usage

from voicefixer import VoiceFixer
voicefixer = VoiceFixer()
voicefixer.restore(input="",    # input wav file path
                   output="",   # output wav file path
                   cuda=False,  # whether to use GPU acceleration
                   mode=0)      # try modes 0 and 1 and keep the better result
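
Since the best mode depends on the recording, it can help to write one output file per mode and compare them by ear. The following is a minimal sketch, assuming the `restore` signature shown above and the modes 0 and 1 named in its comment. The helper names `mode_output_path` and `restore_all_modes` are hypothetical, and the `-modeX` suffix simply mirrors the naming that the command-line tool is described as using.

```python
import os


def mode_output_path(output_path, mode):
    """Return output_path with '-mode<N>' inserted before the extension."""
    root, ext = os.path.splitext(output_path)
    return f"{root}-mode{mode}{ext}"


def restore_all_modes(input_path, output_path, modes=(0, 1), cuda=False):
    """Run VoiceFixer.restore once per mode, writing one file per mode."""
    # Imported lazily so the path helper works without voicefixer installed.
    from voicefixer import VoiceFixer

    voicefixer = VoiceFixer()  # the model is loaded once and reused
    outputs = []
    for mode in modes:
        out = mode_output_path(output_path, mode)
        voicefixer.restore(input=input_path, output=out, cuda=cuda, mode=mode)
        outputs.append(out)
    return outputs
```

Calling `restore_all_modes("in.wav", "out.wav")` would then produce `out-mode0.wav` and `out-mode1.wav`, provided the package and its checkpoints are installed.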

from voicefixer import Vocoder
# Universal Speaker Independent Vocoder
vocoder = Vocoder(sample_rate=44100) # only a 44100 Hz sample rate is supported
vocoder.oracle(fpath="", # input wav file path
               out_path="") # output wav file path
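
Because the vocoder only supports 44100 Hz, it may be worth checking a file's sample rate before passing it in. This is a small sketch using only the standard-library `wave` module, which reads PCM WAV headers; the function names are hypothetical and not part of the voicefixer package.

```python
import wave

REQUIRED_SAMPLE_RATE = 44100  # taken from the comment in the usage snippet


def wav_sample_rate(path):
    """Read the sample rate from a PCM WAV file header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_vocoder_compatible(path):
    """Return True when the file already has the required sample rate."""
    return wav_sample_rate(path) == REQUIRED_SAMPLE_RATE
```

Files that fail the check would need to be resampled with a tool of your choice before calling `vocoder.oracle`.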


Related Material

Comments
  • Issue with defining Module


    I'm trying to build a Google Colab notebook from this code, but it returns this error: NameError: name 'VoiceFixer' is not defined. I even defined VoiceFixer myself using the definition on line 9 of base.py, then replaced it with the one on line 93 of model.py, and still got the same error. Do you know of any fix? If so, please reply.

    opened by YTR76 9
  • Inconsistency in the generator architecture


    Thanks for releasing the code publicly. I am a little confused by the implementation of the generator mentioned here. According to Fig. 3(a) in the paper, a mask is predicted from the noisy input audio and then multiplied with the input to obtain the clean audio. In the implementation, however, the masked result seems to be passed through a further UNet, and the loss is calculated for both outputs. Can you please clarify the inconsistency? Thanks in advance.
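
The structure described in this issue can be illustrated with a toy example: a mask is multiplied with the input, the masked result goes through a second stage, and both outputs contribute to the loss. This is only a sketch of that description in plain Python, not the repository's code; the `refine` callable stands in for the second UNet, and the use of an L1 loss is an assumption.

```python
def apply_mask(noisy, mask):
    """Element-wise product of the input features and a predicted mask."""
    return [x * m for x, m in zip(noisy, mask)]


def l1_loss(pred, target):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def two_output_loss(noisy, mask, refine, target):
    """Sum the losses of the masked output and of its refined version."""
    masked = apply_mask(noisy, mask)
    refined = refine(masked)  # hypothetical stand-in for the second UNet
    return l1_loss(masked, target) + l1_loss(refined, target)


noisy = [1.0, 2.0, 4.0]
mask = [0.5, 0.5, 0.5]
target = [0.5, 1.0, 2.0]
loss = two_output_loss(noisy, mask, lambda xs: xs, target)
```

With an identity `refine` and a mask that already recovers the target, both terms are zero; any error introduced by either stage shows up in its own term.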

    opened by krantiparida 5
  • Add command line script


    This update adds a script for processing files directly from the command line. You can test locally by switching to the command-line branch, navigating to the repo folder, and running pip3 install -e . You should be able to run the command voicefixer from any directory.

    opened by chrisbaume 4
  • Possibility of running on Windows?


    Hello, I stumbled on this repo and found it really interesting. The demos in particular impressed me. I have some old/bad quality speech recordings I'd like to try and enhance, but I'm having trouble running any of the code.

    I am running Windows 10 home, Python 3.9.12 at the moment. No GPU present right now, so that may be a problem? I understand that the code is not well tested on Windows yet. Nevertheless, I am completely ignorant when it comes to getting these sorts of things to run; without clear steps to follow, I am lost.

    If there are legitimate issues running on Windows, I'd like to do my part in making them known, but I'm taking a shot in the dark here. I still hope I can be helpful though!

    I assume that the intended workflow for testing is to read an audio file eg. wav, aiff, raw PCM data etc. and process it, creating a new output file? But please correct me if I'm wrong.

    I followed the instructions in readme.md to try the Streamlit app. Specifically, I ran these commands:

    pip install voicefixer==0.0.17
    git clone https://github.com/haoheliu/voicefixer.git
    cd voicefixer
    pip install streamlit
    streamlit run test/streamlit.py

    At this point a Windows firewall dialog comes up and I click allow. Throughout this process, no errors seem to show up. But the models do not appear to download (no terminal updates, and I let it sit for about a day with no changes). The Streamlit page remains blank. The last thing I see in the terminal is:

    You can now view your Streamlit app in your browser.
    Local URL: http://localhost:8501
    Network URL: http://10.0.0.37:8501

    That local URL is the one shown in the address bar.

    So yeah I'm quite lost. What do you advise? Thanks in advance!

    opened by musicalman 4
  • How to test the model for a single task?


    I ran the test/reference.py to test my distorted speech, and the result was GSR. How to test the model for a single task, such as audio super-resolution only? In addition, what is the delay of voicefixer?
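
The question about delay can at least be approached empirically by timing a run and dividing by the audio duration, which gives a real-time factor. The sketch below assumes a PCM WAV input and takes the processing step as an arbitrary callable, so it makes no claims about VoiceFixer's actual speed; the function names are hypothetical. Note that it measures offline throughput, not the algorithmic latency of a streaming system.

```python
import time
import wave


def wav_duration_seconds(path):
    """Duration of a PCM WAV file, computed from its header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def real_time_factor(process, input_path):
    """Time process(input_path) and divide by the audio duration.

    A value below 1.0 means processing ran faster than real time.
    """
    start = time.perf_counter()
    process(input_path)
    elapsed = time.perf_counter() - start
    return elapsed / wav_duration_seconds(input_path)
```

Here `process` could be, for example, a small function that calls `voicefixer.restore` with a fixed output path.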

    opened by litong123 4
  • Add streamlit inference demo page


    Hi!

    I'm very impressed with your research result, and also I want to test my samples as easily as possible.

    So, I made a simple web-based demo using streamlit.

    opened by AppleHolic 3
  • some questions


    Hi, thanks for your great work.
    After reading your paper, I have a question here.

    1. Why use the two-stage algorithm? is it to facilitate more types of speech restoration?
    2. Since there is no information about the speed of the model in the paper, what is the training and inference speed of the model?
    opened by LqNoob 2
  • Can the pretrained model support waveforms where the target sound is far-field?


    I tried to use the test script for restoring my audio, but I obtained worse performance. I suspect the model only supports target sound from close field.

    opened by NewEricWang 2
  • where to find the model(*.pth) to test the effect with my own input wav?


    Hi, I just want to test the powerful effect of VoiceFixer on my own distorted wav. I followed your instructions under Python Examples, but running python3 test/test.py failed. The error information is as follows:

    Initializing VoiceFixer...
    Traceback (most recent call last):
      File "test/test.py", line 39, in <module>
        voicefixer = VoiceFixer()
      File "/root/anaconda3.8/lib/python3.8/site-packages/voicefixer/base.py", line 12, in __init__
        self._model = voicefixer_fe(channels=2, sample_rate=44100)
      File "/root/anaconda3.8/lib/python3.8/site-packages/voicefixer/restorer/model.py", line 140, in __init__
        self.vocoder = Vocoder(sample_rate=44100)
      File "/root/anaconda3.8/lib/python3.8/site-packages/voicefixer/vocoder/base.py", line 14, in __init__
        self._load_pretrain(Config.ckpt)
      File "/root/anaconda3.8/lib/python3.8/site-packages/voicefixer/vocoder/base.py", line 19, in _load_pretrain
        checkpoint = load_checkpoint(pth, torch.device("cpu"))
      File "/root/anaconda3.8/lib/python3.8/site-packages/voicefixer/vocoder/model/util.py", line 92, in load_checkpoint
        checkpoint = torch.load(checkpoint_path, map_location=device)
      File "/root/anaconda3.8/lib/python3.8/site-packages/torch/serialization.py", line 600, in load
        with _open_zipfile_reader(opened_file) as opened_zipfile:
      File "/root/anaconda3.8/lib/python3.8/site-packages/torch/serialization.py", line 242, in __init__
        super(_open_zipfile_reader, self).__init__(torch._C.PyTorchFileReader(name_or_buffer))
    RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory

    It seems that the pretrained model file cannot be found. I searched manually for the *.pth files but could not find them, so I am seeking your help. Thank you!
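
Recent PyTorch versions save checkpoints as zip archives, and a "failed finding central directory" error typically means the file on disk is not a complete zip, for instance because a download was interrupted. The sketch below checks a checkpoint file with the standard-library `zipfile` module. The function name is hypothetical, and the path to check is left to the caller because the issue does not state where the file is stored.

```python
import zipfile


def checkpoint_zip_status(path):
    """Classify a checkpoint file as 'ok', 'not_a_zip', or 'corrupt_member'."""
    if not zipfile.is_zipfile(path):
        # No end-of-central-directory record was found, which is what a
        # truncated download usually looks like.
        return "not_a_zip"
    with zipfile.ZipFile(path) as archive:
        bad_member = archive.testzip()  # None when every member's CRC matches
    return "ok" if bad_member is None else "corrupt_member"
```

If the status is not `"ok"`, deleting the file so that it is downloaded again is a reasonable first thing to try, though the issue itself does not confirm that this resolves the problem.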

    opened by yihe1003 2
  • Unable to test, error in state_dict


    Hello,

    I am trying to test the code on a wav file. But I receive the following message:

    RuntimeError: Error(s) in loading state_dict for VoiceFixer: Missing key(s) in state_dict: "f_helper.istft.ola_window". Unexpected key(s) in state_dict: "f_helper.istft.reverse.weight", "f_helper.istft.overlap_add.weight".

    Which seemed to be caused by the following line in the code: self._model = self._model.load_from_checkpoint(os.path.join(os.path.expanduser('~'), ".cache/voicefixer/analysis_module/checkpoints/epoch=15_trimed_bn.ckpt"))

    Do you have an idea on how to resolve this issue?
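
An error of this shape means the parameter names the model expects differ from those stored in the checkpoint, which often happens when the installed code and the checkpoint come from different versions. The following sketch reproduces the "missing" and "unexpected" lists from two sets of key names; the function name is hypothetical, and it operates on plain strings so it can be run without PyTorch.

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as in the error message.

    missing:    keys the model defines but the checkpoint lacks
    unexpected: keys the checkpoint holds but the model does not define
    """
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)
    unexpected = sorted(checkpoint_keys - model_keys)
    return missing, unexpected


missing, unexpected = diff_state_dict_keys(
    model_keys=["f_helper.istft.ola_window"],
    checkpoint_keys=[
        "f_helper.istft.reverse.weight",
        "f_helper.istft.overlap_add.weight",
    ],
)
```

With a real model, the two inputs would be `model.state_dict().keys()` and the keys of the loaded checkpoint's state dict.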

    opened by yalharbi 2
  • Some problems and questions.


    Hello! I installed your neural network and ran it in Desktop App mode, but I don't see the "Turn on GPU" switch here. This is the first question. Second question: How do I use the models from the demo page? GSR_UNet, VF_Unet, Oracle?

    Thanks in advance for the answer!

    opened by Aspector1 1
  • Lack of user information


    What do "modes" do? for example

    Change mode (The default mode is 0):
    
    voicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode 1
    
    Run all modes:
    
    # output file saved to `/path/to/output-modeX.wav`.
    voicefixer --infile /path/to/input.wav --outfile /path/to/output.wav --mode all
    

    Also, the app says it uses cuda but even having the required hardware and drivers set up in my system, I see the app uses only my CPU. I did not use the "--disable-cuda" arg.
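
Whether a GPU can be used depends on the installed PyTorch build as well as on hardware and drivers; a CPU-only build reports no CUDA device even on a machine with a working GPU. A quick check, sketched here with a hypothetical helper name, is to ask PyTorch directly:

```python
def cuda_is_usable():
    """Return True only if PyTorch is importable and reports a CUDA device."""
    try:
        import torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


if __name__ == "__main__":
    print("CUDA usable:", cuda_is_usable())
```

If this prints `False`, the application has no GPU to use regardless of its own settings.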

    I used this app on a 30-minute Spanish radio show from 2001 with horrible quality (apparently a home recording of the show). The result arrived 20 minutes later (so apparently CUDA was not used) and also had horrible quality: the words could no longer be understood and sounded like a person who has difficulty speaking (it was kind of funny, though).

    opened by Shituation 0
  • "voicefixer" is not recognized as an internal or external command

    Hello! I plan to use voicefixer from the command line. Following the instructions, I ran these commands:

    1. pip install voicefixer==0.1.1
    2. git clone https://github.com/haoheliu/voicefixer.git
    3. cd voicefixer

    All these commands run without any error. But as soon as I try to run the voicefixer command, for example:

    voicefixer --infile /path/to/input.wav --outfile /path/to/output.wav

    this message is displayed: "voicefixer" is not recognized as an internal or external command... The same message appears even when I type just the single word "voicefixer". As I understand it, the executable file cannot be found for some reason. How can I fix this? I use Windows 10, and I also installed the recommended WGET.
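
A "not recognized" message means the shell could not find an executable named `voicefixer` on the PATH. The sketch below checks for it from Python with `shutil.which` and builds the argument list for the command quoted in this issue. The helper names are hypothetical; the flags are the ones shown in the issues on this page.

```python
import shutil


def find_voicefixer_command():
    """Return the full path of the voicefixer executable, or None."""
    return shutil.which("voicefixer")


def build_voicefixer_args(infile, outfile, mode=None):
    """Build the command-line argument list for one input file."""
    args = ["voicefixer", "--infile", infile, "--outfile", outfile]
    if mode is not None:
        args += ["--mode", str(mode)]
    return args
```

If `find_voicefixer_command()` returns `None`, one possibility is that the scripts directory of the Python installation is not on the PATH; otherwise the list can be passed to `subprocess.run`.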

    opened by Scudoxx 4
  • Ask for batch inference example


    Hello! I would like to ask to add example for batch processing from samples, for integrating voicefixer before ASV/ASR system if possible. As I have found so far, this proposed function will look quite like restore_inmem.
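
Until such an example exists, a folder of files can be processed sequentially with the file-based `restore` call from the usage section. This is a sketch under that assumption: it does not use `restore_inmem`, whose signature is not given here, and it does not batch tensors on the GPU. The names `plan_batch` and `restore_batch` are hypothetical.

```python
from pathlib import Path


def plan_batch(input_dir, output_dir, pattern="*.wav"):
    """Pair every matching input file with an output path of the same name."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [(src, output_dir / src.name) for src in sorted(input_dir.glob(pattern))]


def restore_batch(input_dir, output_dir, cuda=False, mode=0):
    """Restore every WAV file in input_dir, one file at a time."""
    # Imported lazily so plan_batch works without voicefixer installed.
    from voicefixer import VoiceFixer

    voicefixer = VoiceFixer()  # load the model once, reuse it for all files
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_batch(input_dir, output_dir):
        voicefixer.restore(input=str(src), output=str(dst), cuda=cuda, mode=mode)
```

Creating the `VoiceFixer` object once outside the loop avoids reloading the model for every file.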

    opened by vanIvan 3
  • FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\sadness112\\.cache/voicefixer/analysis_module/checkpoints/vf.ckpt'


    Hello, how to fix it? When I insert this code.

    # Install additional web package
    pip install streamlit
    # Run streamlit 
    streamlit run test/streamlit.py
    

    The Streamlit website opens and this error appears. Can you please explain how to install this software correctly? I am a newbie and don't understand programming very well :(
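
The error says the checkpoint file is missing from the cache directory under the user's home folder. A quick way to confirm this is to test for the file directly. The sketch below builds the path from the error message; the helper names are hypothetical, and how the checkpoint is normally downloaded is not described here.

```python
import os


def expected_checkpoint_path(home=None):
    """Build the checkpoint path quoted in the error message."""
    home = home if home is not None else os.path.expanduser("~")
    return os.path.join(
        home, ".cache", "voicefixer", "analysis_module", "checkpoints", "vf.ckpt"
    )


def checkpoint_is_present(home=None):
    """Return True when the checkpoint exists and is not an empty file."""
    path = expected_checkpoint_path(home)
    return os.path.isfile(path) and os.path.getsize(path) > 0
```

If `checkpoint_is_present()` returns `False`, the model file has not been downloaded to the location the package reads from.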

    opened by sadness112 7
  • RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory


    Hello, as a novice in speech processing, I would like to ask a question. When I run test.py, I get the error shown in the title. I hope someone can suggest a solution to this problem. Thank you very much!

    opened by zhangzhangthink 11
Releases(v0.0.12)
The project aims to develop a personal-assistant for Windows & Linux-based systems

The project aims to develop a personal-assistant for Windows & Linux-based systems. Samiksha draws its inspiration from virtual assistants like Cortana for Windows, and Siri for iOS. It has been designed to provide a user-friendly interface for carrying out a variety of tasks by employing certain well-defined commands.

SHUBHANSHU RAI 1 Jan 16, 2022
This library provides common speech features for ASR including MFCCs and filterbank energies.

python_speech_features This library provides common speech features for ASR including MFCCs and filterbank energies. If you are not sure what MFCCs ar

James Lyons 2.1k Sep 24, 2022
SpeechPy - A Library for Speech Processing and Recognition: http://speechpy.readthedocs.io/en/latest/

SpeechPy Official Project Documentation Table of Contents Documentation Which Python versions are supported Citation How to Install? Local Installatio

Amirsina Torfi 863 Sep 21, 2022
DeepSpeech is an open source embedded (offline, on-device) speech-to-text engine which can run in real time on devices ranging from a Raspberry Pi 4 to high power GPU servers.

Project DeepSpeech DeepSpeech is an open-source Speech-To-Text engine, using a model trained by machine learning techniques based on Baidu's Deep Spee

Mozilla 20.3k Sep 28, 2022
Speech recognition module for Python, supporting several engines and APIs, online and offline.

SpeechRecognition Library for performing speech recognition, with support for several engines and APIs, online and offline. Speech recognition engine/

Anthony Zhang 6.5k Sep 26, 2022
Conferencing Speech Challenge

ConferencingSpeech 2021 challenge This repository contains the datasets list and scripts required for the ConferencingSpeech challenge. For more detai

null 71 Sep 30, 2022
Speech Algorithms Collections

Speech Algorithms Collections

Ryuk 449 Sep 30, 2022
Simple, hackable offline speech to text - using the VOSK-API.

Nerd Dictation Offline Speech to Text for Desktop Linux. This is a utility that provides simple access speech to text for using in Linux without being

Campbell Barton 762 Sep 25, 2022
Some utils for auto speech recognition

About Some utils for auto speech recognition. Utils Util Description Script Reset audio Reset sample rate, sample width, etc of audios.

null 1 Jan 24, 2022
VoiceFixer VoiceFixer is a framework for general speech restoration.

VoiceFixer VoiceFixer is a framework for general speech restoration. We aim at the restoration of severely degraded speech and historical speech. Paper

Leo 161 Sep 16, 2022
log4j-tools: CVE-2021-44228 poses a serious threat to a wide range of Java-based applications

log4j-tools Quick links Click to find: Inclusions of log4j2 in compiled code Calls to log4j2 in compiled code Calls to log4j2 in source code Overview

JFrog Ltd. 168 Sep 1, 2022
This Project is based on NLTK It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

This Project is based on NLTK(Natural Language Toolkit) It generates a RANDOM WORD from a predefined list of words, From that random word it read out the word, its meaning with parts of speech , its antonyms, its synonyms

SaiVenkatDhulipudi 2 Nov 17, 2021
Real-ESRGAN aims at developing Practical Algorithms for General Image Restoration.

Real-ESRGAN Colab Demo for Real-ESRGAN . Portable Windows executable file. You can find more information here. Real-ESRGAN aims at developing Practica

Xintao 15.1k Oct 1, 2022
coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." ― John F. Woods coala provides a

coala development group 3.4k Sep 25, 2022
This is a modified variation of abhiTronix's vidgear. In this variation, it is possible to write the output file anywhere regardless the permissions.

Info In order to download this package: Windows 10: Press Windows+S, Type PowerShell (cmd in older versions) and hit enter, Type pip install vidgear_n

Ege Akman 3 Jan 30, 2022
VALORANT rank yoinker lets you retrieve the ranks and basic informations of everyone in the lobby, regardless of gamemode.

vRY VALORANT rank yoinker Retrieve the rank and basic information of everyone in the lobby, regardless of gamemode. Table of Contents Terms of Use Abo

Isaac Kenyon 240 Sep 27, 2022
SelfRemaster: SSL Speech Restoration

SelfRemaster: Self-Supervised Speech Restoration Official implementation of SelfRemaster: Self-Supervised Speech Restoration with Analysis-by-Synthesi

Takaaki Saeki 36 Sep 22, 2022