Python codes for Lite Audio-Visual Speech Enhancement.

Shang-Yi Chuang

Last update: Dec 1, 2022

Related tags

Deep Learning LAVSE

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

This is the PyTorch implementation of Lite Audio-Visual Speech Enhancement (LAVSE).

We have also put some preprocessed sample data (including enhanced results) in this repository.

The dataset of TMSV (Taiwan Mandarin speech with video) used in LAVSE is released here.

Please cite the following paper if you find the codes useful in your research.

@inproceedings{chuang2020lite,
  title={Lite Audio-Visual Speech Enhancement},
  author={Chuang, Shang-Yi and Tsao, Yu and Lo, Chen-Chou and Wang, Hsin-Min},
  booktitle={Proc. Interspeech 2020}
}

Prerequisites

Ubuntu 18.04
Python 3.6
CUDA 10

You can use pip to install Python depedencies.

pip install -r requirements.txt

Usage

You can simply enter the command below and the average PESQ and STOI results will show on your terminal pane.

Remember to activate visdom (probably in a screen or tmux) for recording the training loss before bashing the script.

bash run.sh

Go check run.sh if you need further information about the command lines.

License

The LAVSE work is released under MIT License.

See LICENSE for more details.

Acknowledgments

Bio-ASP Lab, CITI, Academia Sinica, Taipei, Taiwan
SLAM Lab, IIS, Academia Sinica, Taipei, Taiwan

You might also like...

Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite.

TFLite-HITNET-Stereo-depth-estimation Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite. Stereo depth e

22 Oct 20, 2022

Python script for performing depth completion from sparse depth and rgb images using the msg_chn_wacv20. model in Tensorflow Lite.

TFLite-msg_chn_wacv20-depth-completion Python script for performing depth completion from sparse depth and rgb images using the msg_chn_wacv20. model

2 Oct 4, 2021

Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite.

TFLite-MobileStereoNet Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite. Stereo depth estimati

4 Feb 14, 2022

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation This is a demo implementation of BYOL for Audio (BYOL-A), a self-sup

160 Jan 4, 2023

This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

Lite-HRNet: A Lightweight High-Resolution Network Introduction This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution

675 Dec 25, 2022

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

1.3k Jan 2, 2023

🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~

YOLOv5-Lite：lighter, faster and easier to deploy Perform a series of ablation experiments on yolov5 to make it lighter (smaller Flops, lower memory, a

1.5k Jan 5, 2023

Lite-HRNet: A Lightweight High-Resolution Network

LiteHRNet Benchmark 🔥 🔥 Based on MMsegmentation 🔥 🔥 Cityscapes FCN resize concat config mIoU last mAcc last eval last mIoU best mAcc best eval bes

16 Dec 12, 2022

Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

TFlite Ultra Fast Lane Detection Inference Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite. So

12 Aug 27, 2022

Comments

preprocessed dataset?

I know we might need to download the Taiwan Mandarin Dataset, however, the preprocessed dataset in this repo. seems to be "wired"; any code to generate those files? especially the .pt ones.

opened by wubinbai 1
Question about data shape on model output and target data

Hi, I found that the shape of the model outputs (including audio and lip) do not contain frame_seq dimension. Under this condition, how does the loss be calculated due to the shape mismatch on model output and target data?

opened by hsiaohan0827 1

Python codes for Lite Audio-Visual Speech Enhancement.

Related tags

Overview

Lite Audio-Visual Speech Enhancement (Interspeech 2020)

Introduction

Prerequisites

Usage

License

Acknowledgments

You might also like...

Python scripts form performing stereo depth estimation using the HITNET model in Tensorflow Lite.

Python script for performing depth completion from sparse depth and rgb images using the msg_chn_wacv20. model in Tensorflow Lite.

Python scripts for performing stereo depth estimation using the MobileStereoNet model in Tensorflow Lite.

BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation

This is an official pytorch implementation of Lite-HRNet: A Lightweight High-Resolution Network.

A tutorial showing how to train, convert, and run TensorFlow Lite object detection models on Android devices, the Raspberry Pi, and more!

🍅🍅🍅YOLOv5-Lite: lighter, faster and easier to deploy. Evolved from yolov5 and the size of model is only 1.7M (int8) and 3.3M (fp16). It can reach 10+ FPS on the Raspberry Pi 4B when the input size is 320×320~

Lite-HRNet: A Lightweight High-Resolution Network

Example scripts for the detection of lanes using the ultra fast lane detection model in Tensorflow Lite.

Comments

preprocessed dataset?

Question about data shape on model output and target data

Owner

Shang-Yi Chuang

A Low Complexity Speech Enhancement Framework for Full-Band Audio (48kHz) based on Deep Filtering.

Facestar dataset. High quality audio-visual recordings of human conversational speech.

PyTorch implementation of "A Full-Band and Sub-Band Fusion Model for Real-Time Single-Channel Speech Enhancement."

Speech Enhancement Generative Adversarial Network Based on Asymmetric AutoEncoder

Implementation of "A Deep Learning Loss Function based on Auditory Power Compression for Speech Enhancement" by pytorch

TCNN Temporal convolutional neural network for real-time speech enhancement in the time domain

The implementation of "Optimizing Shoulder to Shoulder: A Coordinated Sub-Band Fusion Model for Real-Time Full-Band Speech Enhancement"

HiFi++: a Unified Framework for Neural Vocoding, Bandwidth Extension and Speech Enhancement

Python scripts to detect faces in Python with the BlazeFace Tensorflow Lite models

Official codes for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"