LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

WangRui

Last update: Dec 29, 2022

LightHuBERT

| Github | Huggingface | SUPERB Leaderboard |

The authors' PyTorch implementation and pretrained models of LightHuBERT.

March 2022: released the preprint on arXiv and the checkpoints on Hugging Face.

Pre-Trained Models

| Model | Pre-Training Dataset | Download Link |
| --- | --- | --- |
| LightHuBERT Base | 960 hrs LibriSpeech | huggingface: lighthubert/lighthubert_base.pt |
| LightHuBERT Small | 960 hrs LibriSpeech | huggingface: lighthubert/lighthubert_small.pt |
| LightHuBERT Stage 1 | 960 hrs LibriSpeech | huggingface: lighthubert/lighthubert_stage1.pt |

The pre-trained models were trained with common.fp16: true, so model inference can be performed with fp16 weights.

Requirements and Installation

PyTorch version >= 1.8.1
Python version >= 3.6
numpy version >= 1.19.3
To install lighthubert:

git clone git@github.com:mechanicalsea/lighthubert.git
cd lighthubert
pip install --editable .
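
Before installing, it can help to confirm that the environment meets the minimum versions listed above. The following is a minimal sketch, not part of the lighthubert package: the helper names (`version_tuple`, `meets_minimum`, `check_requirements`) are hypothetical, it assumes PyTorch is installed under the distribution name `torch`, and it relies on `importlib.metadata`, which needs Python 3.8 or newer.

```python
import re
import sys
from importlib import metadata

# Minimum versions copied from the requirements list above.
# Assumption: PyTorch is installed under the distribution name "torch".
MINIMUMS = {"torch": "1.8.1", "numpy": "1.19.3"}


def version_tuple(version):
    """Turn a string such as '1.8.1+cu111' into (1, 8, 1).

    Local build tags after '+' and non-numeric suffixes are ignored.
    """
    numbers = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        numbers.append(int(match.group()))
    return tuple(numbers)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def check_requirements(minimums=MINIMUMS):
    """Map each package name to 'ok', 'not installed', or a mismatch note."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        if meets_minimum(installed, minimum):
            report[name] = "ok"
        else:
            report[name] = f"{installed} < {minimum}"
    return report


if __name__ == "__main__":
    print("python:", "ok" if sys.version_info >= (3, 6) else "too old")
    for package, status in check_requirements().items():
        print(f"{package}: {status}")
```

The comparison is a plain tuple comparison, which is enough for simple release numbers like the ones above; pre-release ordering is not handled.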

Load Pre-Trained Models for Inference

import torch
from lighthubert import LightHuBERT, LightHuBERTConfig

wav_input_16khz = torch.randn(1,10000).cuda()

# load the pre-trained checkpoints
checkpoint = torch.load('/path/to/lighthubert.pt')
cfg = LightHuBERTConfig(checkpoint['cfg']['model'])
cfg.supernet_type = 'base'
model = LightHuBERT(cfg)
model = model.cuda()
model = model.eval()
print(model.load_state_dict(checkpoint['model'], strict=False))

# (optional) set a subnet
subnet = model.supernet.sample_subnet()
model.set_sample_config(subnet)
params = model.calc_sampled_param_num()
print(f"subnet (Params {params / 1e6:.0f}M) | {subnet}")

# extract the representation of the last layer
rep = model.extract_features(wav_input_16khz)[0]

# extract the representation of each layer
hs = model.extract_features(wav_input_16khz, ret_hs=True)[0]

print(f"Representation at bottom hidden states: {torch.allclose(rep, hs[-1])}")
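
To anticipate how many frames the extracted representation contains for a given waveform length, the sketch below walks an input length through a stack of strided 1-D convolutions. It is only an estimate under a stated assumption: that LightHuBERT keeps the convolutional front end of HuBERT Base (kernel widths 10, 3, 3, 3, 3, 2, 2 with strides 5, 2, 2, 2, 2, 2, 2 and no padding). The helper `num_frames` and the constant `CONV_LAYERS` are hypothetical names, not part of the lighthubert API; the authoritative values are whatever the loaded checkpoint config specifies.

```python
# (kernel, stride) per convolutional layer.
# Assumption: same front end as HuBERT Base; check the checkpoint config.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples, conv_layers=CONV_LAYERS):
    """Estimate the number of output frames for a raw waveform length.

    Each unpadded 1-D convolution maps a length L to
    floor((L - kernel) / stride) + 1.
    """
    length = num_samples
    for kernel, stride in conv_layers:
        if length < kernel:
            return 0  # input too short to produce any frame
        length = (length - kernel) // stride + 1
    return length


if __name__ == "__main__":
    # 10000 samples is the length of the random input used above.
    for samples in (10000, 16000):
        print(f"{samples} samples -> {num_frames(samples)} frames")
```

Under this assumption the total stride is 320 samples, so at 16 kHz each frame covers roughly 20 ms of audio.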

More examples can be found in our tutorials.

Universal Representation Evaluation on SUPERB

License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree. Portions of the source code are based on the FAIRSEQ project.

Reference

If you find our work useful in your research, please cite the following paper:

@article{wang2022lighthubert,
  title={{LightHuBERT}: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit {BERT}},
  author={Rui Wang and Qibing Bai and Junyi Ao and Long Zhou and Zhixiang Xiong and Zhihua Wei and Yu Zhang and Tom Ko and Haizhou Li},
  journal={arXiv preprint arXiv:2203.15610},
  year={2022}
}

Contact Information

For help or issues using LightHuBERT models, please submit a GitHub issue.

For other communications related to LightHuBERT, please contact Rui Wang ([email protected]).

Comments

Reproducing the Results from the SUPERB Leaderboard

Hello Mr. Wang!

First of all, I would like to thank you for your work and effort to make it open source. I've been working on the robustness of SRL models and I'm trying to reproduce the downstream models from SUPERB.

Do you have the checkpoint files generated when training the SUPERB models? If not, could you share the parameters used in each task's config.yaml file? With these, I could reproduce the numbers in the table.

Best regards, Heitor

opened by Hguimaraes 7

Enabling lighthubert with setup.py?

Hello!

Thanks for the great work! My colleague @edward0804 and I are thinking about integrating lighthubert into S3PRL to enable more research. Instead of copying all the lighthubert code into S3PRL, we are wondering whether adding a setup.py in this repo would be a good alternative so that we can simply install it, enabling lighthubert in the S3PRL codebase, and link the interested user to this repo for the actual implementation.

I have made a minimal fork for this and so lighthubert can be installed in S3PRL after this commit s3prl/s3prl@07c5bd8692ce481cea5e0190c2cabf759300799b, and @edward0804 is working on adding a wrapper for lighthubert. Do you think it would be nice to add an official setup.py ? :)

Thanks!

opened by leo19941227 2

Could you consider open-sourcing the training code?

This work is great, and the performance of LightHuBERT is even better than HuBERT Large (according to ). So I was wondering how to train a LightHuBERT model. Could you open-source the training code?

opened by duj12 0
