Self-Supervised Speech Pre-training and Representation Learning Toolkit.

s3prl

Last update: Jan 8, 2023

Related tags

Deep Learning npc representation-learning tera cpc apc pase mockingjay self-supervised-learning speech-representation wav2vec speech-pretraining hubert vq-apc vq-wav2vec wav2vec2 decoar

Overview

What's New

Sep 2021: We are hosting a challenge at an AAAI workshop: The 2nd Self-supervised Learning for Audio and Speech Processing! See the SUPERB official site for the challenge details and the SUPERB documentation in this toolkit!
Aug 2021: We now have a tutorial that introduces our toolkit; you can watch it on YouTube!
July 2021: We are now working on packaging s3prl and reorganizing the file structure in v0.3. Please consider using the stable v0.2.0 for now. We will test and release v0.3 before August.
June 2021: Support SUPERB: Speech processing Universal PERformance Benchmark, submitted to Interspeech 2021. Use the tag superb-interspeech2021 or v0.2.0.
June 2021: Support extracting multiple hidden states from the SSL pretrained models.
Jan 2021: Readme updated with detailed instructions on how to use our latest version!
Dec 2020: We are migrating to a newer version for more general, flexible, and scalable code. See the introduction below for more information! The legacy version can be accessed via the tag v0.1.0.

Introduction and Usages

This is an open-source toolkit called s3prl, which stands for Self-Supervised Speech Pre-training and Representation Learning. Self-supervised pre-trained speech models are called upstream models in this toolkit and are utilized in various downstream tasks.

The toolkit has three major usages:

Pretrain

Pretrain upstream models, including Mockingjay, Audio ALBERT and TERA.
Document: pretrain/README.md

Upstream

Easily load most of the existing upstream models with pretrained weights in a unified I/O interface.
Pretrained models are registered through torch.hub, which means you can use these models in your own project with one-line plug-and-play, without depending on this toolkit's coding style.
Document: upstream/README.md

Downstream

Utilize upstream models in many downstream tasks
Benchmark upstream models with the SUPERB Benchmark
Document: downstream/README.md
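SUPERB, as described in the SUPERB paper, keeps the upstream frozen and gives each downstream model a learnable weighted sum of all the upstream's hidden states, which is one reason extracting multiple hidden states matters. The following is a minimal plain-Python sketch of that combination step for a single frame; softmax and weighted_sum are hypothetical helpers, not s3prl functions, and the layer weights are fixed here instead of learned.

```python
import math
from typing import List, Sequence


def softmax(weights: Sequence[float]) -> List[float]:
    """Numerically stable softmax over the raw (unnormalized) layer weights."""
    top = max(weights)
    exps = [math.exp(w - top) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(hidden_states: Sequence[Sequence[float]], raw_weights: Sequence[float]) -> List[float]:
    """Combine one frame's feature vector from every layer into a single vector.

    hidden_states[l] is the D-dimensional feature of layer l for one frame;
    raw_weights[l] is the scalar weight of layer l (learnable in practice, fixed here).
    """
    if len(hidden_states) != len(raw_weights):
        raise ValueError("need exactly one weight per layer")
    norm = softmax(raw_weights)
    dim = len(hidden_states[0])
    return [sum(w * layer[d] for w, layer in zip(norm, hidden_states)) for d in range(dim)]


# Three layers with 2-dimensional features; equal weights give a plain average of the layers.
layers = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
print(weighted_sum(layers, [0.0, 0.0, 0.0]))
```

Normalizing the raw weights with a softmax keeps the result a convex mixture of layers, so a downstream model can emphasize some layers without the feature scale drifting.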

[Figure: an intuitive illustration of how this toolkit may help you]

Feel free to use or modify our toolkit in your research. Here is a list of papers using our toolkit. Questions, bug reports, and improvement suggestions are welcome; please open a new issue.

If you find this toolkit helpful to your research, please do consider citing our papers, thanks!

Installation

Python >= 3.6
Install sox on your OS
Install s3prl

pip install -e ./

Install the specific fairseq version

pip install fairseq@git+https://github.com/pytorch/fairseq.git@f2146bdc7abf293186de9449bfa2272775e39e1d#egg=fairseq

Some upstream models require special dependencies. If you encounter an error with a specific upstream model, look into the README.md under that upstream's folder, e.g., upstream/pase/README.md.
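Before running pip install, it can help to confirm the two system prerequisites listed above. This is a small stdlib-only sketch; check_environment is a hypothetical helper, and its sox check only detects a sox binary on PATH, which may not be the only way a given upstream or audio backend locates sox.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_environment(min_python: Tuple[int, int] = (3, 6)) -> Dict[str, bool]:
    """Report whether the system prerequisites from the installation steps are met."""
    return {
        # Python >= 3.6, as stated in the installation steps
        "python_ok": sys.version_info[:2] >= min_python,
        # sox must be installed on the OS; here we only check that the binary is on PATH
        "sox_on_path": shutil.which("sox") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'yes' if ok else 'NO'}")
```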

Development pattern for contributors

Create a personal fork of the main S3PRL repository in GitHub.
Make your changes in a named branch other than master, e.g., create a branch named new-awesome-feature.
Contact us if you have any questions during development.
Generate a pull request through the Web interface of GitHub.
Please verify that your code is free of basic mistakes. We appreciate any contribution!

Reference Repositories

PyTorch, PyTorch.
Audio, PyTorch.
Kaldi, Kaldi-ASR.
Transformers, Hugging Face.
PyTorch-Kaldi, Mirco Ravanelli.
fairseq, Facebook AI Research.
CPC, Facebook AI Research.
APC, Yu-An Chung.
VQ-APC, Yu-An Chung.
NPC, Alexander-H-Liu.
End-to-end-ASR-Pytorch, Alexander-H-Liu.
Mockingjay, Andy T. Liu.
ESPnet, Shinji Watanabe.
speech-representations, awslabs.
PASE, Santiago Pascual and Mirco Ravanelli.
LibriMix, Joris Cosentino and Manuel Pariente.

License

The majority of S3PRL Toolkit is licensed under CC-BY-NC; however, portions of the project are available under separate license terms: S3PRL is licensed under the MIT license.

Used by

List of papers that used our toolkit (Feel free to add your own paper by making a pull request)

Self-Supervised Pretraining

Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders (Liu et al., 2020)

@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}

TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech (Liu et al., 2020)

@misc{tera,
    title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
    author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
    year={2020},
    eprint={2007.06028},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation (Chi et al., 2020)

@inproceedings{audio_albert,
    title={Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation},
    author={Po-Han Chi and Pei-Hung Chung and Tsung-Han Wu and Chun-Cheng Hsieh and Shang-Wen Li and Hung-yi Lee},
    year={2020},
    booktitle={SLT 2020},
}

Explainability

Understanding Self-Attention of Self-Supervised Audio Transformers (Yang et al., 2020)

@inproceedings{understanding_sat,
    author={Shu-wen Yang and Andy T. Liu and Hung-yi Lee},
    title={{Understanding Self-Attention of Self-Supervised Audio Transformers}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={3785--3789},
    doi={10.21437/Interspeech.2020-2231},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2231}
}

Adversarial Attack

Defense for Black-box Attacks on Anti-spoofing Models by Self-Supervised Learning (Wu et al., 2020), code for computing LNSR: utility/observe_lnsr.py

@inproceedings{mockingjay_defense,
    author={Haibin Wu and Andy T. Liu and Hung-yi Lee},
    title={{Defense for Black-Box Attacks on Anti-Spoofing Models by Self-Supervised Learning}},
    year=2020,
    booktitle={Proc. Interspeech 2020},
    pages={3780--3784},
    doi={10.21437/Interspeech.2020-2026},
    url={http://dx.doi.org/10.21437/Interspeech.2020-2026}
}
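The paper above quantifies robustness with LNSR (layerwise noise-to-signal ratio), and utility/observe_lnsr.py is the toolkit's actual implementation. As a rough illustration only, the sketch below assumes LNSR for layer l is the L2 norm of the change in that layer's representation caused by the perturbation, divided by the L2 norm of the clean representation; the paper and the script may normalize or average differently. Lower values mean the layer is less affected by the perturbation.

```python
import math
from typing import List, Sequence


def l2(vec: Sequence[float]) -> float:
    """Euclidean norm of a flattened representation."""
    return math.sqrt(sum(v * v for v in vec))


def lnsr(clean_layers: Sequence[Sequence[float]], noisy_layers: Sequence[Sequence[float]]) -> List[float]:
    """Per-layer noise-to-signal ratio under the ASSUMED definition
    LNSR_l = ||h_l(x + d) - h_l(x)|| / ||h_l(x)||.

    clean_layers[l] and noisy_layers[l] are the flattened layer-l representations
    of the clean input x and the perturbed input x + d.
    """
    if len(clean_layers) != len(noisy_layers):
        raise ValueError("clean and noisy inputs must have the same number of layers")
    ratios = []
    for clean, noisy in zip(clean_layers, noisy_layers):
        diff = [n - c for c, n in zip(clean, noisy)]
        ratios.append(l2(diff) / l2(clean))
    return ratios


# Layer 0 is perturbed by 10% of its norm; layer 1 is untouched.
print(lnsr([[3.0, 4.0], [1.0, 0.0]], [[3.0, 4.5], [1.0, 0.0]]))  # [0.1, 0.0]
```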

Adversarial Defense for Automatic Speaker Verification by Cascaded Self-Supervised Learning Models (Wu et al., 2021)

@misc{asv_ssl,
    title={Adversarial defense for automatic speaker verification by cascaded self-supervised learning models}, 
    author={Haibin Wu and Xu Li and Andy T. Liu and Zhiyong Wu and Helen Meng and Hung-yi Lee},
    year={2021},
    eprint={2102.07047},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Voice Conversion

S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations (Lin et al., 2021)

@misc{s2vc,
      title={S2VC: A Framework for Any-to-Any Voice Conversion with Self-Supervised Pretrained Representations},
      author={Jheng-hao Lin and Yist Y. Lin and Chung-Ming Chien and Hung-yi Lee},
      year={2021},
      eprint={2104.02901},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}

Benchmark and Evaluation

SUPERB: Speech processing Universal PERformance Benchmark (Yang et al., 2021)

@misc{superb,
      title={SUPERB: Speech processing Universal PERformance Benchmark},
      author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
      year={2021},
      eprint={2105.01051},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

Utilizing Self-supervised Representations for MOS Prediction (Tseng et al., 2021)

@misc{ssr_mos,
    title={Utilizing Self-supervised Representations for MOS Prediction},
    author={Wei-Cheng Tseng and Chien-yu Huang and Wei-Tsung Kao and Yist Y. Lin and Hung-yi Lee},
    year={2021},
    eprint={2104.03017},
    archivePrefix={arXiv},
    primaryClass={eess.AS}
}

Citation

If you find this toolkit useful, please consider citing the following papers.

If you use our pre-training scripts, or the downstream tasks considered in TERA and Mockingjay, please consider citing the following:

@misc{tera,
  title={TERA: Self-Supervised Learning of Transformer Encoder Representation for Speech},
  author={Andy T. Liu and Shang-Wen Li and Hung-yi Lee},
  year={2020},
  eprint={2007.06028},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

@article{mockingjay,
   title={Mockingjay: Unsupervised Speech Representation Learning with Deep Bidirectional Transformer Encoders},
   ISBN={9781509066315},
   url={http://dx.doi.org/10.1109/ICASSP40776.2020.9054458},
   DOI={10.1109/icassp40776.2020.9054458},
   journal={ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
   publisher={IEEE},
   author={Liu, Andy T. and Yang, Shu-wen and Chi, Po-Han and Hsu, Po-chun and Lee, Hung-yi},
   year={2020},
   month={May}
}

If you use our organized upstream interface and features, or the SUPERB downstream benchmark, please consider citing the following:

@inproceedings{yang21c_interspeech,
  author={Shu-wen Yang and Po-Han Chi and Yung-Sung Chuang and Cheng-I Jeff Lai and Kushal Lakhotia and Yist Y. Lin and Andy T. Liu and Jiatong Shi and Xuankai Chang and Guan-Ting Lin and Tzu-Hsien Huang and Wei-Cheng Tseng and Ko-tik Lee and Da-Rong Liu and Zili Huang and Shuyan Dong and Shang-Wen Li and Shinji Watanabe and Abdelrahman Mohamed and Hung-yi Lee},
  title={{SUPERB: Speech Processing Universal PERformance Benchmark}},
  year=2021,
  booktitle={Proc. Interspeech 2021},
  pages={1194--1198},
  doi={10.21437/Interspeech.2021-1775}
}

Comments

module 'hub' has no attribute 'mockingjay_local'

Hello. I am trying to run the Mockingjay downstream task on an HPC cluster using the following command:

python run_downstream.py -m train -u mockingjay_local -k '<path to .ckpt>' -d phone_linear -n mockingjayDown

I am getting the following error:

  File "run_downstream.py", line 225, in <module>
    main()
  File "run_downstream.py", line 220, in main
    runner = Runner(args, config)
  File "<path>/s3prl/downstream/runner.py", line 103, in __init__
    self.upstream = self._get_upstream()
  File "<path>/s3prl/downstream/runner.py", line 143, in _get_upstream
    Upstream = getattr(hub, self.args.upstream)
AttributeError: module 'hub' has no attribute 'mockingjay_local'

Please let me know how to resolve the issue or if I need to provide more details. Thanks!

opened by MiPlayer123 20
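The traceback shows that the runner resolves the -u value with getattr(hub, self.args.upstream), so any name that is not an attribute of the hub module fails with exactly this AttributeError. The sketch below reproduces that kind of lookup on a hypothetical registry (toy_hub and get_upstream_entry are illustrative, not s3prl code) and reports near matches, which is a quick way to see which entry names a given checkout actually registers.

```python
import difflib
from types import SimpleNamespace


def get_upstream_entry(hub, name: str):
    """Resolve an upstream name via getattr, as a getattr-based registry does,
    but raise an error that lists similarly named registered entries."""
    try:
        return getattr(hub, name)
    except AttributeError:
        available = [n for n in dir(hub) if not n.startswith("_")]
        close = difflib.get_close_matches(name, available, n=3)
        hint = f" Did you mean: {', '.join(close)}?" if close else ""
        raise AttributeError(f"no upstream named {name!r} is registered.{hint}") from None


# Hypothetical registry in which only some entries exist.
toy_hub = SimpleNamespace(mockingjay=lambda ckpt=None: "mockingjay", tera_local=lambda ckpt: "tera")
print(get_upstream_entry(toy_hub, "mockingjay")())  # mockingjay
```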

Speaker Diarization Scoring
Add NIST scoring for the standard diarization error rate (DER).

The results for three models (upstream + downstream):

baseline (fbank) + RNN: 7.03
APC + RNN: 7.20
wav2vec2 + RNN: 4.36
opened by ftshijt 20
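For reference, DER is the fraction of reference speech time that is attributed incorrectly: missed speech, false-alarm speech, and speaker confusion, divided by the total reference speech time. The sketch below computes only that final ratio from already-measured durations; the NIST scoring tool added in this PR also handles segment alignment, scoring collars, and overlapping speech, none of which is modeled here. The function name and example durations are hypothetical.

```python
def diarization_error_rate(missed: float, false_alarm: float, confusion: float, total_speech: float) -> float:
    """DER (%) = 100 * (missed speech + false alarm + speaker confusion) / total reference speech.

    All arguments are durations in the same unit (e.g. seconds).
    """
    if total_speech <= 0:
        raise ValueError("total reference speech duration must be positive")
    return 100.0 * (missed + false_alarm + confusion) / total_speech


# Hypothetical durations: 2 s missed, 1 s false alarm, 1.5 s confusion over 100 s of speech.
print(diarization_error_rate(2.0, 1.0, 1.5, 100.0))  # 4.5
```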
Some ESPnet tasks that use S3PRL fail

  File "/media/shiyanshi/E/espnet/espnet2/asr/frontend/s3prl.py", line 26, in __init__
    import s3prl.nn
ModuleNotFoundError: No module named 's3prl.nn'
Error: S3PRL is not properly installed. Please install S3PRL: cd ${MAIN_ROOT}/tools && make s3prl.done

But S3PRL is installed successfully and can also be imported in the terminal. How do I fix it?

opened by abcdbosh 18
Upstream request: wavLM

I see that WavLM now tops all of the SUPERB tasks (10 tasks), so I would like to request adding this model as an upstream.

Paper: https://arxiv.org/pdf/2110.13900.pdf
Code/Model: https://github.com/microsoft/unilm/tree/master/wavlm

Currently, only base and base+ models are available; the large version will be added soon.

opened by bagustris 16
The model rewrite in config is not reflected

Hi, thank you for a great repository!

I'm running a downstream task in ER (emotion recognition). I wanted to change the neural network from CNNSelfAttention to FCN, so I ran the command below, but the network doesn't seem to have changed. The change is reflected in config*.yaml under /result/downstream/ExpName, but the training results are the same as with the default (CNNSelfAttention).

The command I ran: python3 run_downstream.py -n ExpName -m train -u fbank -d emotion -c downstream/emotion/config.yaml -o "config.downstream_expert.modelrc.DeepModel.model_type='FCN'"

Excuse me, how can I change this to FCN?

opened by miyazakieiji 16
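For readers unfamiliar with the -o flag, the override string above names a nested config key with dots and assigns it a Python literal. The helper below is a hypothetical sketch of applying one such override to a nested dict, not s3prl's own parser. It also illustrates that an override rewrites only the key it names: if the model were built from a different field, the change could appear in the saved config*.yaml without affecting training. The thread does not establish whether that is the cause here.

```python
import ast
from typing import Any, Dict


def apply_override(config: Dict[str, Any], option: str) -> None:
    """Apply one 'config.a.b.c=value' style override to a nested dict in place.

    Hypothetical helper that mirrors the shape of the -o string, not s3prl's parser.
    """
    key, _, raw_value = option.partition("=")
    fields = key.strip().split(".")
    if fields[0] == "config":
        fields = fields[1:]  # the leading "config" names the root object itself
    try:
        value = ast.literal_eval(raw_value.strip())  # "'FCN'" -> "FCN", "3" -> 3
    except (ValueError, SyntaxError):
        value = raw_value.strip()  # fall back to the raw string, e.g. "train"
    node = config
    for field in fields[:-1]:
        node = node.setdefault(field, {})  # create missing intermediate levels
    node[fields[-1]] = value


cfg = {"downstream_expert": {"modelrc": {"DeepModel": {"model_type": "CNNSelfAttention"}}}}
apply_override(cfg, "config.downstream_expert.modelrc.DeepModel.model_type='FCN'")
print(cfg["downstream_expert"]["modelrc"]["DeepModel"]["model_type"])  # FCN
```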
Why is GPU memory usage so large?

Hello! I was trying to run a "HuBERT + PR" experiment on a single GPU. I noticed that the task used nearly 40+ GB of GPU memory when training started. After training for some time, it reported "CUDA out of memory" and I had to stop the task. I encountered a similar situation when running "WavLM + ASR", which used about 30 GB. Such large memory usage did not appear in other downstream tasks such as KS and IC. I ran all the experiments with the default config.yaml. Why does the task use so much memory? Is this normal?

opened by TCL606 15
Error while loading finetuned wav2vec 2.0 large

Hi, following the slides, I tried to load wav2vec2 with the following code and got the following error:

upstream = torch.hub.load("s3prl/s3prl",'wav2vec2_url',ckpt = 'https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt')
Using cache found in /home/sreyan/.cache/torch/hub/s3prl_s3prl_master
Using cache found in /home/sreyan/.cache/torch/hub/s3prl_cache/1c76d6e88090f01736036b28dc995fef583f47f42662d55286332557f957609f for https://dl.fbaipublicfiles.com/fairseq/wav2vec/wav2vec_vox_new.pt
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sreyan/.conda/envs/semeval/lib/python3.7/site-packages/torch/hub.py", line 370, in load
    model = _load_local(repo_or_dir, model, *args, **kwargs)
  File "/home/sreyan/.conda/envs/semeval/lib/python3.7/site-packages/torch/hub.py", line 399, in _load_local
    model = entry(*args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/hubconf.py", line 23, in wav2vec2_url
    return wav2vec2_local(_urls_to_filepaths(ckpt, refresh=refresh), *args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/hubconf.py", line 14, in wav2vec2_local
    return _UpstreamExpert(ckpt, *args, **kwargs)
  File "/home/sreyan/.cache/torch/hub/s3prl_s3prl_master/upstream/wav2vec2/expert.py", line 24, in __init__
    model, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt])
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 339, in load_model_ensemble_and_task
    state = load_checkpoint_to_cpu(filename, arg_overrides)
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 273, in load_checkpoint_to_cpu
    state = _upgrade_state_dict(state)
  File "/home/sreyan/fairseq/fairseq/checkpoint_utils.py", line 550, in _upgrade_state_dict
    state["cfg"] = convert_namespace_to_omegaconf(state["args"])
  File "/home/sreyan/fairseq/fairseq/dataclass/utils.py", line 351, in convert_namespace_to_omegaconf
    with initialize(config_path=config_path):
AttributeError: __enter__

I would like to fine-tune the already fine-tuned wav2vec 2.0 on a speech sentiment task. Any help would be highly appreciated.

opened by Sreyan88 14
About DistilHuBERT

When I run "python run_pretrain.py -u distiller -g pretrain/distiller/config_model.yaml -n distilhubert", I get the following error:

  File "/home/wangsiyuan/kaldi-wavlm/s3prl-test/s3prl/pretrain/distiller/pretrain_expert.py", line 278, in forward
    teacher_hiddens = torch.stack(teacher_hiddens, dim=1)  # B x N x T x D
RuntimeError: stack expects each tensor to be equal size, but got [18, 302, 768] at entry 0 and [18, 301, 768] at entry 1

Testing shows that the teacher model has 12 blocks, and the output of the 12th block differs by one frame from the outputs of the other blocks.

After padding, another error occurs: the loss computation shows that the student model's output differs by one frame from the teacher model's output.

Another error: when I use multiple GPUs, I get "IndexError: Caught IndexError in replica 0 on device 0." I have tried torch 1.9.0 and 1.10.1+cu111 and cannot fix it.

opened by c976237222 13
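The stack error above comes from layer outputs whose time dimensions differ by one frame (302 vs. 301). One common workaround, shown here only as a sketch and not as the repository's fix, is to trim every layer to the shortest length before stacking; with PyTorch tensors shaped B x T x D the same idea is h[:, :common] on each tensor before torch.stack. Trimming discards at most the extra frame instead of inventing padded frames, but the student output would need the same treatment before the loss is computed. The helper name is hypothetical.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def trim_to_common_length(layers: Sequence[Sequence[Frame]]) -> List[List[Frame]]:
    """Cut every layer's frame sequence to the shortest length so they can be stacked.

    Each element of `layers` is one layer's time-major sequence of frames.
    """
    common = min(len(layer) for layer in layers)
    return [list(layer[:common]) for layer in layers]


# 302 vs. 301 frames, as in the error message (frames are plain integers here).
teacher = [list(range(302)), list(range(301))]
trimmed = trim_to_common_length(teacher)
print([len(layer) for layer in trimmed])  # [301, 301]
```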
Integrate Hugging Face Hub & add Docker image
This PR implements two main features:

Integration with the 🤗 Hub for downstream fine-tuning.

The --hub flag allows users to pick any (suitable) upstream model from the PyTorch or 🤗 Hubs, while the --push_to_hf_hub flag pushes all the artifacts from fine-tuning to the 🤗 Hub for inference / evaluation.

A fine-tuning run with these flags looks like:

python run_downstream.py -n exp_dir -m train -u ${upstream_model} -d ${downstream_task} --hub huggingface --push_to_hf_hub True

Upstream models on the 🤗 Hub require an expert.py interface to be defined and you can find an example here.

Downstream models are automatically wrapped in a model.py file that defines the interface for inference and you can find an example here. By default we use the *best*.ckpt checkpoint for inference / evaluation and fall back to the final checkpoint if a "best" one is not produced during training.
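The checkpoint rule described above (prefer *best*.ckpt, otherwise fall back to the final checkpoint) can be sketched as a pure function over file names. This assumes, hypothetically, that non-best checkpoints carry their step or epoch number at the end of the name, as in states-epoch-50.ckpt elsewhere on this page, and that "final" means the highest such number; it is not the PR's code.

```python
import fnmatch
import re
from typing import Iterable, Optional


def pick_inference_checkpoint(filenames: Iterable[str]) -> Optional[str]:
    """Prefer a '*best*.ckpt' file; otherwise fall back to the highest-numbered checkpoint.

    ASSUMPTION: non-best checkpoints end with their step/epoch number, e.g. 'states-2000.ckpt'.
    """
    names = list(filenames)
    best = sorted(n for n in names if fnmatch.fnmatch(n, "*best*.ckpt"))
    if best:
        return best[0]
    numbered = []
    for n in names:
        match = re.search(r"(\d+)\.ckpt$", n)
        if match:
            numbered.append((int(match.group(1)), n))
    return max(numbered)[1] if numbered else None


print(pick_inference_checkpoint(["states-1000.ckpt", "dev-best.ckpt", "states-2000.ckpt"]))  # dev-best.ckpt
print(pick_inference_checkpoint(["states-1000.ckpt", "states-20000.ckpt"]))  # states-20000.ckpt
```

Parsing the number instead of sorting names alphabetically avoids picking states-9000.ckpt over states-20000.ckpt.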

By storing all the artifacts, we can visualize the TensorBoard logs and, if needed, reproduce training runs from the args_*.yaml and config_*.yaml files.

Update: the TensorBoard logs are only visible for public repos, and by default we create a private repo (in case participants don't want to share their fine-tuned models with everyone). Participants can view the logs by simply making their repo public if they wish.

A Docker image for downstream fine-tuning

This builds on the above Hub integration and should be runnable on any infra that has the NVIDIA Container Toolkit installed. See the downstream README for more details on how to build the image / run it. Once this PR is merged, an interesting exercise will be to see if you can run the Docker container on your own infra 😃

Miscellaneous

We have also included some changes to:

The downstream README

The ASR and SD modules now include a template folder for the 🤗 Hub interfaces

cc @leo19941227
opened by lewtun 13
train downstream ASR using own upstream

Hi, I want to use the pretrained model for the downstream ASR task; however, there is no config file in the s3prl/downstream/asr/feat/ directory. Is the ASR task properly configured? Thanks.

opened by zyzpower 13
(WIP) a better version of enhancement and separation downstream
Hi @leo19941227, I am opening this pull request for a better version of the enhancement and separation downstream. In this pull request, I:

Added two new configs that have a much smaller model size and better performance

Made some small changes to the code, including (1) modifying the loss function to support L1 loss and to compute the loss in the log domain (for smaller input scale and more stable training), and (2) removing the original postprocess function. Originally, I found some issues when using librosa.istft, and I used the postprocess function to remove the impulse at the end of the signal. Now I have found a better way to deal with this issue.
opened by HuangZiliAndy 12
Is there no vq_apc_local in s3prl?

Hi, I pre-trained the vq_apc model for comparison, but when I tried to extract its feature representation, it failed.

upstream=getattr(hub, 'vq_apc_local')('result/pretrain/vq_apc/states-epoch-50.ckpt')

Can you add vq_apc_local?

opened by kaen2891 0
SID task loss function.

ASV and SID tasks are very similar, yet they use different loss functions: ASV uses AM-Softmax, while SID uses a standard softmax loss.

Why was this choice made? Furthermore, is it acceptable to change the loss function?

opened by raotnameh 1
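To make the difference concrete: a plain softmax classifier applies cross-entropy to unconstrained logits, while AM-Softmax (additive margin softmax) uses scaled cosine similarities between the embedding and each class weight and subtracts a margin from the target class's cosine. The sketch below compares the two losses on the same hypothetical cosines; the scale of 30 and margin of 0.4 are placeholder values, not settings taken from the s3prl configs. Margin-based losses are commonly used for verification because they push same-speaker embeddings closer together, but the thread does not say why SID uses plain softmax here.

```python
import math
from typing import Sequence


def softmax_ce(logits: Sequence[float], target: int) -> float:
    """Cross-entropy loss of a plain softmax classifier (log-sum-exp minus target logit)."""
    top = max(logits)
    log_z = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_z - logits[target]


def am_softmax_ce(cosines: Sequence[float], target: int, scale: float = 30.0, margin: float = 0.4) -> float:
    """Additive-margin softmax loss on cosine similarities to each class weight.

    The target cosine is reduced by `margin` before scaling, so the embedding must beat
    the other classes by at least that margin to reach the same loss as a plain softmax.
    """
    logits = [scale * (c - margin if i == target else c) for i, c in enumerate(cosines)]
    return softmax_ce(logits, target)


cos = [0.8, 0.3, 0.1]  # hypothetical cosine similarities to three speakers
# The margin makes the loss stricter for the same embedding.
print(softmax_ce([30.0 * c for c in cos], 0) < am_softmax_ce(cos, 0))  # True
```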
Bump setuptools from 59.5.0 to 65.5.1 in /requirements
Bumps setuptools from 59.5.0 to 65.5.1.

Release notes

Sourced from setuptools's releases.

v65.5.1

No release notes provided.

v65.5.0

No release notes provided.

v65.4.1

No release notes provided.

v65.4.0

No release notes provided.

v65.3.0

No release notes provided.

v65.2.0

No release notes provided.

v65.1.1

No release notes provided.

v65.1.0

No release notes provided.

v65.0.2

No release notes provided.

v65.0.1

No release notes provided.

v65.0.0

No release notes provided.

v64.0.3

No release notes provided.

v64.0.2

No release notes provided.

v64.0.1

No release notes provided.

v64.0.0

No release notes provided.

v63.4.3

No release notes provided.

v63.4.2

No release notes provided.

... (truncated)

Changelog

Sourced from setuptools's changelog.

v65.5.1

Misc

#3638: Drop a test dependency on the mock package; always use unittest.mock (by hroncok).

#3659: Fixed REDoS vector in package_index.

v65.5.0

Changes

#3624: Fixed editable install for multi-module/no-package src-layout projects.

#3626: Minor refactorings to support distutils using stdlib logging module.

Documentation changes

#3419: Updated the example version numbers to be compliant with PEP-440 on the "Specifying Your Project’s Version" page of the user guide.

Misc

#3569: Improved information about conflicting entries in the current working directory and editable install (in documentation and as an informational warning).

#3576: Updated version of validate_pyproject.

v65.4.1

Misc

#3613: Fixed encoding errors in expand.StaticModule when system default encoding doesn't match expectations for source files.

#3617: Merge with pypa/distutils@6852b20 including fix for pypa/distutils#181.

v65.4.0

Changes

#3609: Merge with pypa/distutils@d82d926 including support for DIST_EXTRA_CONFIG in pypa/distutils#177.

v65.3.0

... (truncated)

Commits

a462cb5 Bump version: 65.5.0 → 65.5.1

de35d8b Merge pull request #3656 from bmorris3/typos

58e23de Update changelog. Ref #3659.

43a9c9b Limit the amount of whitespace to search/backtrack. Fixes #3659.

5791343 Add test capturing failed expectation. Ref #3659.

1f97905 ⚫ Fade to black.

6254567 Remove workaround for emacs.

729b180 ⚫ Fade to black.

c068081 Typo corrections

f777a40 Suppress deprecation warning in --rsyncdir. Workaround for #3655.

Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.

Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

@dependabot rebase will rebase this PR

@dependabot recreate will recreate this PR, overwriting any edits that have been made to it

@dependabot merge will merge this PR after your CI passes on it

@dependabot squash and merge will squash and merge this PR after your CI passes on it

@dependabot cancel merge will cancel a previously requested merge and block automerging

@dependabot reopen will reopen this PR if it is closed

@dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually

@dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)

@dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot use these labels will set the current labels as the default for future PRs for this repo and language

@dependabot use these reviewers will set the current reviewers as the default for future PRs for this repo and language

@dependabot use these assignees will set the current assignees as the default for future PRs for this repo and language

@dependabot use this milestone will set the current milestone as the default for future PRs for this repo and language

You can disable automated security fix PRs for this repo from the Security Alerts page.

opened by dependabot[bot] 0
Unspecified upstream models

Hello, there are several unspecified upstream models in the s3prl hub, such as passt_base, ssast_frame_base, wav2vec2_base_s2st_en_librilight, wav2vec2_conformer_large_s2st_en_librilight, and others. Can you provide an explanation of these models? Is there a place that documents all the upstream models?

opened by marziye-A 0
ContentVec support
Paper: ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers
Code: https://github.com/auspicious3000/contentvec

Models: (the default is contentvec_km100)

contentvec_km100

contentvec_km500

The model architecture is identical to HuBERT Base, so only s3prl/upstream/hubert/hubconf.py is modified.
opened by vectominist 0

Releases(v0.3.4)

v0.3.4(May 27, 2022)

Emergency fix for an installation bug
Source code(tar.gz)
Source code(zip)
s3prl-0.3.4.tar.gz(284.90 KB)
v0.3.3(May 23, 2022)
Add new models: discretebert, lighthubert, data2vec

Upgrade to the latest fairseq

Source code(tar.gz)
Source code(zip)
s3prl-0.3.3.tar.gz(285.94 KB)

Owner

s3prl

The Self-Supervised Speech Pre-training and Representation Learning Toolkit Development Team

GitHub https://youtu.be/PkMFnS6cjAc
