This project is associated with the recently launched ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT) and provides participants with baseline systems for speech recognition and speaker diarization in the conference scenario.

Overview

M2MeT challenge baseline -- AliMeeting

This project provides the baseline system recipes for the ICASSP 2022 Multi-channel Multi-party Meeting Transcription Challenge (M2MeT). The challenge consists of two tracks: Automatic Speech Recognition (ASR) and Speaker Diarization. A detailed description of each track can be found in its corresponding directory. The goal of this project is to simplify the training and evaluation procedures and to make it easy for participants to reproduce the baseline experiments and develop novel methods.

Setup

git clone https://github.com/yufan-aslp/AliMeeting.git

Introduction

General steps

  1. Prepare the training data for the speaker diarization and ASR models, respectively.
  2. Follow the running steps of the speaker diarization experiment to obtain the RTTM file. The RTTM file contains the voice activity detection (VAD) and speaker diarization results, which are used to compute the final Diarization Error Rate (DER) scores.
  3. For the ASR track, train either single-speaker or multi-speaker ASR models. The evaluation metric for ASR systems is Character Error Rate (CER). A minimal sketch of RTTM parsing and CER computation follows this list.
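
The sketch below is a minimal illustration of the two evaluation artifacts mentioned in steps 2 and 3: it parses SPEAKER lines from an RTTM file (the format produced by the diarization recipe) and computes CER as a character-level edit distance. It is not the official challenge scorer, and the file path in the example is a placeholder.

# Minimal illustration of the formats/metrics above; not the official scorer.

def parse_rttm(path):
    """Parse SPEAKER lines of an RTTM file into (recording, start, end, speaker)."""
    segments = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if not fields or fields[0] != "SPEAKER":
                continue
            reco = fields[1]
            onset, dur = float(fields[3]), float(fields[4])
            spk = fields[7]
            segments.append((reco, onset, onset + dur, spk))
    return segments

def cer(ref, hyp):
    """Character Error Rate = edit_distance(ref, hyp) / len(ref)."""
    # Standard dynamic-programming Levenshtein distance over characters.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution
        prev = cur
    return prev[-1] / max(len(ref), 1)

if __name__ == "__main__":
    # Placeholder path and strings, purely for illustration.
    print(cer("欢迎参加今天的会议", "欢迎参加今天会议"))  # one deleted character -> 1/9
    for reco, start, end, spk in parse_rttm("exp/system.rttm")[:3]:
        print(f"{reco} {spk}: {start:.2f}s - {end:.2f}s")

Note that the usual DER scoring additionally finds an optimal mapping between reference and system speakers, so this parser is only a convenience for inspecting system output, not a replacement for the DER scoring step.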

Citation

If you use the challenge dataset or our baseline systems, please consider citing the following:

@article{yu2021m2met,
title={M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge},
author={Yu, Fan and Zhang, Shiliang and Fu, Yihui and Xie, Lei and Zheng, Siqi and Du, Zhihao and Huang, Weilong and Guo, Pengcheng and Yan, Zhijie and Ma, Bin and others},
journal={arXiv preprint arXiv:2110.07393},
year={2021}
}

Our paper is available at https://arxiv.org/abs/2110.07393

Data download instructions will be sent to registered challenge participants via email.

Organizing Committee

Contributors

Code license

Apache 2.0

Comments
  • Speaker Diarization Usage

    Hi~

    In your main script, stage 3 says to use scripts/segment_to_lab.sh to change the file format, but that file is empty...

    The code actually calls scripts/segment_to_lab.py directly instead.

    I think the lab format conversion is done, but I still run into a problem.

    run.pl: 4 / 4 failed, log is in ./data/Eval_Ali_far/dia_part/exp/extract_embedding.*.log

    Log in extract_embedding.1.log:

    INFO:__main__:Start: Processing file R8001_M8004_MS801:
    
    filenames: ['R8001_M8004_MS801' 'R8003_M8001_MS801']
    Finished the feature extracting (25181600, 8)
      0%|          | 0/251 [00:00<?, ?it/s]
    INFO:__main__:End:   Processing file R8001_M8004_MS801: Elapsed: 4.490773439407349 seconds
    Traceback (most recent call last):
      File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/predict.py", line 176, in <module>
        fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
      File "/mnt/HDD/HDD2/DTDwind/vbx_new/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
        x *= window
    ValueError: operands could not be broadcast together with shapes (322,400,8) (400,) (322,400,8)
    # Accounting: time=7 threads=1
    # Ended (code 1) at Fri Aug  5 17:28:33 CST 2022, elapsed time 7 seconds
    

    Do you know what happened, and do you have any suggestions? (See the channel-selection sketch after the comments.)

    Thanks!

    opened by DTDwind 5
  • Could you please share the room config of 13 recording rooms?

    Could you please share the configuration of the 13 recording rooms mentioned in the challenge description paper? It has not been released on OpenSLR: https://www.openslr.org/119

    Thank you!

    opened by anogkongda 3
  • Corpus release plans

    I was wondering if there are any plans to make the data publicly available, now that the challenge is over? I have prepared a Lhotse recipe for the dataset here but I am waiting to push it to master until the corpus is officially available.

    opened by desh2608 2
  • Reproduce result fail

    I want to reproduce your work, but the results are terrible...

    speaker_all_DER_overlaps_0.log

    File                 DER     JER    B3-Precision    B3-Recall    B3-F1    GKT(ref, sys)    GKT(sys, ref)    H(ref|sys)    H(sys|ref)    MI    NMI
    -----------------  ------  ------  --------------  -----------  -------  ---------------  ---------------  ------------  ------------  ----  -----
    R8001_M8004_MS801  102.74   99.74            0.76         0.99     0.86             0.00             0.00          0.94          0.05  0.00   0.00
    R8003_M8001_MS801  102.83   99.77            0.78         0.99     0.87             0.00             0.00          0.87          0.05  0.00   0.00
    R8007_M8010_MS803  102.35   99.82            0.77         0.99     0.87             0.00             0.00          0.97          0.05  0.00   0.00
    R8007_M8011_MS806  133.25   99.37            0.77         0.89     0.83             0.00             0.00          0.89          0.33  0.00   0.00
    R8008_M8013_MS807  101.34   99.89            0.79         1.00     0.88             0.00             0.00          0.76          0.02  0.00   0.00
    R8009_M8018_MS809  103.67   99.72            0.79         0.99     0.88             0.00             0.00          0.67          0.05  0.00   0.00
    R8009_M8019_MS810  100.21   99.93            0.74         1.00     0.85             0.00             0.00          0.79          0.01  0.00   0.00
    R8009_M8020_MS810  100.33  100.00            0.79         1.00     0.88             0.00             0.00          0.66          0.01  0.00   0.00
    *** OVERALL ***    106.07   99.75            0.78         0.98     0.87             0.98             0.75          0.82          0.07  2.99   0.88
    

    I think something is wrong, but I don't know what happened...

    The log seems fine.

    Prepare Alimeeting data
    fix_data_dir.sh: kept all 8 utterances.
    fix_data_dir.sh: old files are kept in ./data/Eval_Ali_far/sad_part/.backup
    steps/make_mfcc.sh --nj 4 --cmd run.pl -q all.q --mem 4G --mfcc-config conf/mfcc_hires.conf ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/make_mfcc ./data/Eval_Ali_far/sad_part/feat/mfcc
    utils/validate_data_dir.sh: Successfully validated data-directory ./data/Eval_Ali_far/sad_part
    steps/make_mfcc.sh: [info]: no segments file exists: assuming wav.scp indexed by utterance.
    steps/make_mfcc.sh: Succeeded creating MFCC features for sad_part
    Do SAD
    --nj 4 --stage 0 --cmd run.pl -q all.q --mem 4G ./data/Eval_Ali_far/sad_part exp/segmentation_1a/tdnn_stats_sad_1a/ ./data/Eval_Ali_far/sad_part/feat/mfcc ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/sad
    diff: ./data/Eval_Ali_far/sad_part/exp/final.raw: No such file or directory
    ./data/Eval_Ali_far/sad_part
    steps/nnet3/compute_output.sh --nj 4 --cmd run.pl -q all.q --mem 4G --iter final --extra-left-context 0 --extra-right-context 0 --extra-left-context-initial -1 --extra-right-context-final -1 --frames-per-chunk 150 --apply-exp true --frame-subsampling-factor 3 ./data/Eval_Ali_far/sad_part ./data/Eval_Ali_far/sad_part/exp ./data/Eval_Ali_far/sad_part/exp/sad_sad
    utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
    utils/data/get_utt2dur.sh: ./data/Eval_Ali_far/sad_part/utt2dur already exists with the expected length.  We won't recompute it.
    utils/data/subsegment_data_dir.sh: note: frame shift is 0.01 [affects feats.scp]
    utils/data/get_utt2num_frames.sh: ./data/Eval_Ali_far/sad_part/utt2num_frames already present!
    utils/data/subsegment_data_dir.sh: subsegmented data from ./data/Eval_Ali_far/sad_part to ./data/Eval_Ali_far/sad_part/sad_seg
    local/segmentation/detect_speech_activity.sh: Created output segmented kaldi data directory in ./data/Eval_Ali_far/sad_part/sad_seg
    Do Speaker Embedding Extractor
    Collect 8 utt2segments in file ./data/Eval_Ali_far/sad_part/sad_seg/segments
    Write 8 labels
    success
    Do the Speaker Embedding Cluster
    Process textgrid to obtain rttm label
    Get DER result
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 1 n_turns_pre: 390 n_turns_post: 376
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8018_MS809, SPEAKER: 2 n_turns_pre: 275 n_turns_post: 268
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 1 n_turns_pre: 493 n_turns_post: 458
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8019_MS810, SPEAKER: 2 n_turns_pre: 466 n_turns_post: 442
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 1 n_turns_pre: 509 n_turns_post: 487
    WARNING: Merging overlapping speaker turns. FILE: R8009_M8020_MS810, SPEAKER: 2 n_turns_pre: 365 n_turns_post: 345
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    Checking for overlapping system speaker turns...
    Scoring...
    Loading speaker turns from reference RTTMs...
    Loading speaker turns from system RTTMs...
    WARNING: No universal evaluation map specified. Approximating from reference and speaker turn extents...
    Trimming reference speaker turns to UEM scoring regions...
    Trimming system speaker turns to UEM scoring regions...
    Checking for overlapping reference speaker turns...
    WARNING: Merging overlapping speaker turns. FILE: R8001_M8004_MS801, SPEAKER: 1 n_turns_pre: 277 n_turns_post: 276
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 1 n_turns_pre: 196 n_turns_post: 194
    WARNING: Merging overlapping speaker turns. FILE: R8007_M8011_MS806, SPEAKER: 2 n_turns_pre: 170 n_turns_post: 169
    Checking for overlapping system speaker turns...
    Scoring...
    

    When I look at the RTTM file, I find that there are very few segments...

    R8009_M8020_MS810.rttm

    SPEAKER R8009_M8020_MS810 1 5220.040000 1.600000 <NA> <NA> 1 <NA> <NA>
    SPEAKER R8009_M8020_MS810 1 9910.240000 1.120000 <NA> <NA> 1 <NA> <NA>
    SPEAKER R8009_M8020_MS810 1 9943.990000 1.360000 <NA> <NA> 1 <NA> <NA>
    SPEAKER R8009_M8020_MS810 1 10217.830000 1.270000 <NA> <NA> 1 <NA> <NA>
    SPEAKER R8009_M8020_MS810 1 15154.510000 0.790000 <NA> <NA> 1 <NA> <NA>
    

    Then I checked the lab file.

    R8009_M8020_MS810.lab

    5220.04 5221.64 sp
    9910.24 9911.36 sp
    9943.99 9945.35 sp
    10217.83 10219.1 sp
    15154.51 15155.3 sp
    

    I think the SAD stage may have a problem. But my SAD model was downloaded from path, and I then moved the exp directory to the speaker directory, as the usage instructions describe.

    My segments file.

    Could you help me? If you need more information, please contact me. Thank you very much.

    opened by DTDwind 0
  • Inference error on custom file having 2 speakers.

    filenames: ['aaaa']
    Finished the feature extracting (12921856, 2)
    
      0%|          | 0/174 [00:00<?, ?it/s]
      0%|          | 0/174 [00:00<?, ?it/s]
    INFO:__main__:End:   Processing file aaaa: Elapsed: 1.611116647720337 seconds
    Traceback (most recent call last):
      File "VBx/predict.py", line 176, in <module>
        fea = features.fbank_htk(seg, window, noverlap, fbank_mx, USEPOWER=True, ZMEANSOURCE=True)
      File "/hdd/saumya/AliMeeting/speaker/VBx/features.py", line 101, in fbank_htk
        x *= window
    ValueError: operands could not be broadcast together with shapes (182,400,2) (400,) (182,400,2) 
    # Accounting: time=5 threads=1
    # Ended (code 1) at Thu Nov 18 16:16:36 IST 2021, elapsed time 5 seconds
    
    

    Log path: data/test/dia_part/exp/extract_embedding.1.log. While running inference on the file 'aaaa.wav', the script fails at extracting embeddings. Can someone help? (See the channel-selection sketch after the comments.)

    opened by saumyaborwankar 2
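
Both ValueError reports in the comments above (shapes (322, 400, 8) and (182, 400, 2) against a (400,) window) show the same pattern: the far-field wavs are multi-channel, so the loaded signal keeps a trailing channel dimension that the mono analysis window cannot broadcast against. The sketch below is a hypothetical workaround, not part of the baseline recipe: it assumes the soundfile package is available and uses a placeholder file name, and simply reduces the signal to a single channel before mono feature extraction.

# Hypothetical sketch: pick a single channel from a multi-channel meeting wav
# before mono feature extraction. Not part of the baseline recipe; the file
# name below is a placeholder.
import soundfile as sf

# Multi-channel wavs come back as (n_samples, n_channels) from soundfile.
signal, sr = sf.read("R8001_M8004_MS801.wav")   # placeholder file name
print(signal.shape)                              # e.g. (25181600, 8)

if signal.ndim == 2:
    mono = signal[:, 0]            # keep the first channel only
    # mono = signal.mean(axis=1)   # or average channels as a crude alternative
else:
    mono = signal

# 'mono' is now 1-D, so framing it yields (n_frames, 400) and the (400,)
# analysis window broadcasts without the ValueError shown in the logs above.

Whether to keep a single reference channel or combine channels (e.g. by averaging or beamforming) is a separate modeling choice; the sketch only ensures the framed signal is 2-D so the window multiplication succeeds.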