A PyTorch implementation of the paper "Learning Shared Semantic Space for Speech-to-Text Translation", ACL (Findings) 2021

Overview

Chimera: Learning Shared Semantic Space for Speech-to-Text Translation


This is a PyTorch implementation of the "Chimera" paper Learning Shared Semantic Space for Speech-to-Text Translation https://arxiv.org/abs/2105.03095 (accepted to ACL Findings 2021), which aims to bridge the modality gap by unifying the tasks of MT (textual Machine Translation) and ST (Speech-to-Text Translation). It achieves new SOTA performance on all 8 language pairs of the MuST-C benchmark by utilizing an external MT corpus.


This repository is currently a nightly version and may be bug-prone due to ongoing code refactoring. It has also not been fully tested on configurations other than the authors' working environment. However, we encourage you to first have a look at the results and the model code to get a general impression of what this project is about.

The code base was forked from the FairSeq repository https://github.com/pytorch/fairseq.git (without an actual forking operation) in September 2020. It therefore lags behind later FairSeq updates, and neither the code nor the checkpoints are compatible with the current FairSeq version. You will need to modify the model code for checkpoint configurations if you want to follow the newer FairSeq code.

CONTRIBUTION: You are more than welcome to test our code on your machines and report feedback on results, bugs, and performance!



Results

Our model (Chimera) achieves new state-of-the-art results on all 8 language pairs of MuST-C:

Direction EN-DE EN-FR EN-RU EN-ES EN-IT EN-RO EN-PT EN-NL
BLEU 26.3 35.6 17.4 30.6 25.0 24.0 30.2 29.2

Chimera learns M distinct "memories" to store specific types of semantic information from both audio and text inputs. Shown below is a visualization of the "memories" learned by Chimera-16, a variant with M = 16. Each learned cluster represents an individual type of information, while each marker is a sentence sample. "+" and "." denote text and audio samples, respectively.

We can see more clearly below (left) that the memories learn a well-clustered semantic space, forming a "semantic" (rather than spatial) alignment between audio and text inputs while ignoring the modality differences.

On the right, we zoom in on one specific cluster, and it can be easily observed that the vectors are well structured as well: inputs with (probably one of) similar semantic features lie close to each other in space.

We can even focus on a single translation instance and see how the memories work. The figure below visualizes the alignment between audio attention and text attention, which gathers tightly around the diagonal. Different colors represent different memories, which attend to different semantic segments of the sentence / audio, as shown in the figure.



Trained Checkpoints

Our trained checkpoints are available at:

Translation Direction filename External url
English-to-Deutsch Chimera_EN2DE.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2DE.pt
English-to-French Chimera_EN2FR.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2FR.pt
English-to-Russian Chimera_EN2RU.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2RU.pt
English-to-Espanol Chimera_EN2ES.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2ES.pt
English-to-Italiano Chimera_EN2IT.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2IT.pt
English-to-Romanian Chimera_EN2RO.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2RO.pt
English-to-Portuguese Chimera_EN2PT.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2PT.pt
English-to-Dutch Chimera_EN2NL.pt http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2NL.pt
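
For example, the EN-DE checkpoint can be fetched with wget, as sketched below (the local checkpoints/ directory is only an illustration):

mkdir -p checkpoints
# download the released EN-DE model (URL taken from the table above)
wget -P checkpoints \
    http://sf3-ttcdn-tos.pstatp.com/obj/nlp-opensource/acl2021/chimera/Chimera_EN2DE.pt
export CHECKPOINT=checkpoints/Chimera_EN2DE.pt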



Interactive Translation

You can download any of the checkpoints above and translate local audio (only .wav files are supported) into another language! To do this, you only need to run the model in interactive mode. For example, suppose you want to translate from English to German (DE) with an already trained checkpoint at $CHECKPOINT:

bash run.sh --script chimera/scripts/interactive-en2any-ST.sh \
    --target de --checkpoint $CHECKPOINT

The program will prompt for an input file name like this:

2021-04-02 10:00:00 | INFO | fairseq_cli.interactive | Type the input sentence and press return:

After entering the file name, the program will print translation outputs like:

H-0     -1.0      ▁Nach ▁dem ...
D-0     -1.0      Nach dem ...
P-0     -1.0000 -1.0000 ...

NOTE: Do not input a file that is too large. Normally the model can translate 1~5 normal-length sentences at a time. If the input is too long, the program may crash.

To exit the interactive mode, you only need to input an invalid file name.

To translate into other languages, remember to replace de with the corresponding language code (in lower case):

Language Code
Deutsch (German) DE / de
French FR / fr
Espanol (Spanish) ES / es
Russian RU / ru
Italiano (Italian) IT / it
Romanian RO / ro
Portuguese PT / pt
Dutch (Netherlands) NL / nl
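
For instance, to translate into French, assuming the EN-FR checkpoint from the table above has been downloaded to checkpoints/ (the local path is only an illustration):

bash run.sh --script chimera/scripts/interactive-en2any-ST.sh \
    --target fr --checkpoint checkpoints/Chimera_EN2FR.pt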



Training a Model on MuST-C

Let's first take a look at training an English-to-German model as an example.

Data Preparation

  0. Prerequisites and Configuration. First check that the pip requirements in requirements.txt and the apt requirements in apt-requirements.txt are met. Some items in the two files may be redundant, but we haven't had time to check and eliminate them.

For configuration, please set the global variables $WMT_ROOT, $MUSTC_ROOT and $SAVE_ROOT. These specify where the datasets and checkpoints will be stored. For example:

export MUSTC_ROOT="speech_data/mustc"
export WMT_ROOT="wmt_data"
export SAVE_ROOT="checkpoints"
export target=de
mkdir -p $MUSTC_ROOT $WMT_ROOT $SAVE_ROOT

NOTE: This simple configuration is a prerequisite for most of the following steps. Here export target=de means the translation direction is English to German.

  1. Download and uncompress the EN-to-DE MuST-C dataset to $MUSTC_ROOT/en-$target. TIP: to speed up uncompressing a large file, you can replace tar xzvf with: pigz -dc $TARFILE | tar xvf - (see the sketch below).
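
A minimal sketch of this extraction step, assuming the archive has already been downloaded to $TARFILE (the archive file name depends on the MuST-C release):

mkdir -p $MUSTC_ROOT
# decompress in parallel with pigz; the archive should unpack into an en-de/ subdirectory
pigz -dc $TARFILE | tar xvf - -C $MUSTC_ROOT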

  2. Download the WMT data to $WMT_ROOT/orig via:

bash chimera/prepare_data/download-wmt.sh --wmt14 --data-dir $WMT_ROOT --target $target

This may sometimes be slow, as the connection to statmt.org is not stable in some regions. In that case, you can turn to other, faster download sources if available.

  3. Append the MuST-C text data to $WMT_ROOT, prepare the datasets, and produce a joint spm dictionary:
bash chimera/prepare_data/prepare-wmt-en2any.sh \
    --data-dir $WMT_ROOT --wmt14 --original-dev \
    --external mustc --target $target --subword spm
python3 chimera/prepare_data/prep_mustc_data.py \
    --data-root $MUSTC_ROOT --task wave \
    --ignore_fbank80 --joint_spm wmt14-en-$target-spm \
    --languages $target --vocab-type unigram --vocab-size 10000

NOTE: if the first command is executed correctly, you will see the following line in the output:

Existing spm dictionary chimera/resources/wmt14-en-de-spm detected. Copying...

If not, the program will still produce a dictionary on the fly and report No existing spm detected. Learning unigram spm on wmt14_en_de/tmp/train.de-en ... This is okay in most cases; the only risk is a potential vocabulary mismatch with the already trained checkpoints we provide.

Training

To reproduce the results in the last row of Figure 1 in the paper, you can directly use the training scripts as follows.

  4. Pre-training on MT data:
bash run.sh --script chimera/scripts/train-en2any-MT.sh \
    --target $target --dataset wmt14 --max_updates 500000

If you like, you can specify arguments other than the default values. The default setting is --seed 1 --num-gpus 8, which makes the command look like bash run.sh --script chimera/scripts/train-en2$target-MT.sh --seed 1 --num-gpus 8. The value of --num-gpus is recommended to be a power of 2 and no larger than 8, e.g. {1, 2, 4, 8}.
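
For instance, a pre-training run with a non-default seed on 4 GPUs might look like the following sketch (the argument values are only an illustration):

bash run.sh --script chimera/scripts/train-en2any-MT.sh \
    --target $target --dataset wmt14 --max_updates 500000 \
    --seed 2 --num-gpus 4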

  5. Fine-tuning on MuST-C data:
bash run.sh --script chimera/scripts/train-en2any-ST.sh \
    --target $target --dataset wmt14 --max_updates 150000

This script moves the MT-pre-trained model from ${MT_SAVE_DIR}/checkpoint_best.pt to ${ST_SAVE_DIR} as an initialization for ST fine-tuning.

Optionally, if you need to resume an interrupted ST training run, you can add the --resume argument to the command to avoid overwriting the existing ${ST_SAVE_DIR}/checkpoint_last.pt.
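
A sketch of the resumed command, assuming the previous run used the same arguments:

bash run.sh --script chimera/scripts/train-en2any-ST.sh \
    --target $target --dataset wmt14 --max_updates 150000 --resume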

The scripts in steps 4 and 5 fork a separate background evaluation process while running. The process monitors $MT_SAVE_ROOT or $ST_SAVE_ROOT and evaluates any new checkpoints. Don't worry: it is automatically killed after training finishes, unless the script is interrupted with Ctrl-C, in which case you can manually create the stop-flag file with touch chimera/tools/auto-generate-suicide.code to kill the background generation process.

Note that this automatic process only evaluates a single checkpoint (no averaging) and uses a low beam width.

  6. Averaging Checkpoints and Evaluating the Average

Suppose the best ST checkpoint is at epoch $BEST_EPOCH, and we want to average 7 checkpoints around it.

python3 chimera/tools/eval-average-checkpoint.py \
    --ckpt-dir $ST_SAVE_ROOT --number-of-ckpts 7 \
    --center-of-ckpts $BEST_EPOCH

Other Language Pairs

For the language pairs English-to-{French, Russian, Spanish}, you only need to replace export target=de with the corresponding code from {fr, ru, es} in step 0 (see the example below), and then run steps 1~5.
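
For example, to switch the whole pipeline to English-to-French before re-running the steps:

# replace the step-0 setting (use ru or es likewise)
export target=fr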

For the language pairs English-to-{Italian, Portuguese, Dutch, Romanian}, the MT data is different, so we need to modify Steps 2 and 3. All other steps remain unchanged.

English to Romanian

For Romanian, we use WMT16 corpora in our paper.

Step 2 changes to

bash chimera/prepare_data/download-wmt.sh --wmt16 --data-dir $WMT_ROOT --target ro

Step 3 remains unchanged.

English to {Italian, Portuguese, Dutch}

These language pairs use OPUS100 as the external MT corpus.

Step 2 changes to

bash chimera/prepare_data/download-opus100.sh --data-dir $WMT_ROOT

Step 3 changes to

bash chimera/prepare_data/prepare-opus100-en2any.sh \
    --data-dir $WMT_ROOT --original-dev \
    --external mustc --target $target --subword spm
python3 chimera/prepare_data/prep_mustc_data.py \
    --data-root $MUSTC_ROOT --task wave \
    --ignore_fbank80 --joint_spm wmt14-en-$target-spm \
    --languages $target --vocab-type unigram --vocab-size 10000

Actually, only the first command of Step 3 changes.

Evaluating a Checkpoint

You can also manually evaluate the performance of any checkpoint on the MuST-C test set. Suppose the path to your checkpoint is $CHECKPOINT:

target=de bash chimera/generate/generate-mustc-final.sh $CHECKPOINT
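
For example, to evaluate the released EN-DE checkpoint downloaded earlier (the local path is only an illustration):

target=de bash chimera/generate/generate-mustc-final.sh checkpoints/Chimera_EN2DE.pt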



License

Part of the code (especially code outside chimera/) is adapted from the FairSeq code base and therefore carries the MIT License of the original code. See NOTICE.md for more details.

Citation

Please cite as:

@article{han2021learning,
  title={Learning Shared Semantic Space for Speech-to-Text Translation},
  author={Han, Chi and Wang, Mingxuan and Ji, Heng and Li, Lei},
  journal={arXiv preprint arXiv:2105.03095},
  year={2021}
}
