
Overview

slue-toolkit


We introduce the Spoken Language Understanding Evaluation (SLUE) benchmark. This toolkit provides code to download and pre-process the SLUE datasets, train the baseline models, and evaluate the SLUE tasks. Refer to https://arxiv.org/abs/2111.10367 for more details.

News

  • Nov. 22: We release the SLUE paper on arXiv along with the slue-toolkit repository. The repository contains data processing and evaluation scripts. We will publish the scripts for training the baseline models soon.

Installation

  1. git clone this repository and install slue-toolkit (development mode)
git clone https://github.com/asappresearch/slue-toolkit.git
pip install -e .

or install directly from GitHub

pip install git+https://github.com/asappresearch/slue-toolkit.git
  2. Install additional dependencies based on your use case (e.g., fairseq and transformers are needed for the baselines)

SLUE Tasks

Automatic Speech Recognition (ASR)

Although ASR is not an SLU task itself, it can help analyze the performance of downstream SLU tasks on the same domain. Additionally, pipeline approaches depend on ASR outputs, making ASR relevant to SLU. ASR is evaluated using word error rate (WER).

Named Entity Recognition (NER)

Named entity recognition involves detecting the named entities and their tags (types) in a given sentence. We evaluate performance using micro-averaged F1 and label-F1 scores. The F1 score evaluates an unordered list of named entity phrase and tag pairs predicted for each sentence. Only the tag predictions are considered for label-F1.
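
For intuition, here is a minimal sketch of a micro-averaged F1 over unordered (phrase, tag) pairs, and of label-F1 over tags only. It is an illustrative simplification, not the toolkit's official scoring code (use the scripts under slue_toolkit/eval for official numbers).

from collections import Counter

def micro_f1(gold, pred, tags_only=False):
    """Micro-averaged F1 over unordered (phrase, tag) pairs, one list per sentence.
    With tags_only=True, only the tag of each pair is scored (label-F1)."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        if tags_only:
            g, p = [t for _, t in g], [t for _, t in p]
        g_count, p_count = Counter(g), Counter(p)
        overlap = sum((g_count & p_count).values())  # multiset intersection = true positives
        tp += overlap
        fp += sum(p_count.values()) - overlap
        fn += sum(g_count.values()) - overlap
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Toy example: one correct entity plus one spurious prediction.
gold = [[("european union", "ORG")]]
pred = [[("european union", "ORG"), ("brussels", "LOC")]]
print(micro_f1(gold, pred))                  # F1 over (phrase, tag) pairs
print(micro_f1(gold, pred, tags_only=True))  # label-F1 over tags only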

Sentiment Analysis (SA)

Sentiment analysis refers to classifying a given speech segment as having negative, neutral, or positive sentiment. We evaluate SA using macro-averaged (unweighted) recall and F1 scores.
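
Macro-averaging weights each of the three classes equally, regardless of how often it occurs. The snippet below is a minimal sketch using scikit-learn (an extra dependency assumed here) on made-up labels:

from sklearn.metrics import f1_score, recall_score

gold = ["positive", "neutral", "negative", "neutral"]   # hypothetical reference labels
pred = ["positive", "negative", "negative", "neutral"]  # hypothetical predictions

print(recall_score(gold, pred, average="macro"))  # unweighted mean of per-class recall
print(f1_score(gold, pred, average="macro"))      # unweighted mean of per-class F1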

Datasets

Corpus | Fine-tune utts (hours) | Dev utts (hours) | Test utts (hours) | Tasks | License
SLUE-VoxPopuli | 5,000 (14.5) | 1,753 (5.0) | 1,842 (4.9) | ASR, NER | CC0 (check complete license here)
SLUE-VoxCeleb | 5,777 (12.8) | 955 (2.1) | 4,052 (9.0) | ASR, SA | CC-BY 4.0 (check complete license here)

For SLUE, you need the VoxCeleb and VoxPopuli datasets. We carefully curated subsets of those datasets for fine-tuning and evaluation on the SLUE tasks, and we redistribute these subsets, so you do not need to download the full original datasets. The subsets also include the human annotations and transcriptions for the SLUE tasks. All you need to do is run the script below; it will download and pre-process the data.

Download and pre-process dataset

bash scripts/download_datasets.sh

SLUE score evaluation

The test set data and annotations will be used for the official SLUE score evaluation; however, we will not release the test set annotations. Thus, the SLUE score can only be evaluated by submitting your prediction results in TSV format. We are preparing a website to accept submissions, so please stay tuned.

Model development rule

To train a model, you can use the fine-tuning and dev sets (audio, transcriptions, and annotations), but not the test sets of the SLUE tasks. Additionally, you can use any kind of external dataset, labeled or unlabeled, for any stage of training (e.g., pre-training and fine-tuning).

For validation of your model, you can use the official dev set we provide, or you can create your own splits or cross-validation splits by pooling the fine-tuning and dev sets, for example as in the sketch below.
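
As an illustration, this sketch pools the fine-tuning and dev manifests of SLUE-VoxCeleb and builds 5-fold cross-validation splits. The file names and the assumption that each non-empty line is one utterance entry are hypothetical; adapt them to the manifests produced by scripts/download_datasets.sh.

import random

def load_entries(path):
    # Hypothetical manifest format: one utterance entry per non-empty line.
    with open(path) as f:
        return [line.rstrip("\n") for line in f if line.strip()]

entries = (load_entries("manifest/slue-voxceleb/fine-tune.tsv")
           + load_entries("manifest/slue-voxceleb/dev.tsv"))
random.seed(0)
random.shuffle(entries)

n_folds = 5
folds = [entries[i::n_folds] for i in range(n_folds)]  # fold i serves as the held-out validation set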

Baselines

ASR

Fine-tuning

Assuming the preprocessed manifest files are in manifest/slue-voxceleb and manifest/slue-voxpopuli for SLUE-VoxCeleb and SLUE-VoxPopuli, respectively, the following commands fine-tune a wav2vec 2.0 base model on each dataset using one GPU.

bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxceleb save/asr/w2v2-base-vc
bash baselines/asr/ft-w2v2-base.sh manifest/slue-voxpopuli save/asr/w2v2-base-vp

Evaluation

To evaluate the fine-tuned wav2vec 2.0 ASR models on the dev set, please run the following commands.

python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vc --data manifest/slue-voxceleb --subset dev
python slue_toolkit/eval/eval_w2v.py eval_asr save/asr/w2v2-base-vp --data manifest/slue-voxpopuli --subset dev

The WER will be printed directly. The predictions are saved in save/asr/w2v2-base-vc/pred-dev.wrd and save/asr/w2v2-base-vp/pred-dev.wrd and can be used for pipeline models.
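
If you want to recompute the WER offline from these files, the sketch below uses the jiwer package (an extra dependency) and assumes line-aligned reference transcripts are available, e.g. in manifest/slue-voxceleb/dev.wrd (this path is an assumption); the official number comes from eval_w2v.py above.

import jiwer

with open("manifest/slue-voxceleb/dev.wrd") as f:      # assumed reference transcript file
    refs = [line.strip() for line in f]
with open("save/asr/w2v2-base-vc/pred-dev.wrd") as f:  # predictions saved by eval_w2v.py
    hyps = [line.strip() for line in f]

print(f"WER: {jiwer.wer(refs, hyps):.4f}")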

More detailed baseline experiments are described here

NER

Fine-tuning End-to-end model

Assuming the preprocessed manifest files for SLUE-VoxPopuli are in manifest/slue-voxpopuli, this command fine-tunes a wav2vec 2.0 base model using one GPU.

bash baselines/ner/e2e_scripts/ft-w2v2-base.sh manifest/slue-voxpopuli/e2e_ner save/e2e_ner/w2v2-base

Evaluating End-to-End model

To evaluate the fine-tuned wav2vec 2.0 E2E NER model on the dev set (decoding without a language model), please run the following command.

bash baselines/ner/e2e_scripts/eval-ner.sh w2v2-base dev combined nolm

More detailed baseline experiments are described here

Sentiment Analysis

Fine-tuning

This command fine-tunes a wav2vec 2.0 base model on the SLUE-VoxCeleb dataset:

bash baselines/sentiment/ft-w2v2-base-senti.sh manifest/slue-voxceleb save/sentiment/w2v2-base

Evaluation

To evaluate the fine-tuned wav2vec 2.0 sentiment model, run the following command or use baselines/sentiment/e2e_scripts/eval.sh:

python3 slue_toolkit/eval/eval_w2v_sentiment.py --save-dir save/sentiment/w2v2-base --data manifest/slue-voxceleb --subset dev

More detailed baseline experiments are described here

Comments
  • Fix text NER evaluation

    Following the PR in https://github.com/asappresearch/slue-toolkit/pull/5. I started testing the evaluation pipeline for the text NER and it seems to also be quite flaky. I have fixed most of the errors but there are some critical ones that I mention in https://github.com/asappresearch/slue-toolkit/issues/6.

    It would be great if you could make the edits to this PR to handle the remaining todos mentioned in the issue.

    I have currently kept my previous PRs also committed to this one but happy to remove them once you merge those.

    Thanks Sid

    opened by siddalmia 11
  • Sentiment Analysis baseline

    Hi,

    I wanted to reproduce the sentiment analysis baseline through

    bash baselines/sentiment/e2e_scripts/ft-w2v2-base-senti.sh manifest/slue-voxceleb save/sentiment/w2v2-base
    

    Fairseq Config log:

    [2022-02-14 01:39:15,687][fairseq_cli.train][INFO] - {'_name': None, 'common': {..., 'seed': 1, 'fp16': True, 'user_dir': '/root/pushkal/slue-toolkit/slue_toolkit/fairseq_addon', ...},
     'dataset': {..., 'max_tokens': 1400000, 'train_subset': 'fine-tune', 'valid_subset': 'dev', ...},
     'optimization': {..., 'max_update': 50000, 'lr': [2e-05], ...},
     'checkpoint': {..., 'best_checkpoint_metric': 'macro_f1', 'maximize_best_checkpoint_metric': True, ...},
     'model': {'_name': 'wav2vec2_seq_cls', 'w2v_path': '/root/pushkal/slue-toolkit/save/pretrained/wav2vec_small.pt', ...},
     'task': {'_name': 'slue_audio_classification', 'data': '/root/pushkal/slue-toolkit/manifest/slue-voxceleb', 'labels': 'sent', ...},
     'criterion': {'_name': 'slue_sequence_classification'},
     'optimizer': {'_name': 'adam', 'lr': [2e-05], ...},
     'lr_scheduler': {'_name': 'tri_stage', 'max_update': 50000.0, ...}, ...}
    

    But facing this error:

    Traceback (most recent call last):
      File "/root/miniconda3/envs/slue/bin/fairseq-hydra-train", line 33, in <module>
        sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-hydra-train')())
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 87, in cli_main
        hydra_main()
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main
        _run_hydra(
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra
        run_and_report(
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report
        raise ex
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report
        return func()
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>
        lambda: hydra.run(
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run
        return run_job(
      File "/root/miniconda3/envs/slue/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job
        ret.return_value = task_function(task_cfg)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 27, in hydra_main
        _hydra_main(cfg)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/hydra_train.py", line 56, in _hydra_main
        distributed_utils.call_main(cfg, pre_main, **kwargs)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/distributed/utils.py", line 369, in call_main
        main(cfg, **kwargs)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq_cli/train.py", line 97, in main
        criterion = task.build_criterion(cfg.criterion)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/tasks/fairseq_task.py", line 352, in build_criterion
        return criterions.build_criterion(cfg, self)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/criterions/__init__.py", line 29, in build_criterion
        return build_criterion_(cfg, task)
      File "/root/pushkal/slue-toolkit/deps/fairseq/fairseq/registry.py", line 55, in build_x
        cls = REGISTRY[choice]
    KeyError: 'slue_sequence_classification'
    
    opened by pushkalkatara 7
  • Text NER baseline issues

    Hi @ankitapasad @fwu-asapp @sshon-asapp,

    I really apologize for creating so many issues. I thought of running the baselines again but I think there are still some issues in the slue_toolkit/text_ner/ner_deberta_modules.py file.

    For example -

    1. https://github.com/asappresearch/slue-toolkit/blob/main/slue_toolkit/text_ner/ner_deberta_modules.py#L63 - Is trying to run regex on a list. But I think they can run only on string or bytes.
    2. https://github.com/asappresearch/slue-toolkit/blob/main/slue_toolkit/text_ner/ner_deberta_modules.py#L111 - Is trying to read a wrong file format. It should be f"{split_name}.{label_type}.tsv" instead.

    Would it be possible to run a fresh, bare slue-toolkit install and see what bugs pop up?

    The command that I am trying to run is - bash baselines/ner/nlp_scripts/ft-deberta.sh deberta-base combined

    Please ignore if you have already caught these issues. Thanks Sid

    opened by siddalmia 6
  • Clean up and fix E2E NER baselines

    • Create the dictionary and link the tsv file (instead of copying) for NER
    • Remove pickle files and directly hardcode the mappings since 1) it is more readable and 2) the pkl files can't be loaded correctly, if people import slue_toolkit outside (not from the root of this repo)
    • Fix the train_subset, valid_subset, and labels in the E2E baseline scripts
    opened by fwu-asapp 6
  • Issues with baseline scripts

    Hi,

    I am running into an error when trying to run the baseline script for voxpopuli:

    Traceback (most recent call last):                                                                                                                                                                    
      File "~/bin/anaconda3/envs/fairseq/bin/fairseq-hydra-train", line 33, in <module>                                                                                       
        sys.exit(load_entry_point('fairseq', 'console_scripts', 'fairseq-hydra-train')())                                                                                                                 
      File "~/repos/fairseq/fairseq_cli/hydra_train.py", line 87, in cli_main                                                                                                 
        hydra_main()                                                                                                                                                                                      
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/main.py", line 32, in decorated_main                                                               
        _run_hydra(                                                                                                                                                                                       
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 346, in _run_hydra                                                       
        run_and_report(                                                                                                                                                                                   
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 201, in run_and_report                                                   
        raise ex                                                                                                                                                                                          
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 198, in run_and_report                                                   
        return func()                                                                                                                                                                                     
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/utils.py", line 347, in <lambda>                                                         
        lambda: hydra.run(                                                                                                                                                                                
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/_internal/hydra.py", line 107, in run                                                              
        return run_job(                                                                                                                                                                                   
      File "~/bin/anaconda3/envs/fairseq/lib/python3.8/site-packages/hydra/core/utils.py", line 129, in run_job                                                               
        ret.return_value = task_function(task_cfg)                                                                                                                                                        
      File "~/repos/fairseq/fairseq_cli/hydra_train.py", line 27, in hydra_main                                                                                               
        _hydra_main(cfg)                                                                                                                                                                                  
      File "~/repos/fairseq/fairseq_cli/hydra_train.py", line 56, in _hydra_main                                                                                              
        distributed_utils.call_main(cfg, pre_main, **kwargs)                                                                                                                                              
      File "~/repos/fairseq/fairseq/distributed/utils.py", line 369, in call_main                                                                                             
        main(cfg, **kwargs)                                                                                                                                                                               
      File "~/repos/fairseq/fairseq_cli/train.py", line 164, in main                                                                                                          
        extra_state, epoch_itr = checkpoint_utils.load_checkpoint(                                                                                                                                        
      File "~/repos/fairseq/fairseq/checkpoint_utils.py", line 272, in load_checkpoint                                                                                        
        epoch_itr = trainer.get_train_iterator(                                                                                                                                                           
      File "~/repos/fairseq/fairseq/trainer.py", line 718, in get_train_iterator                                                                                              
        self.reset_dummy_batch(batch_iterator.first_batch)                                                                                                                                                
      File "~/repos/fairseq/fairseq/data/iterators.py", line 334, in first_batch                                                                                              
        return self.collate_fn([self.dataset[i] for i in self.frozen_batches[0]])                                                                                                                         
      File "~/repos/fairseq/fairseq/data/add_target_dataset.py", line 68, in collater                                                                                         
        collated["net_input"]["prev_output_tokens"],                                                                                                                                                      
    KeyError: 'prev_output_tokens'
    

    I have installed fairseq as recommended on the github page. Here is the installed version:

    fairseq                   1.0.0a0+0f078de           dev_0    <develop>                                                                                                                                
    

    Any pointers to solve this error?

    Thank you

    opened by qmeeus 2
  • Text NER Evaluation Pipeline doesn't seem to work

    I am trying to run baselines/ner/nlp_scripts/eval-deberta.sh but it seems to be broken quite a bit.

    I have fixed some of the bugs, in the PR (https://github.com/asappresearch/slue-toolkit/pull/7) but there seems to be some more, which I am unable to fix comfortably and would require someone with expertise of this code to have a look -

    1. eval_obj.get_scores in def eval( of slue_toolkit/text_ner/ner_deberta.py seems to be passing asr_val_dataset which is set to None when eval_asr is set to False.
    2. This then causes an issue in def get_scores function in slue_toolkit/text_ner/ner_deberta_modules.py. As the run_inference invoked by get_scores uses asr_val_dataset in their Dataloader.
    3. The self.get_entity_tags( call in the run_inference function when eval_asr is set to False is also broken as this calls self.get_tag_map(indices=True) which seems to be calling an undefined variable tag in tag2id_raw[pfx + tag]

    Could you please review that PR and also suggest the changes for the above errors. I have kept the PR as [WIP] you can make edits to them as you feel fit.

    Thanks Sid

    opened by siddalmia 2
  • Fix text ner

    Hi,

    The text NER pipelines had a few bugs. For the command bash baselines/ner/nlp_scripts/ft-deberta.sh deberta-base raw

    I have described them below and also provided the fix -

    1. It was using transformers and datasets library by hugging face, which I added to setup.py. Also added seqeval which was being used by datasets.
    2. Since datasets was also the folder where the data was being downloaded, I have renamed it to dataset because it conflicts with the datasets library that the text NER code uses.
    3. def train_module only uses train and val dataset but inside the function it was using eval_dataset which is not defined. I have fixed it to dev dataset.
    4. def align_labels has a bug. As the tag2id dictionary is created using the training data, but the validation data actually has tags which are not there. I have now mapped them as 'O'.

    I would highly recommend to test other parts of the toolkit! There seems to be many minor mistakes.

    Thanks Sid

    opened by siddalmia 2
  • Command to run E2E model of NER directly/only by speech model?

    Hi, I have seen the readme of NER in https://github.com/he159ok/slue-toolkit

    But I do not see a command to run a NER model directly/only by the speech model.

    May I know a command to run it?

    opened by he159ok 1
  • About submission

    I sent my submission of the test set evaluation to "[email protected]", but there was no reply. I do not know whether I sent it to the wrong email address or whether there is some other reason.

    opened by RuizhuoXu 1
  • Voxceleb evaluation

    Hi, I have some doubts regarding which data can be used for pretraining the models. I plan to do a mixture of self-supervised and supervised pretraining and I wanted to know:

    • Can I use Voxceleb1 audios for pretraining? I would only use the speaker id and nationality labels as supervision, and some self-supervision (without labels), so my model would be agnostic about the sentiment annotations, but maybe could have some advantage to differentiate the Voxceleb1 speakers during finetuning, specially if there is an imbalance in the sentiments by speaker.
    • Did you follow the same original dev/test splits from Voxceleb1? I am pretraining only with the dev split, so if the sentiment analysis task is evaluated only on the test split, it would not be a problem. Am I right?
    opened by mrpep 1
  • Black for consistent formatting

    I ran black to bring to a consistent formatting. Regarding the issue - https://github.com/asappresearch/slue-toolkit/issues/3

    black --version
    black, version 19.10b0
    
    opened by siddalmia 1
  • Plans to release ASR finetuned-models

    Hi,

    Thanks for the toolkit! I was wondering if there are plans to release the ASR finetuned models (or it is already there but I missed it). If not, are you accepting PRs on the ASR finetuned models by the community? Thanks in advance!

    Jeff Hsu

    opened by Splend1d 1