MoEBERT

Overview

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Installation

  • Create and activate the conda environment.
conda env create -f environment.yml
  • Install Transformers locally.
pip install -e .
  • Note: the code is adapted from this codebase; arguments regarding LoRA and adapters can be safely ignored.

Instructions

MoEBERT targets task-specific distillation: before running any distillation code, a pre-trained BERT model must first be fine-tuned on the target task, and the path to the fine-tuned model should be passed to --model_name_or_path.
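
As an illustration only, the fine-tuning step could look like the run below. The checkpoint directory mnli/ft_model and the hyper-parameters are assumptions for this sketch, not the settings used in the paper; bert_base_mnli_example.sh remains the authoritative reference.

    # Hypothetical fine-tuning run on MNLI with no MoE options.
    # The resulting checkpoint (mnli/ft_model) is what gets passed to
    # --model_name_or_path in the later importance and distillation steps.
    python examples/text-classification/run_glue.py \
        --model_name_or_path bert-base-uncased \
        --task_name mnli \
        --do_train --do_eval \
        --max_seq_length 128 \
        --per_device_train_batch_size 32 \
        --learning_rate 2e-5 \
        --num_train_epochs 3 \
        --output_dir mnli/ft_model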

Importance Score Computation

  • Use bert_base_mnli_example.sh to compute the importance scores: add the --preprocess_importance argument and remove the --do_train argument.
  • If multiple GPUs are used to compute the importance scores, an importance_[rank].pkl file will be saved for each GPU. Use merge_importance.py to merge these files.
  • To use the pre-computed importance scores, pass the file name to --moebert_load_importance (a sketch of these steps follows this list).
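
A minimal sketch of these steps, under the assumption that the fine-tuned checkpoint lives in mnli/ft_model and that the merged scores are written to importance.pkl (both names are illustrative, not prescribed by the codebase):

    # Step 1: compute importance scores (add --preprocess_importance, drop --do_train).
    python examples/text-classification/run_glue.py \
        --model_name_or_path mnli/ft_model \
        --task_name mnli \
        --do_eval \
        --preprocess_importance \
        --output_dir mnli/importance
    # Step 2: with multiple GPUs, merge the per-rank importance_[rank].pkl files.
    # Check merge_importance.py itself for its exact arguments.
    python merge_importance.py
    # Step 3: pass the merged file to the distillation run via
    #   --moebert_load_importance importance.pkl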

Knowledge Distillation

  • For GLUE tasks, see examples/text-classification/run_glue.py.
  • For question answering tasks, see examples/question-answering/run_qa.py.
  • Run bash bert_base_mnli_example.sh as an example; a sketch of the MoE-related options follows this list.
  • The codebase supports different routing strategies: gate-token, gate-sentence, hash-random, and hash-balance. The chosen strategy should be passed to --moebert_route_method.
    • To use hash-balance, a balanced hash list needs to be pre-computed using hash_balance.py. The path to the saved hash list should be passed to --moebert_route_hash_list.
    • Add a load-balancing loss by setting --moebert_load_balance when using trainable gating mechanisms.
    • The sentence-based gating mechanism (gate-sentence) is advantageous for inference because it incurs significantly less communication overhead than token-level routing.
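
A sketch of a distillation command with the MoE options discussed above; only the flags documented in this README are shown, and the concrete values (checkpoint path, importance file, load-balance weight, hash list name) are placeholders rather than recommended settings. The remaining MoE hyper-parameters are set in bert_base_mnli_example.sh.

    # Hypothetical distillation run with token-level gating and a load-balancing loss
    # (--moebert_load_balance is assumed here to take a loss weight).
    python examples/text-classification/run_glue.py \
        --model_name_or_path mnli/ft_model \
        --task_name mnli \
        --do_train --do_eval \
        --moebert_load_importance importance.pkl \
        --moebert_route_method gate-token \
        --moebert_load_balance 0.01 \
        --output_dir mnli/moebert_model
    # For hash-balance routing, pre-compute the hash list first
    # (see hash_balance.py for its arguments; the file name below is illustrative):
    #   --moebert_route_method hash-balance --moebert_route_hash_list hash_list.pt
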
Comments
  • Should the model be fine-tuned on the basis of BERT or MoEBERT?

    In the README, you mentioned: "Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task. Path to the fine-tuned model should be passed to --model_name_or_path." Can I fine-tune the bert-base-uncased model and then run the distillation code with MoE options? Is a pretrained MoEBERT model necessary? Thanks very much!

    opened by LisaWang0306 3
  • Parameters are not shared in experts

    Hi, from the paper I thought that the most important parameters are shared across different experts. However, in the code I didn't see how the parameters are kept identical during training. I see in utils.py that expert_list[i].fc1.weight.data = fc1_weight_data[idx, :].clone(), but the tensor created by clone will not be the same as the original one. I also ran experiments to check my assumption: after several steps, the parameters in the experts are no longer the same. Can you give more insight on that? Thanks.

    opened by shukuangxi 0
  • What is the bash script for fine-tuning without MoE?

    Hi @SimiaoZuo, you mentioned that we need to fine-tune first, but how do we obtain the fine-tuned model and plug it into bert_base_mnli_example.sh? Many thanks!

    opened by CaffreyR 0
  • Error when running `bash bert_base_mnli_example.sh`

    Hi @SimiaoZuo, I encountered problems when running bash bert_base_mnli_example.sh.

    The error information is below. Thanks very much!

    /home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
      "The module torch.distributed.launch is deprecated "
    The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
    *****************************************
    Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
    *****************************************
    WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
     Please read local_rank from `os.environ('LOCAL_RANK')` instead.
    INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
      entrypoint       : examples/text-classification/run_glue.py
      min_nodes        : 1
      max_nodes        : 1
      nproc_per_node   : 8
      run_id           : none
      rdzv_backend     : static
      rdzv_endpoint    : 127.0.0.1:29500
      rdzv_configs     : {'rank': 0, 'timeout': 900}
      max_restarts     : 3
      monitor_interval : 5
      log_dir          : None
      metrics_cfg      : {}
    
    INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4
    INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
    /home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
      "This is an experimental API and will be changed in future.", FutureWarning
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
      restart_count=0
      master_addr=127.0.0.1
      master_port=29500
      group_rank=0
      group_world_size=1
      local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
      global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
    
    INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
    INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/0/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/1/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/2/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/3/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/4/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/5/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/6/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/7/error.json
    08/17/2022 10:52:17 - WARNING - __main__ -   Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
    08/17/2022 10:52:17 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=mnli/model, overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=mnli/log, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=20, save_strategy=IntervalStrategy.NO, save_steps=500, save_total_limit=None, no_cuda=False, seed=0, fp16=True, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=0, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=mnli/model, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=1, cls_dropout=None, use_deterministic_algorithms=False)
    Each worker process raises the same error:

    Traceback (most recent call last):
      File "examples/text-classification/run_glue.py", line 729, in <module>
        main()
      File "examples/text-classification/run_glue.py", line 281, in main
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()
      File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
        obj = dtype(**inputs)
      File "<string>", line 67, in __init__
      File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
        if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
      File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
        return func(*args, **kwargs)
      File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
        return self._setup_devices
      File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
        cached = self.fget(obj)
      File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
        return func(*args, **kwargs)
      File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
        torch.cuda.set_device(device)
      File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
        torch._C._cuda_setDevice(device)
    RuntimeError: CUDA error: invalid device ordinal
    CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
    For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
    Downloading: 28.8kB [00:00, 16.0MB/s]                                           
    Downloading: 28.7kB [00:00, 16.7MB/s]                                           
    ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 4113193) of binary: /home/user/anaconda3/envs/MoEBERT/bin/python
    ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
    INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
    INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
      restart_count=1
      master_addr=127.0.0.1
      master_port=29500
      group_rank=0
      group_world_size=1
      local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
      role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
      global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
    
    INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
    INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/0/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/1/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/2/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/3/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/4/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/5/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/6/error.json
    INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/7/error.json
    
    opened by CaffreyR 0
  • "Need to turn the model to a MoE first" error

    I just removed the "--do_train" and "--do_eval" lines in bert_base_mnli_example.sh and added a "--do_predict" line. But when I run it, a "Need to turn the model to a MoE first" error occurs. I wonder why this happens, thanks a lot.

    opened by Harry-zzh 5
Owner
Simiao Zuo
PhD Student @ Georgia Tech