MoEBERT

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Installation

Create and activate conda environment.

conda env create -f environment.yml

Install Transformers locally.

pip install -e .

Note: The code is adapted from this codebase. Arguments regarding LoRA and adapter can be safely ignored.

Instructions

MoEBERT targets task-specific distillation. Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task. Path to the fine-tuned model should be passed to --model_name_or_path.

Importance Score Computation

Use bert_base_mnli_example.sh to compute the importance scores, add a --preprocess_importance argument, remove the --do_train argument.
If multiple GPUs are used to compute the importance scores, a importance_[rank].pkl file will be saved for each GPU. Use merge_importance.py to merge these files.
To use the pre-computed importance scores, pass the file name to --moebert_load_importance.

Knowledge Distillation

For GLUE tasks, see examples/text-classification/run_glue.py.
For question answering tasks, see examples/question-answering/run_qa.py.
Run bash bert_base_mnli_example.sh as an example.
The codebase supports different routing strategies: gate-token, gate-sentence, hash-random and hash-balance. Choices should be passed to --moebert_route_method.
- To use hash-balance, a balanced hash list needs to be pre-computed using hash_balance.py. Path to the saved hash list should be passed to --moebert_route_hash_list.
- Add a load balancing loss by setting --moebert_load_balance when using trainable gating mechanisms.
- The sentence-based gating mechanism (gate-sentence) is advantageous for inference because it induces significantly less communication overhead compared with token-level routing methods.

Hi @SimiaoZuo , I encoutered problems when run bash bert_base_mnli_example.sh

The error information is below! Thanks very much!

/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/launch.py:164: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
  "The module torch.distributed.launch is deprecated "
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
*****************************************
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
 Please read local_rank from `os.environ('LOCAL_RANK')` instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
  entrypoint       : examples/text-classification/run_glue.py
  min_nodes        : 1
  max_nodes        : 1
  nproc_per_node   : 8
  run_id           : none
  rdzv_backend     : static
  rdzv_endpoint    : 127.0.0.1:29500
  rdzv_configs     : {'rank': 0, 'timeout': 900}
  max_restarts     : 3
  monitor_interval : 5
  log_dir          : None
  metrics_cfg      : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/distributed/elastic/utils/store.py:53: FutureWarning: This is an experimental API and will be changed in future.
  "This is an experimental API and will be changed in future.", FutureWarning
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=0
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
  global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_0/7/error.json
08/17/2022 10:52:17 - WARNING - __main__ -   Process rank: 0, device: cuda:0, n_gpu: 1distributed training: True, 16-bits training: True
08/17/2022 10:52:17 - INFO - __main__ -   Training/evaluation parameters TrainingArguments(output_dir=mnli/model, overwrite_output_dir=True, do_train=True, do_eval=True, do_predict=False, evaluation_strategy=IntervalStrategy.STEPS, prediction_loss_only=False, per_device_train_batch_size=8, per_device_eval_batch_size=8, gradient_accumulation_steps=1, eval_accumulation_steps=None, learning_rate=5e-05, weight_decay=0.0, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, max_grad_norm=1.0, num_train_epochs=5.0, max_steps=-1, lr_scheduler_type=SchedulerType.LINEAR, warmup_ratio=0.0, warmup_steps=0, logging_dir=mnli/log, logging_strategy=IntervalStrategy.STEPS, logging_first_step=False, logging_steps=20, save_strategy=IntervalStrategy.NO, save_steps=500, save_total_limit=None, no_cuda=False, seed=0, fp16=True, fp16_opt_level=O1, fp16_backend=auto, fp16_full_eval=False, local_rank=0, tpu_num_cores=None, tpu_metrics_debug=False, debug=False, dataloader_drop_last=False, eval_steps=500, dataloader_num_workers=0, past_index=-1, run_name=mnli/model, disable_tqdm=False, remove_unused_columns=True, label_names=None, load_best_model_at_end=False, metric_for_best_model=None, greater_is_better=None, ignore_data_skip=False, sharded_ddp=[], deepspeed=None, label_smoothing_factor=0.0, adafactor=False, group_by_length=False, report_to=['tensorboard'], ddp_find_unused_parameters=None, dataloader_pin_memory=True, skip_memory_metrics=False, _n_gpu=1, cls_dropout=None, use_deterministic_algorithms=False)
Traceback (most recent call last):
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 729, in <module>
  File "examples/text-classification/run_glue.py", line 729, in <module>
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 729, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 281, in main
    main()
  File "examples/text-classification/run_glue.py", line 281, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
    main()
  File "examples/text-classification/run_glue.py", line 281, in main
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
            return func(*args, **kwargs)return func(*args, **kwargs)

  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
obj = dtype(**inputs)
  File "<string>", line 67, in __init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
        return self._setup_devicesreturn self._setup_devices

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
    cached = self.fget(obj)    
cached = self.fget(obj)
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
        return func(*args, **kwargs)return func(*args, **kwargs)

  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
    return func(*args, **kwargs)
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
        torch.cuda.set_device(device)torch.cuda.set_device(device)

  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
        torch._C._cuda_setDevice(device)torch._C._cuda_setDevice(device)

RuntimeErrorRuntimeError: : CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

    return self._setup_devices
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
    cached = self.fget(obj)
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
    return func(*args, **kwargs)
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
    torch.cuda.set_device(device)
  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
RuntimeError: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 729, in <module>
  File "examples/text-classification/run_glue.py", line 729, in <module>
Traceback (most recent call last):
Traceback (most recent call last):
  File "examples/text-classification/run_glue.py", line 729, in <module>
  File "examples/text-classification/run_glue.py", line 729, in <module>
    main()
  File "examples/text-classification/run_glue.py", line 281, in main
    main()
  File "examples/text-classification/run_glue.py", line 281, in main
    main()
    main()  File "examples/text-classification/run_glue.py", line 281, in main

  File "examples/text-classification/run_glue.py", line 281, in main
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()model_args, data_args, training_args = parser.parse_args_into_dataclasses()

  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
  File "/home/user/MoEBERT/src/transformers/hf_argparser.py", line 187, in parse_args_into_dataclasses
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
    obj = dtype(**inputs)
  File "<string>", line 67, in __init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
        obj = dtype(**inputs)obj = dtype(**inputs)  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__


  File "<string>", line 67, in __init__
  File "<string>", line 67, in __init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 552, in __post_init__
    if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
      File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
        if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):if is_torch_available() and self.device.type != "cuda" and (self.fp16 or self.fp16_full_eval):

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
    return func(*args, **kwargs)
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
    return func(*args, **kwargs)
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
        return func(*args, **kwargs)return func(*args, **kwargs)

  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 695, in device
        return self._setup_devicesreturn self._setup_devices

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
        return self._setup_devicesreturn self._setup_devices

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1420, in __get__
        cached = self.fget(obj)cached = self.fget(obj)

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
        cached = self.fget(obj)cached = self.fget(obj)

  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
  File "/home/user/MoEBERT/src/transformers/file_utils.py", line 1430, in wrapper
    return func(*args, **kwargs)
      File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
return func(*args, **kwargs)
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
        return func(*args, **kwargs)return func(*args, **kwargs)

  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
  File "/home/user/MoEBERT/src/transformers/training_args.py", line 685, in _setup_devices
    torch.cuda.set_device(device)
      File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
torch.cuda.set_device(device)
  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
        torch.cuda.set_device(device)torch.cuda.set_device(device)

  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
  File "/home/user/anaconda3/envs/MoEBERT/lib/python3.7/site-packages/torch/cuda/__init__.py", line 264, in set_device
    torch._C._cuda_setDevice(device)
    torch._C._cuda_setDevice(device)RuntimeError
: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.RuntimeError
: CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
        torch._C._cuda_setDevice(device)torch._C._cuda_setDevice(device)

RuntimeErrorRuntimeError: : CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.CUDA error: invalid device ordinal
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

Downloading: 28.8kB [00:00, 16.0MB/s]                                           
Downloading: 28.7kB [00:00, 16.7MB/s]                                           
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 1 (pid: 4113193) of binary: /home/user/anaconda3/envs/MoEBERT/bin/python
ERROR:torch.distributed.elastic.agent.server.local_elastic_agent:[default] Worker group failed
INFO:torch.distributed.elastic.agent.server.api:[default] Worker group FAILED. 3/3 attempts left; will restart worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Stopping worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
  restart_count=1
  master_addr=127.0.0.1
  master_port=29500
  group_rank=0
  group_world_size=1
  local_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  global_ranks=[0, 1, 2, 3, 4, 5, 6, 7]
  role_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]
  global_world_sizes=[8, 8, 8, 8, 8, 8, 8, 8]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/3/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker4 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/4/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker5 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/5/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker6 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/6/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker7 reply file to: /tmp/torchelastic_x6q4uwtj/none_xdo7jqx4/attempt_1/7/error.json

The model on target task should be fined-tuned on the basis of BERT or MoEBERT?

In README, you mentioned that:

Before running any distillation code, a pre-trained BERT model should be fine-tuned on the target task. Path to the fine-tuned model should be passed to --model_name_or_path. Can I fine-tune on bert-base-uncased model and run distillation code with MoE options? Is pretrained MoEBERT model necessary? Thanks very much!

opened by LisaWang0306 3
Parameters are not shared in experts

Hi, from the paper I thought that the most important parameters are shared across different experts. However, in the code I did n't see how to ensure the parameters are the same in the training process. I see in utils.py, expert_list[i].fc1.weight.data = fc1_weight_data[idx, :].clone(), but the variable created by clone will not be the same as the old one. I also do experiments to check my assumption. After several steps, the parameters in experts are no longer the same. Can you give more highlights on that? Thanks.

opened by shukuangxi 0
What is the bash script of finetune without MoE

Hi @SimiaoZuo , as you mentioned that we need to finetune first. But how to get the finetune model and translate into bert_base_mnli_example.sh! Many thanks!

opened by CaffreyR 0

Error when run `bash bert_base_mnli_example.sh`

opened by CaffreyR 0

"Need to turn the model to a MoE first" error

I just remove "--do_train" and "--do_eval" lines in bert_base_mnli_example.sh, an add a line that"--do_predict". But when I run it, "Need to turn the model to a MoE first" error happens. I wonder why it happens, thanks a lot.

opened by Harry-zzh 5

This PyTorch package implements MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation (NAACL 2022).

Related tags

Overview

MoEBERT

Installation

Instructions

Importance Score Computation

Knowledge Distillation

Comments

The model on target task should be fined-tuned on the basis of BERT or MoEBERT?

Parameters are not shared in experts

What is the bash script of finetune without MoE

Error when run `bash bert_base_mnli_example.sh`

"Need to turn the model to a MoE first" error

Owner

Simiao Zuo

Tutel MoE: An Optimized Mixture-of-Experts Implementation

I-SECRET: Importance-guided fundus image enhancement via semi-supervised contrastive constraining

Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

Adversarial Adaptation with Distillation for BERT Unsupervised Domain Adaptation

[NAACL & ACL 2021] SapBERT: Self-alignment pretraining for BERT.

[ICLR 2022] Pretraining Text Encoders with Adversarial Mixture of Training Signal Generators

[CVPR 2022] CoTTA Code for our CVPR 2022 paper Continual Test-Time Domain Adaptation

PyTorch implemention of ICCV'21 paper SGPA: Structure-Guided Prior Adaptation for Category-Level 6D Object Pose Estimation

Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

Official PyTorch implementation of MX-Font (Multiple Heads are Better than One: Few-shot Font Generation with Multiple Localized Experts)

Face2webtoon - Despite its importance, there are few previous works applying I2I translation to webtoon.

Rethinking the Importance of Implementation Tricks in Multi-Agent Reinforcement Learning

Code for reproducing our analysis in the paper titled: Image Cropping on Twitter: Fairness Metrics, their Limitations, and the Importance of Representation, Design, and Agency

Symbolic Parallel Adaptive Importance Sampling for Probabilistic Program Analysis in JAX

Differentiable Annealed Importance Sampling (DAIS)

Code for 'Self-Guided and Cross-Guided Learning for Few-shot segmentation. (CVPR' 2021)'

Imposter-detector-2022 - HackED 2022 Team 3IQ - 2022 Imposter Detector

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering

EMNLP 2021: Single-dataset Experts for Multi-dataset Question-Answering