PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Overview

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

This repo contains the PyTorch implementation of the ACL 2021 paper Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks.

Installation

python setup.py install

How to run the models

We provide an example script for each model in the hyperformer/scripts/ folder, with the matching config files in hyperformer/configs/. To run a model, first cd hyperformer, then use one of the commands below.

To run the hyperformer++ model, which generates the task-specific adapters using a single hypernetwork shared across both the tasks and the layers of a transformer:
```
bash scripts/hyperformer++.sh
```
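To make the idea of generating adapters from a shared hypernetwork concrete, here is a minimal, dependency-free sketch. It is not the code in this repo: the tiny dimensions, the single linear hypernetwork, the additive combination of task and layer embeddings, and the names `make_hypernet`, `generate_adapter` and `apply_adapter` are all assumptions chosen for illustration.

```python
import random

def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def make_hypernet(embed_dim, model_dim, bottleneck, rng):
    """One linear map from an embedding to all adapter weights, flattened."""
    n_out = 2 * model_dim * bottleneck  # down-projection plus up-projection entries
    return [[rng.gauss(0.0, 0.02) for _ in range(embed_dim)] for _ in range(n_out)]

def generate_adapter(hypernet, task_emb, layer_emb, model_dim, bottleneck):
    """Condition the shared hypernetwork on (task, layer) to obtain adapter weights."""
    # Assumption: the two embeddings are simply added to form the hypernetwork input.
    source = [t + l for t, l in zip(task_emb, layer_emb)]
    flat = matvec(hypernet, source)
    split = model_dim * bottleneck
    # down has shape bottleneck x model_dim, up has shape model_dim x bottleneck.
    down = [flat[i * model_dim:(i + 1) * model_dim] for i in range(bottleneck)]
    up_flat = flat[split:]
    up = [up_flat[i * bottleneck:(i + 1) * bottleneck] for i in range(model_dim)]
    return down, up

def apply_adapter(x, down, up):
    """Bottleneck adapter with a residual connection: x + up(relu(down(x)))."""
    hidden = [max(0.0, h) for h in matvec(down, x)]
    return [xi + ui for xi, ui in zip(x, matvec(up, hidden))]

rng = random.Random(0)
EMBED_DIM, MODEL_DIM, BOTTLENECK = 4, 6, 2

# A single hypernetwork is reused for every task and every layer.
hypernet = make_hypernet(EMBED_DIM, MODEL_DIM, BOTTLENECK, rng)
task_embs = {name: [rng.gauss(0, 1) for _ in range(EMBED_DIM)] for name in ("sst", "mnli")}
layer_embs = [[rng.gauss(0, 1) for _ in range(EMBED_DIM)] for _ in range(3)]

x = [1.0] * MODEL_DIM
down, up = generate_adapter(hypernet, task_embs["sst"], layer_embs[0], MODEL_DIM, BOTTLENECK)
y = apply_adapter(x, down, up)
print(len(down), len(up), len(y))
```

Only the hypernetwork and the small embeddings would be trained in such a setup; the adapter weights themselves are outputs, not stored parameters.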
To run the hyperformer model, which generates the task-specific adapters using a hypernetwork that is shared across the tasks but is specific to each layer of a transformer (this makes it less parameter-efficient than hyperformer++):
```
bash scripts/hyperformer.sh
```
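The efficiency gap between the two hypernetwork variants can be illustrated with a rough parameter count. The sketch below is a simplification under stated assumptions, not the accounting used in the paper: it ignores biases and layer norms, assumes one adapter per layer, models each hypernetwork as a single linear map from an embedding to the adapter weights, and uses made-up sizes.

```python
def adapter_params(model_dim, bottleneck):
    """Weights of one bottleneck adapter (down + up projection, biases ignored)."""
    return 2 * model_dim * bottleneck

def count_params(n_tasks, n_layers, model_dim, bottleneck, embed_dim):
    """Trainable parameters of three setups under the simplifications above."""
    per_adapter = adapter_params(model_dim, bottleneck)
    hypernet = embed_dim * per_adapter   # linear map: embedding -> adapter weights
    task_embs = n_tasks * embed_dim      # one learned embedding per task
    layer_embs = n_layers * embed_dim    # one learned embedding per layer
    return {
        "adapters": n_tasks * n_layers * per_adapter,   # one adapter per task and layer
        "hyperformer": n_layers * hypernet + task_embs, # one hypernetwork per layer
        "hyperformer++": hypernet + task_embs + layer_embs,  # one shared hypernetwork
    }

# Hypothetical sizes, chosen only to show the trend.
counts = count_params(n_tasks=8, n_layers=12, model_dim=512, bottleneck=16, embed_dim=64)
for name, n in counts.items():
    print(f"{name}: {n:,}")
```

With these toy sizes the per-layer hypernetworks dominate the count, while the single shared hypernetwork adds only the small layer embeddings on top of one hypernetwork.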
To run the adapter† model, which shares the layer normalization between adapters across the tasks and trains the adapters in a multi-task setting:
```
bash scripts/adapters_dagger.sh
```
To run the adapter model, which trains a separate adapter for each task in a single-task learning setup:
```
bash scripts/adapters.sh
```
To run the T5 fine-tuning model in a multi-task learning setup:
```
bash scripts/finetune.sh
```
To run the T5 fine-tuning model in a single-task learning setup:
```
bash scripts/finetune_single_task.sh
```

We ran all the models on 4 GPUs, but this is not required: the models can also be run on a single GPU. To do so, remove the -m torch.distributed.launch --nproc_per_node=4 part from every script.
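The single-GPU edit described above can be sketched as a small helper that strips the launcher flags from a command line. The example command, including the script name `finetune_t5_trainer.py` and the config path, is an assumption about what a script contains; check the actual files in scripts/ before relying on it. The helper name `to_single_gpu` is hypothetical.

```python
import shlex

def to_single_gpu(command):
    """Drop `-m torch.distributed.launch` and `--nproc_per_node` from a command."""
    tokens = shlex.split(command)
    kept = []
    skip_next = False
    for i, tok in enumerate(tokens):
        if skip_next:
            # This token belonged to a flag that was just removed.
            skip_next = False
            continue
        if tok == "-m" and i + 1 < len(tokens) and tokens[i + 1] == "torch.distributed.launch":
            skip_next = True
            continue
        if tok.startswith("--nproc_per_node="):
            continue
        if tok == "--nproc_per_node":
            # Space-separated form: the value is the next token.
            skip_next = True
            continue
        kept.append(tok)
    return " ".join(shlex.quote(t) for t in kept)

# Assumed shape of a launch line inside one of the scripts.
multi_gpu = (
    "python3 -m torch.distributed.launch --nproc_per_node=4 "
    "./finetune_t5_trainer.py configs/hyperformer++.json"
)
print(to_single_gpu(multi_gpu))
```

Editing the scripts by hand works just as well; the helper only makes explicit which tokens are removed and which are left untouched.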

Bibliography

If you find this repo useful, please cite our paper.

@inproceedings{karimi2021parameterefficient,
  title={Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks},
  author={Karimi Mahabadi, Rabeeh and Ruder, Sebastian and Dehghani, Mostafa and Henderson, James},
  booktitle={Annual Meeting of the Association for Computational Linguistics},
  year={2021}
}

Final words

Hope this repo is useful for your research. For any questions, please create an issue or email [email protected], and I will get back to you as soon as possible.

Comments

Off the shelf generation from trained hyperformer++

import os

from hyperformer.adapters import AutoAdapterConfig, MetaAdapterConfig
from hyperformer.third_party.models import T5Config, T5ForConditionalGeneration
from transformers import AutoTokenizer, set_seed

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

set_seed(42)
config = T5Config.from_pretrained("t5-3b", cache_dir="/local/nlpswordfish/tuhin/")
tokenizer = AutoTokenizer.from_pretrained("t5-3b", cache_dir="/local/nlpswordfish/tuhin/")

adapter_config = AutoAdapterConfig.get("meta-adapter")

# NOTE: data_args, training_args and adapter_args are not defined in this snippet.
# They have to be created first, with the same values used to train the checkpoint.
adapter_config.input_dim = 1024
adapter_config.tasks = data_args.tasks
adapter_config.device = training_args.device
adapter_config.task_to_adapter = (
    dict(zip(data_args.tasks, data_args.adapters)) if data_args.adapters is not None else None
)
adapter_config.task_to_embeddings = (
    dict(zip(data_args.tasks, data_args.task_embeddings))
    if data_args.task_embeddings is not None
    else None
)

# Copy over every adapter hyperparameter that both objects define.
extra_adapter_params = (
    "task_embedding_dim", "add_layer_norm_before_adapter", "add_layer_norm_after_adapter",
    "reduction_factor", "hidden_dim", "non_linearity", "train_task_embeddings",
    "projected_task_embedding_dim", "task_hidden_dim", "conditional_layer_norm",
    "train_adapters_blocks", "unique_hyper_net", "unique_hyper_net_layer_norm",
    "efficient_unique_hyper_net",
)
for p in extra_adapter_params:
    if hasattr(adapter_args, p) and hasattr(adapter_config, p):
        setattr(adapter_config, p, getattr(adapter_args, p))

model = T5ForConditionalGeneration.from_pretrained(
    "/mnt/swordfish-datastore/tuhin/hyperformer++",
    from_tf=False,
    config=config,
    cache_dir="/local/nlpswordfish/tuhin/",
    adapter_config=adapter_config,
)
model.cuda()
inputs = tokenizer.encode("it 's a charming and often affecting journey .", return_tensors="pt")

# There is no `self` at module level: read the flag from the model's config
# and use the adapter_config built above.
use_task_embedding = model.config.train_adapters and isinstance(adapter_config, MetaAdapterConfig)
gen_kwargs = {"max_length": 256, "num_beams": 1, "task": "sst"}
gen_kwargs["task_embedding"] = model.task_embedding_controller("sst") if use_task_embedding else None
outputs = model.generate(input_ids=inputs.cuda(), **gen_kwargs)
answer = tokenizer.decode(outputs[0], skip_special_tokens=True)

print("Predicted output", answer)

opened by tuhinjubcse

Owner

Rabeeh Karimi Mahabadi
