Huggingface Transformers + Adapters = ❤️

Overview

adapter-transformers

A friendly fork of HuggingFace's Transformers, adding Adapters to PyTorch language models


adapter-transformers is an extension of HuggingFace's Transformers library, integrating adapters into state-of-the-art language models by incorporating AdapterHub, a central repository for pre-trained adapter modules.

💡 Important: This library can be used as a drop-in replacement for HuggingFace Transformers and regularly synchronizes new upstream changes. Thus, most files in this repository are direct copies from the HuggingFace Transformers source, modified only with changes required for the adapter implementations.

Installation

adapter-transformers currently supports Python 3.6+ and PyTorch 1.3.1+. After installing PyTorch, you can install adapter-transformers from PyPI ...

pip install -U adapter-transformers

... or from source by cloning the repository:

git clone https://github.com/adapter-hub/adapter-transformers.git
cd adapter-transformers
pip install .

Getting Started

HuggingFace provides great documentation on getting started with Transformers. adapter-transformers is fully compatible with Transformers.

To get started with adapters, refer to these locations:

  • Colab notebook tutorials, a series of notebooks providing an introduction to all the main concepts of (adapter-)transformers and AdapterHub
  • https://docs.adapterhub.ml, our documentation on training and using adapters with adapter-transformers
  • https://adapterhub.ml to explore available pre-trained adapter modules and share your own adapters
  • Examples folder of this repository containing HuggingFace's example training scripts, many adapted for training adapters
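
For a first impression of the API, here is a minimal sketch of adding and training a new adapter on top of a pre-trained model (class and method names as used elsewhere in this README; exact signatures may vary between adapter-transformers versions):

from transformers import AutoAdapterModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoAdapterModel.from_pretrained("bert-base-uncased")

# Add a new bottleneck adapter plus a matching classification head and activate both.
model.add_adapter("my_task", set_active=True)
model.add_classification_head("my_task", num_labels=2)

# Freeze the pre-trained weights so that only the adapter (and head) receive gradients.
model.train_adapter("my_task")

batch = tokenizer("Adapters are parameter-efficient.", return_tensors="pt")
print(model(**batch).logits.shape)  # -> torch.Size([1, 2])

After training, model.save_adapter("./my_task_adapter", "my_task") stores only the small adapter weights, which is what gets shared on the Hub.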

Citation

If you use this library for your work, please consider citing our paper AdapterHub: A Framework for Adapting Transformers:

@inproceedings{pfeiffer2020AdapterHub,
    title={AdapterHub: A Framework for Adapting Transformers},
    author={Pfeiffer, Jonas and
            R{\"u}ckl{\'e}, Andreas and
            Poth, Clifton and
            Kamath, Aishwarya and
            Vuli{\'c}, Ivan and
            Ruder, Sebastian and
            Cho, Kyunghyun and
            Gurevych, Iryna},
    booktitle={Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations},
    pages={46--54},
    year={2020}
}
Comments
  • "Parallel" option for training? Parallel adapter outputs required (without interacting with each other).

    Hello,

    Thanks for this nice framework 👍 . I might be asking something that isn't yet possible but wanted to at least try asking!

    I am trying to feed two BERT-based models' outputs into a subsequent NN. This requires two BERT models to be loaded, but the memory consumption becomes too high if I load both. To remedy this, I was wondering if I could do something like "Parallel" at training time. (FYI, I am not trying to dynamically drop the first few layers; I simply want to create two BERT forward paths with less memory consumption.)

    I understand that active adapters can be switched with set_active_adapters(). (Actually, could you confirm whether my understanding is correct?) But this doesn't seem to fit my purpose, as in my case I need both adapters to output independent representations based on their respective adapters.

    Is there any way I can make the adapters not interact with each other on the forward path without loading the original BERT parameters twice?

    • Making this question even more complex, I also need to make one adapter's parameters non-differentiable while still using them in the forward pass. Any ideas perhaps? :)
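
    A minimal sketch of the kind of setup described above, using the Parallel composition block (hedged: the adapter names are placeholders and the exact API may differ across adapter-transformers versions):

    import transformers.adapters.composition as ac
    from transformers import AutoAdapterModel

    model = AutoAdapterModel.from_pretrained("bert-base-uncased")  # BERT weights loaded only once
    model.add_adapter("adapter_a")
    model.add_adapter("adapter_b")

    # Parallel replicates the input within a single forward pass, so each adapter
    # produces its own representation without interacting with the other one.
    model.active_adapters = ac.Parallel("adapter_a", "adapter_b")

    # One way (plain PyTorch, not a dedicated adapter API) to keep one adapter's
    # weights fixed while still using them in the forward pass:
    for name, param in model.named_parameters():
        if "adapter_b" in name:
            param.requires_grad = False
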
    question 
    opened by leejayyoon 18
  • ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers'

    Hi, I am trying out this example colab: https://colab.research.google.com/github/Adapter-Hub/website/blob/master/app/static/notebooks/Adapter_Quickstart_Training.ipynb#scrollTo=Lbwb3NRf8mBF

    getting this error:

    Traceback (most recent call last):
      File "test.py", line 11, in <module>
        from transformers import AutoTokenizer, EvalPrediction, GlueDataset, GlueDataTrainingArguments, AutoModelWithHeads, AdapterType
    ImportError: cannot import name 'AutoModelWithHeads' from 'transformers' (/idiap/user/rkarimi/libs/anaconda3/envs/adapter/lib/python3.7/site-packages/transformers/__init__.py)
    

    versions

    (adapter) rkarimi@italix17:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep transformers
    adapter-transformers      1.0.1                     <pip>
    transformers              3.5.1                     <pip>
    (adapter) rkarimi@italix17:/idiap/user/rkarimi/dev/internship/seq2seq/adapter-transformers$ conda list | grep pytorch
    pytorch-lightning         1.0.4                     <pip>
    adapter hub from github is installed
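
    The version listing above shows both adapter-transformers 1.0.1 and the plain transformers 3.5.1 installed in the same environment. Since adapter-transformers installs itself under the transformers package name (it is a drop-in replacement), the plain package can shadow it, in which case AutoModelWithHeads is missing. A hedged sketch for checking which package Python actually imports:

    import transformers

    # If this points at the vanilla transformers install, AutoModelWithHeads will be
    # missing; uninstalling the plain `transformers` package (or using a clean
    # environment with only adapter-transformers) is the usual remedy.
    print(transformers.__version__)
    print(transformers.__file__)
    print(hasattr(transformers, "AutoModelWithHeads"))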
    
    bug 
    opened by rabeehkarimimahabadi 17
  • training the language adapters in the MAD-X paper

    training the language adapters in the MAD-X paper

    Hi, I need to train language adapters as done in the MAD-X paper. I have downloaded Wikipedia data, but these are very large-scale datasets, and so far I have not managed to train on them. I was wondering if you could share the script you used to train the language adapters. Thank you very much in advance.
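
    A rough sketch of what masked-language-modeling training of a language adapter could look like (hedged: the "pfeiffer+inv" configuration string, the AdapterTrainer class and the placeholder wiki_dataset are assumptions based on the AdapterHub documentation, not a script published in this issue):

    from transformers import (AdapterTrainer, AutoModelForMaskedLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

    # MAD-X language adapters combine a bottleneck adapter with invertible adapters.
    model.add_adapter("my_language", config="pfeiffer+inv")
    model.train_adapter("my_language")

    collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)
    args = TrainingArguments(output_dir="lang_adapter", per_device_train_batch_size=16,
                             max_steps=100_000, learning_rate=1e-4)

    # `wiki_dataset` stands in for a tokenized Wikipedia dump prepared separately.
    trainer = AdapterTrainer(model=model, args=args, train_dataset=wiki_dataset,
                             data_collator=collator)
    trainer.train()
    model.save_adapter("./lang_adapter", "my_language")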

    question 
    opened by dorost1234 13
  • Add t5 adapter

    Add t5 adapter

    Followed the pattern of Bart to add adapters to T5. One difference is that whereas Bart has separate classes for the encoder and decoder, T5 does not, so I am using the is_decoder flag to switch between encoder and decoder behavior, such as adding cross_attention adapters and adding invertible adapters.

    I'm working on some testing.

    opened by AmirAktify 12
  • Training an Adapter using own classification head and pytorch training loop

    Training an Adapter using own classification head and pytorch training loop

    Details

    Hello! I want to use the adapter approach with my pre-trained BERT text-classification model, but I did not find a good explanation in the documentation on how to do that. My model class is the following:

    class BertClassifier(nn.Module):
        """Bert Model for Classification Tasks."""
        def __init__(self, freeze_bert=True):
            """
             @param    bert: a BertModel object
             @param    classifier: a torch.nn.Module classifier
             @param    freeze_bert (bool): Set `False` to fine-tune the BERT model
            """
            super(BertClassifier, self).__init__()
    
            # Instantiate BERT model
            # Specify hidden size of BERT, hidden size of our classifier, and number of labels
            self.bert = BertAdapterModel.from_pretrained(PRETRAINED_MODEL)
            self.D_in = 1024 
            self.H = 512
            self.D_out = 2
            
    
            # Add a new adapter
            self.bert.add_adapter("thermo_cl",set_active=True)
            self.bert.train_adapter(["thermo_cl"])
    
     
            # Instantiate the classifier head with some one-layer feed-forward classifier
            self.classifier = nn.Sequential(
                nn.Linear(self.D_in, 512),
                nn.Tanh(),
                nn.Linear(512, self.D_out),
                nn.Tanh()
            )
     
            # Freeze the BERT model
            if freeze_bert:
                for param in self.bert.parameters():
                    param.requires_grad = False
    
    
        def forward(self, input_ids, attention_mask):
            ''' Feed input to BERT and the classifier to compute logits.
             @param    input_ids (torch.Tensor): an input tensor with shape (batch_size,
                           max_length)
             @param    attention_mask (torch.Tensor): a tensor that hold attention mask
                           information with shape (batch_size, max_length)
             @return   logits (torch.Tensor): an output tensor with shape (batch_size,
                           num_labels) '''
             # Feed input to BERT
            outputs = self.bert(input_ids=input_ids,
                                 attention_mask=attention_mask)
             
             # Extract the last hidden state of the token `[CLS]` for classification task
            last_hidden_state_cls = outputs[0][:, 0, :]
     
             # Feed input to classifier to compute logits
            logits = self.classifier(last_hidden_state_cls)
     
            return logits
    

    The training loop is the following:

    def initialize_model(epochs):
        """ Initialize the Bert Classifier, the optimizer and the learning rate scheduler."""
        # Instantiate Bert Classifier
        bert_classifier = BertClassifier(freeze_bert=False)  # False = fine-tune the BERT weights
    
        # Tell PyTorch to run the model on GPU
        bert_classifier = bert_classifier.to(device)
    
        # Create the optimizer
        optimizer = AdamW(bert_classifier.parameters(),
                          lr=lr,    # Default learning rate
                          eps=1e-8    # Default epsilon value
                          )
    
        # Total number of training steps
        total_steps = len(train_dataloader) * epochs
    
        # Set up the learning rate scheduler
        scheduler = get_linear_schedule_with_warmup(optimizer,
                                                    num_warmup_steps=0, # Default value
                                                    num_training_steps=total_steps)
    
        return bert_classifier, optimizer, scheduler
    
    def train(model, train_dataloader, val_dataloader, valid_loss_min_input, checkpoint_path, best_model_path, start_epochs, epochs, evaluation=True):
    
        """Train the BertClassifier model."""
        # Start training loop
        logging.info("--Start training...\n")
    
        # Initialize tracker for minimum validation loss
        valid_loss_min = valid_loss_min_input 
    
    
        for epoch_i in range(start_epochs, epochs):

            ..............................

            if evaluation == True:
                # After the completion of each training epoch, measure the model's performance
                # on our validation set.
                val_loss, val_accuracy = evaluate(model, val_dataloader)

                # Print performance over the entire training data
                time_elapsed = time.time() - t0_epoch

                logging.info(f"{epoch_i + 1:^7} | {'-':^7} | {avg_train_loss:^12.6f} | {val_loss:^10.6f} | {val_accuracy:^10.6f} | {time_elapsed:^9.2f}")

                logging.info("-"*70)
            logging.info("\n")

            # create checkpoint variable and add important data
            checkpoint = {
                'epoch': epoch_i + 1,
                'valid_loss_min': val_loss,
                'state_dict': model.state_dict(),
                'optimizer': optimizer.state_dict(),
            }

            # save checkpoint
            save_ckp(checkpoint, False, checkpoint_path, best_model_path)

            ## TODO: save the model if validation loss has decreased
            if val_loss <= valid_loss_min:
                print('Validation loss decreased ({:.6f} --> {:.6f}).  Saving model ...'.format(valid_loss_min, val_loss))
                # save checkpoint as best model
                save_ckp(checkpoint, True, checkpoint_path, best_model_path)
                valid_loss_min = val_loss

        model.save_adapter("./final_adapter", "thermo_cl")
        logging.info("-----------------Training complete--------------------------")
    
    
    bert_classifier, optimizer, scheduler = initialize_model(epochs=n_epochs)
    train(model = bert_classifier....)
    

    As you can see, I have my own personalized classification head, so I don't want to use the .add_classification_head() method. Is it correct to train and activate the adapter in this way? I would like to know whether I'm using the adapter properly, and also how to save the checkpoint and my model weights, because at the end of training (where I am supposed to save the adapter) I receive this error:

    AttributeError: 'BertClassifier' object has no attribute 'save_adapter'
    

    Thanks for the help!
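
    Since the adapter was added to the wrapped self.bert (a BertAdapterModel) and BertClassifier itself is a plain nn.Module, the save call presumably has to go through that inner model. A hedged sketch of the final saving step:

    # Save the adapter weights from the inner adapter-enabled model, not the wrapper.
    model.bert.save_adapter("./final_adapter", "thermo_cl")

    # The custom classification head is ordinary PyTorch, so it is saved separately.
    torch.save(model.classifier.state_dict(), "./final_adapter/classifier_head.pt")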

    question Stale 
    opened by Ch-rode 11
  • Merge with original transformers library

    Merge with original transformers library

    🚀 Feature request

    Merge this into the original transformers library.

    Motivation

    This library is awesome, so thanks a lot, but it would be much more convenient to have it merged into the original transformers library. The Huggingface team seems focused on adding lightweight options for their models, and adapters are huge time-and-memory savers for multitask use cases, so they would be a great addition to the transformers library.

    Your contribution

    You've done the integration here already, so it should be straightforward, but I'm happy to help. I've posted an issue on Huggingface's end as well.

    discussion Stale 
    opened by salimmj 11
  • Unintuitive slowdown in data loading and model updating on using adapters

    Unintuitive slowdown in data loading and model updating on using adapters

    Environment info

    • transformers version: 1.0.1
    • Platform: Linux-3.10.0-1127.19.1.el7.x86_64-x86_64-with-glibc2.10
    • Python version: 3.8.5
    • PyTorch version (GPU?): 1.7.0 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Yes

    Who can help: @LysandreJik @patrickvonplaten

    Model I am using: Bert

    Language I am using the model on: English

    Adapter setup I am using (if any): HoulsbyConfig

    The problem arises when using: my own modified scripts. I want to use adapters for a project of mine, which will require fine-tuning BERT multiple times. In order to get an understanding of how much speedup I would get from using adapters, I profiled the various steps in the training loop of BERT, both with and without the use of adapters. The task I am working on is: Stanford Natural Language Inference (SNLI).

    To reproduce

    Steps to reproduce the behavior: The following function is executed for a period of 4 hours on identical GPUs (via an LSF batch system), once with UseAdapter set to True and once with it set to False. The path contains a preloaded and tokenized version of the SNLI training set (as well as the test and dev sets, dropped here via underscores).

    def load_and_train(path, UseAdapter):
        x_train,y_train,a_train,t_train,_,_,_,_,_,_,_,_=load(open(path,"rb"))
        train_inst=torch.tensor(x_train)
        train_att=torch.tensor(a_train)
        train_types=torch.tensor(t_train)
        train_targ=torch.tensor(y_train)
        train_data = TensorDataset(train_inst, train_att, train_types,train_targ)
        train_sampler = RandomSampler(train_data)
        train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=32)
        model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=3)
        if UseAdapter:
            model.add_adapter("SNLI",AdapterType.text_task,HoulsbyConfig().__dict__)
            model.train_adapter(["SNLI"])
            model.set_active_adapters(["SNLI"])
        model.cuda()
        optimizer=AdamW(model.parameters(),lr=1e-4)
        scheduler=get_linear_schedule_with_warmup(optimizer,0,len(train_dataloader)*EPOCHS)
        iter=0
        time_load=0
        time_cler=0
        time_forw=0
        time_back=0
        time_updt=0
        for e in range(15):
            model.train()
            for batch in train_dataloader:
                last=time()
                x=batch[0].cuda()
                a=batch[1].cuda()
                t=batch[2].cuda()
                y=batch[3].cuda()
                time_load+=time()-last
                last=time()
                model.zero_grad()
                time_cler+=time()-last
                last=time()
                outputs = model(x, token_type_ids=t, attention_mask=a, labels=y)
                time_forw+=time()-last
                last=time()
                loss=outputs[0]
                loss.backward()
                time_back+=time()-last
                last=time()
                optimizer.step()
                scheduler.step()
                time_updt+=time()-last
                iter+=1
                print(time_load,time_cler,time_forw,time_back,time_updt)
    

    Expected behavior

    1. With Adapters the trainer is able to run through more batches than without by the time the job gets timed out
    2. Per Batch time_load is identical for both cases
    3. Per Batch time_cler is slightly lower with adapters due to the presence of fewer gradients
    4. Per Batch time_forw is slightly higher with adapters due to extra layers that are introduced
    5. Per Batch time_back is significantly lower with adapters since it needs to save fewer gradients
    6. Per Batch time_updt is lower with adapters due to having fewer parameters to update

    Observed Behaviour

    Overall times(seconds):

    Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update | Total | No of Batches
    -- | -- | -- | -- | -- | -- | -- | --
    No | 9.141064644 | 349.405822 | 873.8870151 | 11770.82554 | 1159.772 | 14163.03 | 69022
    Yes | 2721.683394 | 394.4980106 | 1652.686945 | 3192.402303 | 6304.335 | 14265.61 | 95981

    Per Batch Times(seconds):

    Adapter | Load Time | Clear Time | Forward Prop | Backward Prop | Update
    -- | -- | -- | -- | -- | --
    No | 0.000132437 | 0.005062238 | 0.012660992 | 0.1705373 | 0.016803
    Yes | 0.028356481 | 0.004110168 | 0.017218897 | 0.033260774 | 0.065683

    As is evident from the tables above, expectations 2 and 6 are not satisfied in this output. Note that similar observations were made in 2 reruns of the experiment. It is unclear to me whether there is an explanation I am missing or whether this is an implementation issue.

    bug 
    opened by cs1160701 9
  • Loading custom adapters and 'output_attentions' for AdapterFusion

    Loading custom adapters and 'output_attentions' for AdapterFusion

    Question

    Information

    Model I am using (Bert, XLNet ...): XLM-RoBERTa-base

    Language I am using the model on (English, Chinese ...): Korean

    Adapter setup I am using (if any):

    The problem arises when using:

    • [X] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [X] my own task or dataset: (give details below)
    • Datasets: KorNLI and KorSTS (Machine translated Korean MNLI & STS-B dataset)
    • Its format and size are the same as the original datasets (MNLI & STS-B)

    Background

    What I'm doing is that:

    1. train Task-Adapters for KorNLI and KorSTS on the XLM-RoBERTa-base model (to train on Korean datasets) using the official code, 'run_glue_alt.py'
    2. fusion both adapters with a fusion layer using 'run_fusion_glue.py'

    Questions

    Sorry that I'm not familiar with the adapter-transformers codebase. Here are some questions about the AdapterFusion framework.

    1. Is it possible to load my own pre-trained adapters using the 'model.load_adapter' function in the current framework? (I'm using the latest version of adapter-transformers.)
    2. The performance on the target task (KorSTS) when fusing the KorSTS and KorNLI single-task adapters is markedly lower than that of the single-task adapter trained on the KorSTS dataset alone. Even after searching over various hyperparameters (batch size, epochs, learning rate, fusion config, ...), the performance doesn't seem to improve. Is there any way to check whether the fusion layer is trained properly?
    3. Connected with the questions above, is it possible to investigate the attention distribution of the trained fusion layer? I've checked that there is an 'output_attentions' option defined in the BertModel class, but I could not find a way to output the attention weights of the fusion layers rather than the self-attention layers of the original pre-trained model (see the sketch below).
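
    Regarding question 3: newer adapter-transformers releases expose the fusion layer's attention scores through an output_adapter_fusion_attentions forward argument (added in v3.1.0, see the release notes below), which would require upgrading. A hedged sketch; the exact name of the returned attribute is an assumption:

    outputs = model(**batch, output_adapter_fusion_attentions=True)

    # The returned fusion attentions indicate how much weight each adapter
    # receives in each layer (attribute name is an assumption).
    print(outputs.adapter_fusion_attentions)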

    Environment info

    • transformers version:
    • Platform:
    • Python version: 3.6.3
    • PyTorch version (GPU?): 1.4
    • Tensorflow version (GPU?):
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: No, I'm using a single GPU
    bug question 
    opened by bigkunzi 9
  • TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    TypeError: unhashable type: 'Stack' error raised when using Parallel adapter heads

    Environment info

    • adapter-transformers version:
    • Platform: Linux
    • Python version: 3.6.8
    • PyTorch version (GPU?): GPU / 1.7
    • Tensorflow version (GPU?): NA
    • Using GPU in script?: Yes
    • Using distributed or parallel set-up in script?: Using nn.DataParallel

    Information

    Model I am using (Bert, XLNet ...): BERT pretrained model with 3 custom adapters + heads are used.

    Language I am using the model on (English, Chinese ...): EN

    Adapter setup I am using (if any): 3 Adapters (with default configuration) and 3 Classification Head.

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [ ] my own modified scripts: (give details below)

    The tasks I am working on is: Multi-task finetuning using AdapterHub

    Error below :

     (from logs) active head : [<bound method AdapterCompositionBlock.last of Stack[combined, resource_type, action]>]
    
    Traceback (most recent call last):
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/models/bert/modeling_bert.py", line 1092, in forward
        head_inputs, head_name=head, attention_mask=attention_mask, return_dict=return_dict, **kwargs
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/transformers/adapters/heads.py", line 509, in forward_head
        if head not in self.heads:
      File "/home/hchoi/remote_sessions/.venv/lib/python3.6/site-packages/torch/nn/modules/container.py", line 304, in __contains__
        return key in self._modules
    TypeError: unhashable type: 'Stack'
    

    Modified code below

    
    import transformers.adapters.composition as ac
    from transformers import AutoModelWithHeads

    model = AutoModelWithHeads.from_pretrained('bert-base-uncased')

    # 3 adapters and classification heads are added.
    model.add_adapter('name_a')
    model.add_classification_head('name_a', num_labels=100)

    model.add_adapter('name_b')
    model.add_classification_head('name_b')

    model.add_adapter('name_c')
    model.add_classification_head('name_c', num_labels=5)


    # Use `Parallel` to enable multiple active heads.
    adapter_names = ['name_a', 'name_b', 'name_c']
    model.active_heads = ac.Parallel(*adapter_names)

    for name in adapter_names:
        model.train_adapter(name)

    # Invoke forward pass. This will trigger the error.
    model(inputs)
    
    

    Expected behavior

    Model forward pass should work.

    bug 
    opened by hchoi-moveworks 8
  • Hinglish Sentiment Adapter

    Hinglish Sentiment Adapter

    🌟 New Adapter setup

    Model and Data Description

    Hinglish: the Romanized version of Hindi, which is immensely popular in India, where Hindi is spoken by millions of people but quite often typed in Roman script.

    Dataset: SemEval 2020 Task 9 Sentiment Analysis: 3 classes, +ve, -ve and neutral

    Open source status

    • [x] Code Implementation for the Adapter: https://colab.research.google.com/drive/19lofRd9n142xJCtUteZb5L_r7spGcGLL?usp=sharing
    • [x] Past Work: Accepted Paper, Code and Model Weights
    • [x] Who are the authors: @NirantK and @meghanabhange

    What I need help with

    • [x] Because there were no examples other than GLUE datasets, I ended up implementing a new HinglishDataset class and other skeleton code -- I'd appreciate a review if I got something wrong

    Next Steps

    If all is well in the code above, I'd like to continue along and contribute an adapter for Hinglish under the Sentiment task.

    enhancement 
    opened by NirantK 8
  • Train adapters without Hugging Face Trainer scripts

    Train adapters without Hugging Face Trainer scripts

    Hi, I was looking into the example scripts for Adapter-Hub, and almost all *_no_trainer.py scripts were not using adapters at all. Are you planning to add those scripts soon? I can also help port the trainer scripts to no_trainer scripts if someone can guide me on what changes would be required (see the sketch below). Thank you!

    cc: @calpt
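
    Until dedicated *_no_trainer.py variants exist, training an adapter in a plain PyTorch loop mostly amounts to calling train_adapter() and optimizing only the parameters that remain trainable. A minimal hedged sketch (dataloader and num_epochs are placeholders):

    import torch
    from transformers import AutoAdapterModel

    model = AutoAdapterModel.from_pretrained("bert-base-uncased")
    model.add_adapter("my_task", set_active=True)
    model.add_classification_head("my_task", num_labels=2)
    model.train_adapter("my_task")  # freezes the base model, keeps adapter + head trainable

    optimizer = torch.optim.AdamW(
        (p for p in model.parameters() if p.requires_grad), lr=1e-4
    )

    model.train()
    for epoch in range(num_epochs):      # num_epochs and dataloader are placeholders
        for batch in dataloader:         # each batch holds input_ids, attention_mask, labels
            loss = model(**batch).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()

    model.save_adapter("./my_task_adapter", "my_task")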

    question Stale 
    opened by bhavitvyamalik 7
  • T5: Missing tied weights crash `accelerate`

    T5: Missing tied weights crash `accelerate`

    First opened at https://github.com/huggingface/accelerate/issues/958. When huggingface accelerate is used via device_map='auto', a weight tied to the missing lm_head triggers a crash inside the device-map planning code. It would be nice if there were a clear way to retain the head and tied weight during loading.

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-3.10.0-1160.80.1.el7.x86_64-x86_64-with-glibc2.17
    • Python version: 3.9.16+
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.13.1+cu117 (True)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes, device_map='auto'
    • Using distributed or parallel set-up in script?: no

    Information

    Model I am using (Bert, XLNet ...): google/flan-t5-base

    Language I am using the model on (English, Chinese ...): n/a

    Adapter setup I am using (if any): AutoAdapterModel.from_pretrained

    The problem arises when using:

    • [ ] the official example scripts: (give details below)
    • [x] my own modified scripts: (give details below)

    The tasks I am working on is:

    • [ ] an official GLUE/SQUaD task: (give the name)
    • [x] my own task or dataset: (give details below)

    To reproduce

    Steps to reproduce the behavior:

    import transformers
    model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map='auto')
    

    Result:

    ╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
    │ /home/user/scratch/test-2023-01-07.py:2 in <module>                                              │
    │                                                                                                  │
    │   1 import transformers                                                                          │
    │ ❱ 2 model = transformers.AutoAdapterModel.from_pretrained('google/flan-t5-base', device_map=     │
    │   3                                                                                              │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/models/auto/auto_factory.py:446 in    │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   443 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   444 │   │   elif type(config) in cls._model_mapping.keys():                                    │
    │   445 │   │   │   model_class = _get_model_class(config, cls._model_mapping)                     │
    │ ❱ 446 │   │   │   return model_class.from_pretrained(pretrained_model_name_or_path, *model_arg   │
    │   447 │   │   raise ValueError(                                                                  │
    │   448 │   │   │   f"Unrecognized configuration class {config.__class__} for this kind of AutoM   │
    │   449 │   │   │   f"Model type should be one of {', '.join(c.__name__ for c in cls._model_mapp   │
    │                                                                                                  │
    │ /home/user/.local/lib/python3.9/site-packages/transformers/modeling_utils.py:2121 in             │
    │ from_pretrained                                                                                  │
    │                                                                                                  │
    │   2118 │   │   │   no_split_modules = model._no_split_modules                                    │
    │   2119 │   │   │   # Make sure tied weights are tied before creating the device map.             │
    │   2120 │   │   │   model.tie_weights()                                                           │
    │ ❱ 2121 │   │   │   device_map = infer_auto_device_map(                                           │
    │   2122 │   │   │   │   model, no_split_module_classes=no_split_modules, dtype=torch_dtype, max_  │
    │   2123 │   │   │   )                                                                             │
    │   2124                                                                                           │
    │                                                                                                  │
    │ /shared/src/accelerate/src/accelerate/utils/modeling.py:545 in infer_auto_device_map             │
    │                                                                                                  │
    │   542 │   │   elif tied_param is not None:                                                       │
    │   543 │   │   │   # Determine the sized occupied by this module + the module containing the ti   │
    │   544 │   │   │   tied_module_size = module_size                                                 │
    │ ❱ 545 │   │   │   tied_module_index = [i for i, (n, _) in enumerate(modules_to_treat) if n in    │
    │   546 │   │   │   tied_module_name, tied_module = modules_to_treat[tied_module_index]            │
    │   547 │   │   │   tied_module_size += module_sizes[tied_module_name] - module_sizes[tied_param   │
    │   548 │   │   │   if current_max_size is not None and current_memory_used + tied_module_size >   │
    ╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
    IndexError: list index out of range
    

    Expected behavior

    No crash. Ability to tie weights with seq2seq lm_head.

    bug 
    opened by xloem 0
  • Fusing task-specific and task-agnostic adapters

    Fusing task-specific and task-agnostic adapters

    Environment info

    • adapter-transformers version: 3.1.0
    • Platform: Linux-4.18.0-425.3.1.el8.x86_64-x86_64-with-glibc2.17
    • Python version: 3.8.11
    • Huggingface_hub version: 0.11.1
    • PyTorch version (GPU?): 1.12.1 (False)
    • Tensorflow version (GPU?): not installed (NA)
    • Flax version (CPU?/GPU?/TPU?): not installed (NA)
    • Jax version: not installed
    • JaxLib version: not installed
    • Using GPU in script?: yes
    • Using distributed or parallel set-up in script?: no

    Details

    Hi, I am trying to combine task-specific and task-agnostic adapters. Assume I have three tasks: Task-A, Task-B, and Task-C. I will add task-specific and task-agnostic adapters as follows:

    import transformers.adapters.composition as ac
    
    model.add_adapter("TASK-A")
    model.add_adapter("TASK-B")
    model.add_adapter("TASK-C")
    
    model.add_adapter("TASK-Agnostic")
    

    Now I want to fuse the task-specific adapter and the task-agnostic adapter dynamically, i.e., depending on what the task is.

    Should I fuse the adapters as follows?

    model.add_adapter_fusion(["TASK-A", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-B", "TASK-Agnostic"])
    model.add_adapter_fusion(["TASK-C", "TASK-Agnostic"])
    

    Inside the forward_pass of Trainer, I will set the active adapters as follows

    task_name = get_task_name()
    model.active_adapters = ac.Fuse(task_name, "TASK-Agnostic")
    

    Is this the right way to implement this?

    Thanks
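
    One hedged remark, based on the methods listed in the release notes below: when training the fusion layer for a given task, train_adapter_fusion() is presumably what unfreezes the fusion weights while keeping the base model and the single adapters fixed, e.g.:

    import transformers.adapters.composition as ac

    # Before training on TASK-A data, activate and unfreeze only the corresponding fusion layer.
    model.train_adapter_fusion(ac.Fuse("TASK-A", "TASK-Agnostic"))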

    question 
    opened by murthyrudra 0
  • Stacking two parallel composition blocks

    Stacking two parallel composition blocks

    Hi,

    Can I stack two Parallel composition blocks like this? ac.Stack(ac.Parallel('a', 'b'), ac.Parallel('c', 'd'))

    I found that the inputs are only replicated once, but they should be replicated twice. Could you help me fix it?

    Thanks!

    question 
    opened by HZQ950419 0
  • Add adapter to AutoModelForSequenceClassification model

    Add adapter to AutoModelForSequenceClassification model

    Environment info

    • adapter-transformers version: newest
    • Platform: Azure ML
    • Python version: 3.8
    • PyTorch version (GPU?):

    Details

    I am trying to use an AutoModelForSequenceClassification model (using BART). The documentation is not so clear, so I just loaded the model directly and added an adapter (LoRA) to it. When I run the trainer, I get the following error:

    RestException: INVALID_PARAMETER_VALUE: Response: {'Error': {'Code': 'ValidationError', 'Severity': None, 'Message': 'No more than 255 characters per params Value. Request contains 1 of greater length.', 'MessageFormat': None, 'MessageParameters': None, 'ReferenceCode': None, 'DetailsUri': None, 'Target': None, 'Details': [], 'InnerError': None, 'DebugInfo': None, 'AdditionalInfo': None}, 'Correlation': {'operation': '04d45ce3752c5e51c54e71f3950411ca', 'request': '6d216d8faea19d26'}, 'Environment': 'westus', 'Location': 'westus', 'Time': '2023-01-04T17:45:03.5650777+00:00', 'ComponentName': 'mlflow', 'error_code': 'INVALID_PARAMETER_VALUE'}

    Any ideas on how to solve it?

    question 
    opened by andyzengmath 0
  • Support for openai Whisper

    Support for openai Whisper

    🌟 New adapter setup

    Support for openai Whisper

    Add adapter integration for whisper.

    Open source status

    • [x] the model implementation is available: official code hf
    • [x] the model weights are available: hf
    • [x] who are the authors: @jongwook @ArthurZucker @sgugger
    enhancement 
    opened by karynaur 0
  • Add adapter configuration strings & restructure adapter method docs

    Add adapter configuration strings & restructure adapter method docs

    Configuration strings

    This PR adds the possibility to use flexible adapter configuration strings which allow specifying custom config attributes. Examples:

    • Set config attributes: model.add_adapter("name", config="parallel[reduction_factor=2]")
    • Config union: model.add_adapter("name", config="prefix_tuning|parallel")
    • more examples: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/tests_adapters/test_adapter_config.py#L95-L102

    Documentation: https://github.com/calpt/adapter-transformers/blob/8df62b9de2a8ab51115b191aca35b2fb53c96539/adapter_docs/overview.md

    Configuration strings can allow passing complex configurations e.g. via command line.

    Documentation restructuring

    The adapter method documentation is now split into three pages:

    • Overview and Configuration: introduction, table, configuration
    • Adapter Methods
    • Method Combinations
    opened by calpt 0
Releases(adapters3.1.0)
  • adapters3.1.0(Sep 15, 2022)

    Based on transformers v4.21.3

    New

    New adapter methods

    New model integrations

    • Add Deberta and DebertaV2 integration (@hSterz via #340)
    • Add Vision Transformer integration (@calpt via #363)

    Misc

    • Add adapter_summary() method (@calpt via #371): More info
    • Return AdapterFusion attentions using output_adapter_fusion_attentions argument (@calpt via #417): Documentation

    Changed

    • Upgrade of underlying transformers version (@calpt via #344, #368, #404)

    Fixed

    • Infer label names for training for flex head models (@calpt via #367)
    • Ensure root dir exists when saving all adapters/heads/fusions (@calpt via #375)
    • Avoid attempting to set prediction head if non-existent (@calpt via #377)
    • Fix T5EncoderModel adapter integration (@calpt via #376)
    • Fix loading adapters together with full model (@calpt via #378)
    • Multi-gpu support for prefix-tuning (@alexanderhanboli via #359)
    • Fix issues with embedding training (@calpt via #386)
    • Fix initialization of added embeddings (@calpt via #402)
    • Fix model serialization using torch.save() & torch.load() (@calpt via #406)
    Source code(tar.gz)
    Source code(zip)
  • adapters3.0.1(May 18, 2022)

    Based on transformers v4.17.0

    New

    • Support float reduction factors in bottleneck adapter configs (@calpt via #339)

    Fixed

    • [AdapterTrainer] add missing preprocess_logits_for_metrics argument (@stefan-it via #317)
    • Fix save_all_adapters such that with_head is not ignored (@hSterz via #325)
    • Fix inferring batch size for prefix tuning (@calpt via #335)
    • Fix bug when using compacters with AdapterSetup context (@calpt via #328)
    • [Trainer] Fix issue with AdapterFusion and load_best_model_at_end (@calpt via #341)
    • Fix generation with GPT-2, T5 and Prefix Tuning (@calpt via #343)
    Source code(tar.gz)
    Source code(zip)
  • adapters3.0.0(Mar 23, 2022)

    Based on transformers v4.17.0

    New

    Efficient Fine-Tuning Methods

    • Add Prefix Tuning (@calpt via #292)
    • Add Parallel adapters & Mix-and-Match adapter (@calpt via #292)
    • Add Compacter (@hSterz via #297)

    Misc

    • Introduce XAdapterModel classes as central & recommended model classes (@calpt via #289)
    • Introduce ConfigUnion class for flexible combination of adapter configs (@calpt via #292)
    • Add AdapterSetup context manager to replace adapter_names parameter (@calpt via #257)
    • Add ForwardContext to wrap model forward pass with adapters (@calpt via #267, #295)
    • Search all remote sources when passing source=None (new default) to load_adapter() (@calpt via #309)

    Changed

    • Deprecate XModelWithHeads in favor of XAdapterModel (@calpt via #289)
    • Refactored adapter integration into model classes and model configs (@calpt via #263, #304)
    • Rename activation functions to match Transformers' names (@hSterz via #298)
    • Upgrade of underlying transformers version (@calpt via #311)

    Fixed

    • Fix seq2seq generation with flexible heads classes (@calpt via #275, @hSterz via #285)
    • Parallel composition for XLM-Roberta (@calpt via #305)
    Source code(tar.gz)
    Source code(zip)
  • adapters2.3.0(Feb 9, 2022)

    Based on transformers v4.12.5

    New

    • Allow adding, loading & training of model embeddings (@hSterz via #245). See https://docs.adapterhub.ml/embeddings.html.

    Changed

    • Unify built-in & custom head implementation (@hSterz via #252)
    • Upgrade of underlying transformers version (@calpt via #255)

    Fixed

    • Fix documentation and consistency issues for AdapterFusion methods (@calpt via #259)
    • Fix serialization/ deserialization issues with custom adapter config classes (@calpt via #253)
    Source code(tar.gz)
    Source code(zip)
  • adapters2.2.0(Oct 14, 2021)

    Based on transformers v4.11.3

    New

    Model support

    • T5 adapter implementation (@AmirAktify & @hSterz via #182)
    • EncoderDecoderModel adapter implementation (@calpt via #222)

    Prediction heads

    • AutoModelWithHeads prediction heads for language modeling (@calpt via #210)
    • AutoModelWithHeads prediction head & training example for dependency parsing (@calpt via #208)

    Training

    • Add a new AdapterTrainer for training adapters (@hSterz via #218, #241)
    • Enable training of Parallel block (@hSterz via #226)

    Misc

    • Add get_adapter_info() method (@calpt via #220)
    • Add set_active argument to add & load adapter/fusion/head methods (@calpt via #214)
    • Minor improvements for adapter card creation for HF Hub upload (@calpt via #225)

    Changed

    • Upgrade of underlying transformers version (@calpt via #232, #234, #239)
    • Allow multiple AdapterFusion configs per model; remove set_adapter_fusion_config() (@calpt via #216)

    Fixed

    • Incorrect referencing between adapter layer and layer norm for DataParallel (@calpt via #228)
    Source code(tar.gz)
    Source code(zip)
  • adapters2.1.0(Jul 8, 2021)

    Based on transformers v4.8.2

    New

    Integration into HuggingFace's Model Hub

    • Add support for loading adapters from HuggingFace Model Hub (@calpt via #162)
    • Add method to push adapters to HuggingFace Model Hub (@calpt via #197)
    • Learn more

    BatchSplit adapter composition

    • BatchSplit composition block for adapters and heads (@hSterz via #177)
    • Learn more

    Various new features

    • Add automatic conversion of static heads when loaded via XModelWithHeads (@calpt via #181) Learn more
    • Add list_adapters() method to search for adapters (@calpt via #193) Learn more
    • Add delete_adapter(), delete_adapter_fusion() and delete_head() methods (@calpt via #189)
    • MAD-X 2.0 WikiAnn NER notebook (@hSterz via #187)
    • Upgrade of underlying transformers version (@hSterz via #183, @calpt via #194 & #200)

    Changed

    • Deprecate add_fusion() and train_fusion() in favor of add_adapter_fusion() and train_adapter_fusion() (@calpt via #190)

    Fixed

    • Suppress no-adapter warning when adapter_names is given (@calpt via #186)
    • leave_out in load_adapter() when loading language adapters from Hub (@hSterz via #177)
    Source code(tar.gz)
    Source code(zip)
  • adapters2.0.1(May 28, 2021)

    Based on transformers v4.5.1

    New

    • Allow different reduction factors for different adapter layers (@hSterz via #161)
    • Allow dynamic dropping of adapter layers in load_adapter() (@calpt via #172)
    • Add method get_adapter() to retrieve weights of an adapter (@hSterz via #169)

    Changed

    • Re-add adapter_names argument to model forward() methods (@calpt via #176)

    Fixed

    • Fix resolving of adapter from Hub when multiple options available (@Aaronsom via #164)
    • Fix & improve adapter saving/ loading using Trainer class (@calpt via #178)
    Source code(tar.gz)
    Source code(zip)
  • adapters2.0.0(Apr 29, 2021)

    Based on transformers v4.5.1

    All major new features & changes are described at https://docs.adapterhub.ml/v2_transition.

    • all changes merged via #105

    Additional changes & Fixes

    • Support loading adapters with load_best_model_at_end in Trainer (@calpt via #122)
    • Add setter for active_adapters property (@calpt via #132)
    • New notebooks for NER, text generation & AdapterDrop (@hSterz via #135)
    • Enable trainer to load adapters from checkpoints (@hSterz via #138)
    • Update & clean up example scripts (@hSterz via #154 & @calpt via #141, #155)
    • Add unfreeze_adapters param to train_fusion() (@calpt via #156)
    • Ensure eval/ train mode is correct for AdapterFusion (@calpt via #157)
    Source code(tar.gz)
    Source code(zip)
  • adapters1.1.1(Jan 14, 2021)

    Based on transformers v3.5.1

    New

    • Modular & custom prediction heads for flex head models (@hSterz via #88)

    Fixed

    • Fixes for DistilBERT layer norm and AdapterFusion (@calpt via #102)
    • Fix for reloading full models with AdapterFusion (@calpt via #110)
    • Fix attention and logits output for flex head models (@calpt via #103 & #111)
    • Fix loss output of flex model with QA head (@hSterz via #88)
    Source code(tar.gz)
    Source code(zip)
  • adapters1.1.0(Nov 30, 2020)

    Based on transformers v3.5.1

    New

    • New model with adapter support: DistilBERT (@calpt via #67)
    • Save label->id mapping of the task together with the adapter prediction head (@hSterz via #75)
    • Automatically set matching label->id mapping together with active prediction head (@hSterz via #81)
    • Upgraded underlying transformers version (@calpt via #55, #72 and #85)
    • Colab notebook tutorials showcasing all AdapterHub concepts (@calpt via #89)

    Fixed

    • Support for models with flexible heads in pipelines (@calpt via #80)
    • Adapt input to models with flexible heads to static prediction heads input (@calpt via #90)
    Source code(tar.gz)
    Source code(zip)
  • adapters1.0.1(Oct 6, 2020)

    Based on transformers v2.11.0

    New

    • Adds squad-style QA prediction head to flex-head models

    Bug fixes

    • Fixes loading and saving of adapter config in model.save_pretrained()
    • Fixes parsing of adapter names in fusion setup
    Source code(tar.gz)
    Source code(zip)
  • adapters1.0(Sep 9, 2020)
