Supervised Contrastive Learning for Downstream Optimized Sequence Representations

Overview


SupCL-Seq πŸ“–

Supervised Contrastive Learning for Downstream Optimized Sequence representations (SupCL-Seq), accepted for publication at EMNLP 2021, extends supervised contrastive learning from computer vision to the optimization of sequence representations in NLP. By varying the dropout mask probability in a standard Transformer architecture (e.g. BERT_base), we generate augmented, altered views for every representation (anchor). A supervised contrastive loss is then used to maximize the model's ability to pull together similar samples (e.g. anchors and their altered views) while pushing apart samples belonging to other classes. Despite its simplicity, SupCL-Seq yields large gains over a standard BERT_base on many sequence classification tasks from the GLUE benchmark, including absolute improvements of 6% on CoLA, 5.4% on MRPC, 4.7% on RTE and 2.6% on STS-B.
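For intuition, here is a minimal, self-contained sketch of a supervised contrastive loss [1] over a batch of sentence embeddings, where dropout-augmented views are simply extra rows that share their anchor's label. This is an illustration only, not the package's internal implementation; the function name and masking details are ours.

import torch
import torch.nn.functional as F

def supervised_contrastive_loss(embeddings, labels, temperature=0.05):
    # Pull together rows that share a label (an anchor, its dropout-augmented
    # views, and other samples of the same class); push apart everything else.
    z = F.normalize(embeddings, dim=1)                 # compare in cosine space
    sim = z @ z.t() / temperature                      # pairwise similarities
    self_mask = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, -1e9)             # never contrast a row with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # average log-probability of each anchor's positives, then average over anchors
    per_anchor = torch.where(pos_mask, log_prob, torch.zeros_like(log_prob)).sum(1)
    return -(per_anchor / pos_mask.sum(1).clamp(min=1)).mean()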

This package can be easily run on almost all transformer models in Huggingface 🤗 that contain an encoder, including but not limited to the following (a loading sketch follows the list):

  1. ALBERT
  2. BERT
  3. BigBird
  4. RoBERTa
  5. ERNIE
  6. And many more models!
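For example, any of these encoders can be loaded with the standard Huggingface Auto classes before being handed to the trainer described below; the checkpoint name and label count here are placeholders:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"     # any encoder checkpoint from the list above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)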

[Figure: SupCL-Seq]

Table of Contents

GLUE Benchmark: BERT vs. SupCL-Seq

Installation

Usage

Run on GLUE

How to Cite

References

GLUE Benchmark: BERT vs. SupCL-Seq

The table below reports the improvements over naive fine-tuning of the BERT model on the GLUE benchmark. We employed the [CLS] token during training and expect that using mean pooling would further improve these results.

[Figure: GLUE benchmark results, BERT_base vs. SupCL-Seq]
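The [CLS] and mean pooling strategies mentioned above can be sketched roughly as follows; the helper is purely illustrative and not part of the package:

import torch

def sentence_embedding(model, tokenizer, texts, strategy="cls"):
    # 'cls' takes the first ([CLS]) position; 'mean' averages token states
    # under the attention mask.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch, output_hidden_states=True).hidden_states[-1]
    if strategy == "cls":
        return hidden[:, 0]
    mask = batch["attention_mask"].unsqueeze(-1).float()
    return (hidden * mask).sum(1) / mask.sum(1)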

Installation

  1. First, you need to install at least one of TensorFlow 2.0 and PyTorch. Please refer to the TensorFlow installation page, the PyTorch installation page, and/or the Flax installation page for the specific install command for your platform.

  2. Then install the package:

$ pip install SupCL-Seq

Usage

The package builds on the Trainer from Huggingface 🤗, so its usage closely mirrors that of the standard Trainer. The snippets below assume that a model, tokenizer, tokenized training dataset and TrainingArguments are already available.
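One hypothetical way to prepare them, here for CoLA (checkpoint, dataset and hyperparameters are illustrative):

from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, TrainingArguments

# Hypothetical setup for the CoLA task; any labelled sequence classification data works.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

train_dataset = load_dataset("glue", "cola")["train"].map(
    lambda batch: tokenizer(batch["sentence"], truncation=True),
    batched=True,
)

CL_args = TrainingArguments(
    output_dir="./cs_baseline",        # also used as the save path after training
    num_train_epochs=3,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
)

With these objects in place, the pipeline works as follows: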

  1. First, employ supervised contrastive learning to contrastively optimize the sentence embeddings on your annotated data:
from SupCL_Seq import SupCsTrainer

SupCL_trainer = SupCsTrainer.SupCsTrainer(
            w_drop_out=[0.0,0.05,0.2],      # Dropout probabilities of the augmented views; one view per entry [Optional]
            temperature= 0.05,              # Temperature for the contrastive loss function [Optional]
            def_drop_out=0.1,               # Default dropout of the transformer, usually 0.1 [Optional]
            pooling_strategy='mean',        # Embedding extraction strategy, either `mean` or `pooling` [Optional]
            model = model,                  # Model
            args = CL_args,                 # Arguments from `TrainingArguments` [Optional]
            train_dataset=train_dataset,    # Training dataset
            tokenizer=tokenizer,            # Tokenizer
            compute_metrics=compute_metrics # If you need a customized evaluation [Optional]
        )
  2. After contrastive training:

    2.1 Add a linear classification layer to your model

    2.2 Freeze the base layer

    2.3 Finetune the linear layer on your annotated data

For a detailed implementation, see glue.ipynb; a rough end-to-end sketch of these steps follows.
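The sketch below covers contrastive training followed by head-only fine-tuning, assuming a BERT-style model whose classification head parameters are named `classifier`; checkpoint paths and hyperparameters are illustrative:

# Step 1: contrastive training with the trainer configured above (as in glue.ipynb)
SupCL_trainer.train()
SupCL_trainer.save_model('./cs_baseline')

# Step 2: reload the contrastively trained encoder, freeze it, and fine-tune
# only the linear classification head on the annotated data.
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

clf_model = AutoModelForSequenceClassification.from_pretrained('./cs_baseline', num_labels=2)
for name, param in clf_model.named_parameters():
    if not name.startswith('classifier'):    # keep only the linear head trainable
        param.requires_grad = False

finetune_trainer = Trainer(
    model=clf_model,
    args=TrainingArguments(output_dir='./cs_finetuned', num_train_epochs=3),
    train_dataset=train_dataset,
    tokenizer=tokenizer,
)
finetune_trainer.train()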

Run on GLUE

To evaluate the method on the GLUE benchmark, please see glue.ipynb.
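The notebook covers the full benchmark; as a minimal, hypothetical example, here is a compute_metrics function for CoLA (Matthews correlation, our metric choice) that can be passed to either trainer above:

import numpy as np
from sklearn.metrics import matthews_corrcoef

def compute_metrics(eval_pred):
    # Matthews correlation for CoLA-style evaluation (illustrative choice of metric)
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"matthews_correlation": matthews_corrcoef(labels, preds)}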

How to Cite

@misc{sedghamiz2021supclseq,
      title={SupCL-Seq: Supervised Contrastive Learning for Downstream Optimized Sequence Representations}, 
      author={Hooman Sedghamiz and Shivam Raval and Enrico Santus and Tuka Alhanai and Mohammad Ghassemi},
      year={2021},
      eprint={2109.07424},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}

References

[1] Supervised Contrastive Learning

[2] SimCSE: Simple Contrastive Learning of Sentence Embeddings


Comments
  • Error when I run the code in glue.ipynb

    Firstly, you have done a great job by adapting the supervised contrastive loss to an NLP problem. I have gone through the paper and it is nicely written. [1] I tried to execute the code given in the "glue.ipynb" notebook and got this error:

    ---------------------------------------------------------------------------
    NameError                                 Traceback (most recent call last)
    <ipython-input-19-a357322971c2> in <module>()
    ----> 1 SupCL_trainer.train()
          2 SupCL_trainer.save_model('./cs_baseline')
    
    5 frames
    <ipython-input-17-dcaab5467474> in compute_loss(self, model, inputs, return_outputs)
         87         loss = loss_fn(logits, labels) # added rounding for stsb
         88 
    ---> 89         return (loss, outputs) if return_outputs else loss
         90 
         91     def set_dropout_mf(
    
    NameError: name 'outputs' is not defined
    

    [2] I copied the code from "SupCsTrainer.py" into the "glue.ipynb" notebook and then replaced outputs with output. When I ran the code with evaluation_strategy=no, I didn't get any error, but when I ran it with evaluation_strategy=epoch, I got this error:

    <__array_function__ internals> in argmax(*args, **kwargs)
    
    /usr/local/lib/python3.7/dist-packages/numpy/core/_asarray.py in asarray(a, dtype, order)
         81 
         82     """
    ---> 83     return array(a, dtype, copy=False, order=order)
         84 
         85 
    
    ValueError: could not broadcast input array from shape (1043,36,768) into shape (1043)
    
    opened by nlp-max16