CRLT: A Unified Contrastive Learning Toolkit for Unsupervised Text Representation Learning

XiaoMing

Last update: Aug 19, 2022


This repository contains the code and relevant instructions of CRLT.

Overview

The goal of CRLT is to provide an out-of-the-box toolkit for contrastive learning (CL). Users only need to provide unlabeled data and edit a JSON configuration file; they can then quickly train, use, and evaluate representation learning models. CRLT consists of six critical modules: data synthesis, negative sampling, representation encoders, learning paradigm, optimizing strategy, and model evaluation. For each module, CRLT provides several popular implementations, so different kinds of CL architectures can be easily constructed.
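To make the roles of these modules concrete, the following is a minimal sketch of an InfoNCE-style objective with in-batch negatives, the kind of loss that unsupervised SimCSE optimizes. It is not CRLT's implementation: the function names and the temperature value are assumptions chosen for illustration, and plain Python lists stand in for encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss over a batch.

    positives[i] is the positive view of anchors[i]; every other row of
    `positives` acts as an in-batch negative for that anchor.
    The temperature default is an illustrative assumption.
    """
    losses = []
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)

anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]])
shuffled = info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]])
print(aligned < shuffled)  # the correctly paired batch has the lower loss
```

In this picture, data synthesis produces the positive views, negative sampling decides which other examples appear in the denominator, the encoder produces the vectors, and the learning paradigm is the loss itself.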

Installation

Requirements

First, run the following command to create the conda environment and install the relevant dependencies:

conda env create -f requirements.yaml

Then install PyTorch by following the instructions on the official website. Use the 1.10 build that matches your platform and CUDA version; PyTorch versions higher than 1.10 should also work. For example, on Linux with CUDA 10.2, install PyTorch with the following commands:

conda activate crlt
conda install pytorch==1.10.0 cudatoolkit=10.2 -c pytorch
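After installation, a quick check like the one below can confirm that the active environment has a suitable PyTorch. This is a sketch only: the helper names `parse_version` and `meets_minimum` are hypothetical. Versions are compared as integer tuples because a plain string comparison would wrongly rank "1.9" above "1.10".

```python
import importlib.util
import re

MINIMUM = (1, 10)  # minimum PyTorch version stated in this README

def parse_version(text):
    """Extract (major, minor) from a string such as '1.10.0+cu102'."""
    match = re.match(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return int(match.group(1)), int(match.group(2))

def meets_minimum(text, minimum=MINIMUM):
    """True when the version string is at least `minimum`."""
    return parse_version(text) >= minimum

# Only import torch when it is actually installed in this environment.
if importlib.util.find_spec("torch") is not None:
    import torch
    status = "OK" if meets_minimum(torch.__version__) else "older than 1.10"
    print("PyTorch", torch.__version__, status)
    print("CUDA available:", torch.cuda.is_available())
else:
    print("PyTorch is not installed in this environment")
```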

The evaluation code for sentence embeddings is based on a modified version of SentEval. It evaluates sentence embeddings on semantic textual similarity (STS) tasks and downstream transfer tasks. For STS tasks, our evaluation uses the "all" setting and reports Spearman's correlation. See SimCSE for more details.
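For readers unfamiliar with the metric, the sketch below computes Spearman's correlation between predicted similarities and gold scores in plain Python. It is an illustration of the metric, not the SentEval code path; the sample numbers are made up, and the helper names are hypothetical.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def spearman(x, y):
    """Spearman's correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Made-up cosine similarities and gold similarity scores for four sentence pairs.
predicted = [0.91, 0.35, 0.62, 0.10]
gold = [4.8, 2.0, 3.5, 0.4]
print(spearman(predicted, gold))
```

Because only the ordering matters, the predicted similarities do not need to be on the same scale as the gold scores.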

Before training, please download the relevant datasets by running:

cd utils/SentEval/data/downstream/
bash download.sh

Then run the following commands to install the SentEval toolkit:

cd utils/SentEval
python setup.py install

Getting Started

Data

For unsupervised training, we use sentences from English Wikipedia provided by SimCSE. The relevant datasets should be downloaded and moved to the data/wiki folder:

Filename	Data Path	Google Drive
wiki1m_for_simcse.csv	data/wiki/	Download
wiki.csv	data/wiki/	Download

During training, CRLT uses the dev set of the STSB task to evaluate the model, so the following file needs to be downloaded to the data/STSB folder:

Filename	Data Path	Google Drive
stsb_above_4.csv	data/STSB/	Download
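A small pre-flight check can confirm that the files listed in the two tables above are in place before training starts. This is a sketch under the assumption that it is run from the repository root; `missing_files` is a hypothetical helper, not part of CRLT.

```python
from pathlib import Path

# Relative paths taken from the tables in this README.
EXPECTED_FILES = [
    "data/wiki/wiki1m_for_simcse.csv",
    "data/wiki/wiki.csv",
    "data/STSB/stsb_above_4.csv",
]

def missing_files(repo_root="."):
    """Return the expected data files that are absent under `repo_root`."""
    root = Path(repo_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]

missing = missing_files()
if missing:
    print("Missing data files:")
    for rel in missing:
        print("  " + rel)
else:
    print("All expected data files are present.")
```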

Training

GUI

We provide an example training setup for SimCSE (the unsupervised version). To launch the web GUI, run:

conda activate crlt
python app.py

After editing the training parameters, click the RUN button; the evaluation results are displayed on the same page.

Terminal

Rather than training with the web GUI, users can also train by running:

python main.py examples/simcse.json
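When several configurations need to be trained in sequence, the terminal command above can be wrapped in a short driver script. The sketch below assumes only what this README states, namely that main.py takes the path of a JSON configuration file as its single argument; the helper names are hypothetical. Each file is parsed first so that a JSON syntax error surfaces before any training starts.

```python
import json
import subprocess
import sys

def build_command(config_path, entry_point="main.py"):
    """Return the argv list equivalent to `python main.py <config>`."""
    return [sys.executable, entry_point, str(config_path)]

def load_config(config_path):
    """Parse the JSON file; raises json.JSONDecodeError if it is malformed."""
    with open(config_path, encoding="utf-8") as handle:
        return json.load(handle)

def run_all(config_paths):
    """Validate and then train each configuration, stopping at the first failure."""
    for path in config_paths:
        load_config(path)
        subprocess.run(build_command(path), check=True)

# Example, to be run from the repository root inside the crlt environment:
# run_all(["examples/simcse.json"])
```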

Using different types of devices or different versions of CUDA or other software may lead to slightly different performance:

STS12	STS13	STS14	STS15	STS16	STSBenchmark	SICKRelatedness	Avg.
71.61	81.99	75.13	81.39	78.78	77.93	69.17	76.57
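The Avg. column is consistent with the arithmetic mean of the seven task scores, which can be checked directly:

```python
# Scores copied from the table above.
scores = {
    "STS12": 71.61,
    "STS13": 81.99,
    "STS14": 75.13,
    "STS15": 81.39,
    "STS16": 78.78,
    "STSBenchmark": 77.93,
    "SICKRelatedness": 69.17,
}

average = sum(scores.values()) / len(scores)
print(round(average, 2))  # 76.57
```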

Bugs or questions?

If you have any questions related to the code or the usage, feel free to email [email protected]. If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!
