Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations

Amazon

Last update: Dec 29, 2022

Overview

Code repository for the paper "Trans-Encoder: Unsupervised sentence-pair modelling through self- and mutual-distillations".

Dependencies

torch=1.8.1
transformers=4.9.0
sentence-transformers=2.0.0

See `requirements.txt` for more details.
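Before training, it can help to confirm that the installed packages match the pinned versions. The following is a minimal sketch, not part of the repository: the helper name `find_mismatches` is hypothetical, and treating a local build tag such as `+cu111` as a match is an assumption.

```python
from importlib import metadata

# Pinned versions copied from the Dependencies list in this README.
PINNED = {
    "torch": "1.8.1",
    "transformers": "4.9.0",
    "sentence-transformers": "2.0.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for packages that are missing or differ."""
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # the package is not installed
        # Assumption: a local build tag such as "1.8.1+cu111" counts as a match.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means all three packages are installed at the pinned versions.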

Train

Self-distillation:

>> bash train_self_distill.sh 0

The argument 0 is the GPU device index.

Mutual-distillation (two GPUs needed):

>> bash train_mutual_distill.sh 1,2

Train with your custom corpus:

>> CUDA_VISIBLE_DEVICES=0,1 python src/mutual_distill_parallel.py \
         --batch_size_bi_encoder 128 \
         --batch_size_cross_encoder 64 \
         --num_epochs_bi_encoder 10 \
         --num_epochs_cross_encoder 1 \
         --cycle 3 \
         --bi_encoder1_pooling_mode cls \
         --bi_encoder2_pooling_mode cls \
         --init_with_new_models \
         --task custom \
         --random_seed 2021 \
         --custom_corpus_path CORPUS_PATH

CORPUS_PATH should point to your custom corpus, a text file in which every line is one sentence pair in the form sent1||sent2.
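The corpus format can be written and checked with a few lines of Python. This is a minimal sketch under stated assumptions, not code from the repository: the helpers `parse_pair`, `write_corpus`, and `read_corpus` are hypothetical, and stripping surrounding whitespace and rejecting sentences that themselves contain `||` are choices made here for safety.

```python
import os
import tempfile

SEP = "||"  # separator between the two sentences, as described in this README


def parse_pair(line, sep=SEP):
    """Split one corpus line into (sent1, sent2); raise ValueError if malformed."""
    parts = line.rstrip("\n").split(sep)
    if len(parts) != 2 or not all(p.strip() for p in parts):
        raise ValueError(f"expected 'sent1{sep}sent2', got: {line!r}")
    return parts[0].strip(), parts[1].strip()


def write_corpus(pairs, path, sep=SEP):
    """Write (sent1, sent2) tuples to path, one pair per line."""
    with open(path, "w", encoding="utf-8") as f:
        for sent1, sent2 in pairs:
            # A separator inside a sentence would make the line ambiguous.
            if sep in sent1 or sep in sent2:
                raise ValueError(f"sentence contains the separator {sep!r}")
            f.write(f"{sent1}{sep}{sent2}\n")


def read_corpus(path, sep=SEP):
    """Read a corpus file back into a list of (sent1, sent2) tuples, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [parse_pair(line, sep) for line in f if line.strip()]


if __name__ == "__main__":
    pairs = [("A man is playing a guitar.", "Someone plays an instrument.")]
    path = os.path.join(tempfile.mkdtemp(), "corpus.txt")
    write_corpus(pairs, path)
    print(read_corpus(path))
```

Running `read_corpus` over a corpus before training surfaces malformed lines early instead of partway through a run.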

Evaluate

>> python src/eval.py

Authors

Fangyu Liu: Main contributor

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.

Comments

Need to fine-tune pretrained models?
Hi there, I find this work very interesting, and I was trying to replicate your results using the models you've shared on Huggingface. The bi-encoder models are behaving as expected; however, the cross-encoders are getting much lower scores than I expect on STS (results in the 30s-40s rather than 70s to 80s), which makes me think I'm missing a step.

Should the Huggingface pretrained models for STS work out of the box, or do I need to fine-tune them on the train set for each STS dataset?

The models at issue are:

trans-encoder-cross-simcse-roberta-base

trans-encoder-cross-simcse-roberta-large

trans-encoder-cross-simcse-bert-large

trans-encoder-cross-simcse-bert-base

Thanks for any advice you can give!
opened by finegan-dollak 2
Update README.md

Issue #, if available:

Description of changes: update ICLR video link

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

opened by hardyqr 0
Update README.md

Issue #, if available:

Description of changes: add amazon.science blog and talk links

opened by hardyqr 0
eval bug fix, add option, help info, and readme update

Issue #, if available: Minor bug in eval.py.

Description of changes: Evaluation bug fix, add option help info, and readme update.

opened by hardyqr 0
Update README.md

Issue #, if available:

Description of changes:

opened by hardyqr 0
correct a typo

Issue #, if available:

Description of changes:

opened by hardyqr 0
Update README.md

Issue #, if available:

Description of changes:

opened by hardyqr 0

