PyTorch-version BERT-flow: apply BERT-flow to any pretrained language model (PLM) within the PyTorch framework.

Overview

Pytorch-bertflow

This is a re-implementation of BERT-flow in the PyTorch framework, which can reproduce the results from the original repo. This code was used to reproduce the results in the TSDAE paper.

Usage

Please refer to the simple example ./example.py and run:

python example.py

Note

  • Please shuffle your training data; it makes a huge difference.
  • The pooling function makes a huge difference on some datasets (especially the ones used in the paper). To reproduce the results, please use 'first-last-avg' (see the training sketch below).
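
For reference, here is a minimal training sketch that follows these notes. It is adapted from the fine-tuning script in the comments below and assumes the TransformerGlow and AdamWeightDecayOptimizer classes from tflow_utils; the model name, learning rate, and toy sentences are illustrative placeholders, so consult ./example.py for the exact settings.

    import random
    from transformers import AutoTokenizer
    from tflow_utils import TransformerGlow, AdamWeightDecayOptimizer

    model_name_or_path = 'bert-base-uncased'  # any PLM; placeholder choice
    bertflow = TransformerGlow(model_name_or_path, pooling='first-last-avg')  # mean pooling over the first and last layers
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

    # Only the flow parameters are trained; the Transformer stays frozen.
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters = [
        {"params": [p for n, p in bertflow.glow.named_parameters()
                    if not any(nd in n for nd in no_decay)],
         "weight_decay": 0.01},
        {"params": [p for n, p in bertflow.glow.named_parameters()
                    if any(nd in n for nd in no_decay)],
         "weight_decay": 0.0},
    ]
    optimizer = AdamWeightDecayOptimizer(params=optimizer_grouped_parameters, lr=1e-3, eps=1e-6)  # lr is illustrative

    sentences = ["A first training sentence.", "A second training sentence."]  # replace with your data
    random.shuffle(sentences)  # shuffling the training data makes a huge difference

    bertflow.train()
    model_inputs = tokenizer(sentences, add_special_tokens=True, return_tensors='pt',
                             max_length=256, padding='longest', truncation=True)
    z, loss = bertflow(model_inputs['input_ids'], model_inputs['attention_mask'], return_loss=True)  # z: sentence embeddings
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()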

Contact

Contact person and main contributor: Kexin Wang, [email protected]

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Comments
  • Reproducing the results of bertflow in the original paper

    Hi, thank you for your great work! I am grateful for your pytorch-bertflow framework and I am using it to reproduce the original BERT-flow experiments, but the result (SRCC on STS-B) is always lower than reported in the paper. I guess there are some details I am missing in my reproduction.

    loss: -1.180362 [473600/551204] corrcoef_dev: 0.223951
    loss: -1.119098 [480000/551204] corrcoef_dev: 0.223691
    loss: -1.132908 [486400/551204] corrcoef_dev: 0.224357
    loss: -1.211618 [492800/551204] corrcoef_dev: 0.225161

    I chose WNLI as the training set and STS-B as the dev set. As the result above shows, the SRCC is about 22, which is quite low.

    opened by fzp0424 1
  • Finetuning all-MiniLM-L6-v2 ValueError

    Hello, thank you for your contribution! I am trying to fine-tune all-MiniLM-L6-v2 (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) on my data, but after the first batch I get a ValueError and a loss of inf.

    ValueError: Expected value argument (Tensor of shape (4, 192, 1, 1)) to be within the support (Real()) of the distribution Normal(loc: torch.Size([4, 192, 1, 1]), scale: torch.Size([4, 192, 1, 1])), but found invalid values: tensor([[[[nan]],

         [[nan]],
    

    .....

    Here is my very simple script (I just replaced the data and put the training in a loop):

    
    import pandas as pd
    import numpy as np
    from tflow_utils import TransformerGlow, AdamWeightDecayOptimizer
    from transformers import AutoTokenizer,AutoModel
    
    model_name_or_path = '/tmp/all-MiniLM-L6-v2'
    bertflow = TransformerGlow(model_name_or_path, pooling='mean')  # pooling could be 'mean', 'max', 'cls' or 'first-last-avg' (mean pooling over the first and the last layers)
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    no_decay = ["bias", "LayerNorm.weight"]
    optimizer_grouped_parameters= [
        {
            "params": [p for n, p in bertflow.glow.named_parameters()  \
                            if not any(nd in n for nd in no_decay)],  # Note only the parameters within bertflow.glow will be updated and the Transformer will be freezed during training.
            "weight_decay": 0.01,
        },
        {
            "params": [p for n, p in bertflow.glow.named_parameters()  \
                            if any(nd in n for nd in no_decay)],
            "weight_decay": 0.0,
        },
    ]
    optimizer = AdamWeightDecayOptimizer(
        params=optimizer_grouped_parameters, 
        lr=1e-5, 
        eps=1e-6,
    )
    # Important: Remember to shuffle your training data!!! This makes a huge difference!!!
    
    np.random.seed(0)
    df = pd.read_csv("data/classification/data_small.csv")
    data = df.text.to_list().copy()
    np.random.shuffle(data)
    
    
    bertflow.train()
    batch_size = 4
    nb_batch = int(np.ceil(len(data) / batch_size))
    print(nb_batch)
    for batch_id in range(nb_batch):
        batch = data[batch_id*batch_size:(batch_id+1)*batch_size]
        model_inputs = tokenizer(
            batch,
            add_special_tokens=True,
            return_tensors='pt',
            max_length=256,
            padding='longest',
            truncation=True
        )
        z, loss = bertflow(model_inputs['input_ids'], model_inputs['attention_mask'], return_loss=True)  # Here z is the sentence embedding
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        print(batch_id, loss)
    
    

    Do you have any idea where this could come from? I have tried different learning rates, but it doesn't solve the problem.

    opened by GriesserP 2
Owner

Ubiquitous Knowledge Processing Lab