Repository for XLM-T, a framework for evaluating multilingual language models on Twitter data

Overview

This is the XLM-T repository, which includes data, code and pre-trained multilingual language models for Twitter.

XLM-T - A Multilingual Language Model Toolkit for Twitter

As explained in the reference paper, we start from XLM-Roberta base and continue pre-training on a large corpus of Twitter data in multiple languages. The resulting masked language model, which we named twitter-xlm-roberta-base in the 🤗 Huggingface hub, can be downloaded from here.

Note: This Twitter-specific LM was pretrained following a similar strategy to its English-only counterpart, which was introduced as part of the TweetEval framework and is available here.

We also provide task-specific models based on the Adapter technique, fine-tuned for cross-lingual sentiment analysis (see #2).

1 - Code

We include code with various functionalities to complement this release. This notebook provides examples for, among others, feature extraction and adapter-based inference with language models, as well as for training and evaluating language models on multiple tweet classification tasks, compatible with UMSAB (see #2) and the TweetEval datasets.
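As a quick illustration of the feature-extraction use case, the following minimal sketch (not taken from the repository notebook; it assumes only transformers and torch, and the mean-pooling choice is illustrative) obtains one embedding per tweet from twitter-xlm-roberta-base:

import torch
from transformers import AutoModel, AutoTokenizer

# Minimal feature-extraction sketch: embed tweets with the pretrained LM.
model_name = "cardiffnlp/twitter-xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

tweets = ["Huggingface es lo mejor! Awesome library 🤗😎"]
encoded = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    hidden = model(**encoded).last_hidden_state        # (batch, tokens, hidden_size)

# Mean-pool over non-padding tokens to get one vector per tweet
mask = encoded["attention_mask"].unsqueeze(-1).type_as(hidden)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(embeddings.shape)                                 # e.g. torch.Size([1, 768])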

Perform inference with Huggingface's pipelines

Using Huggingface's pipelines, obtaining predictions is as easy as:

from transformers import pipeline
model_path = "cardiffnlp/twitter-xlm-roberta-base-sentiment"
sentiment_task = pipeline("sentiment-analysis", model=model_path, tokenizer=model_path)
sentiment_task("Huggingface es lo mejor! Awesome library 🤗😎")
[{'label': 'Positive', 'score': 0.9343640804290771}]

Fine-tune xlm-t with adapters

You can fine-tune an adapter built on top of your language model of choice by running the src/adapter_finetuning.py script, for example:

python3 src/adapter_finetuning.py --language spanish --model cardiffnlp/twitter-xlm-roberta-base --seed 1 --lr 0.0001 --max_epochs 20
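For orientation, the core of adapter-based fine-tuning looks roughly like the sketch below. This is a hedged illustration, not the script itself: it assumes the adapter-transformers package, and exact method names and defaults vary between versions (see src/adapter_finetuning.py for the actual implementation).

from transformers import AutoModelWithHeads, AutoTokenizer

# Rough sketch of the adapter setup (adapter-transformers API; version-dependent).
model_name = "cardiffnlp/twitter-xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelWithHeads.from_pretrained(model_name)

model.add_adapter("sentiment")                            # add new adapter modules inside each layer
model.add_classification_head("sentiment", num_labels=3)  # positive / neutral / negative
model.train_adapter("sentiment")                          # freeze the base model, train only the adapter

# A standard transformers Trainer over the chosen UMSAB language then calls trainer.train().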

Notebooks

For quick prototyping, you can directly use the Colab notebooks we provide below:

Notebook Description Colab Link
01: Playground examples Minimal start examples Open In Colab
02: Extract embeddings Extract embeddings from tweets Open In Colab
03: Sentiment prediction Predict sentiment Open In Colab
04: Fine-tuning Fine-tune a model on custom data Open In Colab

2 - UMSAB, the Unified Multilingual Sentiment Analysis Benchmark

As part of our framework, we also release a unified benchmark for cross-lingual sentiment analysis in eight different languages. All datasets are framed as tweet classification with three labels (positive, negative and neutral). The languages included in the benchmark, as well as the datasets they are based on, are: Arabic (SemEval-2017, Rosenthal et al. 2017), English (SemEval-2017, Rosenthal et al. 2017), French (DEFT-2017, Benamara et al. 2017), German (SB-10K, Cieliebak et al. 2017), Hindi (SAIL 2015, Patra et al. 2015), Italian (Sentipolc-2016, Barbieri et al. 2016), Portuguese (SentiBR, Brum and Nunes, 2017) and Spanish (InterTASS 2017, Díaz Galiano et al. 2018). The format of each dataset follows that of TweetEval, with one tweet per line and one label per line.
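Assuming the usual TweetEval-style layout (a text file with one tweet per line and a parallel labels file with one label per line; the file names below are illustrative, so check the data folder for the actual layout), a split can be loaded as follows:

# Load one UMSAB split from TweetEval-style parallel files.
# The paths are hypothetical examples; adjust them to the actual data layout.
def load_split(text_path, labels_path):
    with open(text_path, encoding="utf-8") as f:
        tweets = [line.rstrip("\n") for line in f]
    with open(labels_path, encoding="utf-8") as f:
        labels = [line.strip() for line in f]
    assert len(tweets) == len(labels), "text and label files must be parallel"
    return tweets, labels

tweets, labels = load_split("data/sentiment/spanish/test_text.txt",
                            "data/sentiment/spanish/test_labels.txt")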

UMSAB Results / Leaderboard

The following results (Macro F1 reported) correspond to XLM-R (Conneau et al. 2020) and XLM-Tw, the same model retrained on Twitter as explained in the reference paper. The two settings are monolingual (trained and tested in the same language) and multilingual (considering all languages for training). Check the reference paper for more details on the setting and the metrics.

Language FT Mono XLM-R Mono XLM-Tw Mono XLM-R Multi XLM-Tw Multi
Arabic 46.0 63.6 67.7 64.3 66.9
English 50.9 68.2 66.9 68.5 70.6
French 54.8 72.0 68.2 70.5 71.2
German 59.6 73.6 76.1 72.8 77.3
Hindi 37.1 36.6 40.3 53.4 56.4
Italian 54.7 71.5 70.9 68.6 69.1
Portuguese 55.1 67.1 76.0 69.8 75.4
Spanish 50.1 65.9 68.5 66.0 67.9
All lang. 51.0 64.8 66.8 66.8 69.4

If you would like to have your results added to the leaderboard, you can either submit a pull request or send an email to any of the paper authors with your results and your model's predictions. Please also include a reference to a paper describing your approach.

Evaluating your system

To evaluate your system according to macro-F1, you simply need an individual prediction file for each language. The format of each prediction file should match the output examples in the predictions folder (one output label per line, in the same order as the original test file), and the files should be named language.txt (e.g. arabic.txt, or all.txt if evaluating all languages at once). The predictions included as an example in this repo correspond to xlm-t trained and evaluated on all languages (All lang.).
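For instance, a prediction file for Arabic could be written as in the sketch below, where my_predictions is a hypothetical, already-computed list of predicted labels (in whatever encoding the gold label files use), one per test tweet:

# Write one predicted label per line, in the same order as the gold test file.
# my_predictions is a hypothetical list of labels produced by your model.
with open("predictions/sentiment/arabic.txt", "w", encoding="utf-8") as f:
    for label in my_predictions:
        f.write(f"{label}\n")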

Example usage

python src/evaluation_script.py

By default, the script takes as input the gold test labels and the predictions from the predictions folder, but you can change these paths through the optional arguments described below.

Optional arguments

Three optional arguments can be modified:

--gold_path: Path to gold datasets. Default: ./data/sentiment

--predictions_path: Path to predictions directory. Default: ./predictions/sentiment

--language: Language to evaluate (arabic, english ... or all). Default: all

Evaluation script sample usage from the terminal with parameters:

python src/evaluation_script.py --gold_path ./data/sentiment --predictions_path ./predictions/sentiment --language arabic

(this script would output the results for the Arabic dataset only)
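The reported metric is the macro-averaged F1 over the three sentiment classes. If you want to sanity-check a single language yourself, a minimal equivalent computation with scikit-learn (assuming gold and predicted labels stored one per line; the paths are illustrative) would be:

# Sanity-check macro-F1 with scikit-learn; file paths are illustrative.
from sklearn.metrics import f1_score

with open("data/sentiment/arabic/test_labels.txt", encoding="utf-8") as f:
    gold = f.read().splitlines()
with open("predictions/sentiment/arabic.txt", encoding="utf-8") as f:
    pred = f.read().splitlines()

print(f1_score(gold, pred, average="macro"))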

Reference paper

If you use this repository in your research, please use the following bib entry to cite the reference paper.

@inproceedings{barbieri2021xlmtwitter,
  title={{A Multilingual Language Model Toolkit for Twitter}},
  author={Barbieri, Francesco and Espinosa-Anke, Luis and Camacho-Collados, Jose},
  booktitle={arXiv preprint arXiv:2104.12250},
  year={2021}
}

If you use UMSAB, please also cite the corresponding datasets.

License

This repository is released open-source, but restrictions may apply to individual datasets (which are derived from existing data) or to Twitter (the main data source). We refer users to the original licenses accompanying each dataset and to Twitter regulations.

Comments
  • ValueError: You have to specify either input_ids or inputs_embeds

    Environment: Ubuntu 16.04, adapter-transformers==1.1.1, NVIDIA driver 465.19.01, CUDA 11.3, a single GeForce GPU with 11 GB of memory, and no other running processes.

    When I run adapter_finetuning.py, I get this error:

    root@ubuntu:/home/project/xlm-t-main# python src/adapter_finetuning.py
    Some weights of the model checkpoint at cardiffnlp/twitter-xlm-roberta-base were not used when initializing XLMRobertaModelWithHeads: ['lm_head.bias', 'lm_head.dense.weight', 'lm_head.dense.bias', 'lm_head.layer_norm.weight', 'lm_head.layer_norm.bias', 'lm_head.decoder.weight', 'lm_head.decoder.bias']
    - This IS expected if you are initializing XLMRobertaModelWithHeads from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
    - This IS NOT expected if you are initializing XLMRobertaModelWithHeads from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of XLMRobertaModelWithHeads were not initialized from the model checkpoint at cardiffnlp/twitter-xlm-roberta-base and are newly initialized: ['roberta.pooler.dense.weight', 'roberta.pooler.dense.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
      0%|          | 0/200 [00:00<?, ?it/s]
    Traceback (most recent call last):
      File "src/adapter_finetuning.py", line 157, in <module>
        trainer.train()
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/transformers/trainer.py", line 787, in train
        tr_loss += self.training_step(model, inputs)
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/transformers/trainer.py", line 1138, in training_step
        loss = self.compute_loss(model, inputs)
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/transformers/trainer.py", line 1162, in compute_loss
        outputs = model(**inputs)
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/transformers/modeling_roberta.py", line 805, in forward
        return_dict=return_dict,
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/root/anaconda3/envs/train_cpu/lib/python3.7/site-packages/transformers/modeling_roberta.py", line 685, in forward
        raise ValueError("You have to specify either input_ids or inputs_embeds")
    ValueError: You have to specify either input_ids or inputs_embeds
    (The same output is produced on a second run.)

    Can anybody help?

    opened by longshared 0
  • readme dead link

    The work on this is excellent. I've been using it for fine-tuning sentiment/topic models with great success!

    One very nitpicky thing I noticed in the repo is that the README.md contains a link to https://huggingface.co/docs/transformers/model_doc/xlmroberta which is now broken. I believe that this should be https://huggingface.co/docs/transformers/model_doc/xlm-roberta.

    opened by AndrewBoney 0
  • Run on Colab CRASH

    I followed the steps to run the model on a dataset with 5k tweets. However, when running output = model(**encoded_input), Colab Pro+ always crashes. Is there any solution to this issue? Thanks a lot!

    opened by zwz-111 0
  • Distilled or smaller pretrained models?

    Hi

    XLM-T works really well on my dataset; however, to make it more efficient performance-wise, I was wondering whether you are planning to release smaller pretrained or distilled models?

    Best Anup

    opened by kaliaanup 0
  • Issues with finetuning colab notebook

    The fine-tuning colab notebook doesn't fully work.

    The first cell in the Fine-tuning section errors out. I'm guessing it's probably due to specific versions of packages being required (which are not enforced in the first cell).

    Also, as a very minor additional thing, this link is broken: This notebook was modified from https://huggingface.co/transformers/custom_datasets.html.

    opened by micahcarroll 0
  • Potential (very minor) bug in fine-tuning script

    I believe this line may be a bug, as it overrides the global variable set earlier in the file:

    https://github.com/cardiffnlp/xlm-t/blob/aa9b15ef035876cec53d4ce55c04ec2e93acb92d/src/adapter_finetuning.py#L63

    opened by micahcarroll 1
Owner
Cardiff NLP