Fine-tuning scripts for evaluating transformer-based models on the KLEJ benchmark.

Allegro Tech

Last update: Oct 18, 2022

Overview

The KLEJ Benchmark Baselines

The KLEJ benchmark (Kompleksowa Lista Ewaluacji Językowych) is a set of nine evaluation tasks for Polish language understanding.

This repository contains example scripts to easily fine-tune models from the transformers library on the KLEJ benchmark.

Installation

Install the Python package using the following commands:

$ git clone https://github.com/allegro/klejbenchmark-baselines
$ pip install klejbenchmark-baselines/
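
As a quick sanity check after installation, the following minimal sketch tests whether the interpreter can locate the package. The module name `klejbenchmark_baselines` is inferred from the `klejbenchmark_baselines/main.py` path mentioned in this README, and the helper `is_installed` is a hypothetical name that is not part of the package.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located (without importing it)."""
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the package installs a top-level module named
    # `klejbenchmark_baselines`, as the main.py path in this README suggests.
    name = "klejbenchmark_baselines"
    print(f"{name} importable: {is_installed(name)}")
```

If the check reports `False`, the most likely cause is that `pip` installed the package into a different Python environment than the one running the check.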

Quick Start

To fine-tune your model on KLEJ tasks using the default settings, you can use the provided example scripts.

First, download the KLEJ benchmark datasets:

$ bash scripts/download_klej.sh

After downloading KLEJ, customize training parameters inside the scripts/run_training.sh script and train the models using:

$ bash scripts/run_training.sh

The script creates:

TensorBoard logs with training and validation metrics,
checkpoints of the best models,
a zip file with predictions for the test sets, which is a valid submission for the KLEJ benchmark.

The zip file can be submitted on the klejbenchmark.com website for evaluation on the test sets.
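
Before uploading, it can be useful to check what the archive actually contains. The sketch below lists the files in a zip archive using Python's standard `zipfile` module. The archive name `klej_submission.zip` is a hypothetical placeholder: the real file name and location depend on the settings in scripts/run_training.sh, and the helper `list_submission_files` is not part of this repository.

```python
import sys
import zipfile
from pathlib import Path


def list_submission_files(archive_path):
    """Return the sorted names of all regular files stored in a zip archive."""
    with zipfile.ZipFile(archive_path) as archive:
        # testzip() returns the name of the first corrupted member, or None.
        bad_member = archive.testzip()
        if bad_member is not None:
            raise ValueError(f"Corrupted member in archive: {bad_member}")
        return sorted(
            info.filename for info in archive.infolist() if not info.is_dir()
        )


if __name__ == "__main__":
    # Hypothetical default path; pass the real archive path as the first argument.
    path = Path(sys.argv[1] if len(sys.argv) > 1 else "klej_submission.zip")
    if path.exists():
        for name in list_submission_files(path):
            print(name)
    else:
        print(f"No archive found at {path}")
```

The listing only confirms that the archive is readable and shows its members; whether the predictions are accepted is decided by the benchmark website.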

Custom Training

It's also possible to train each model separately and customize the training parameters using the klejbenchmark_baselines/main.py script.

License

Apache 2 License

Citation

If you use this code, please cite the following paper:

@inproceedings{rybak-etal-2020-klej,
    title = "{KLEJ}: Comprehensive Benchmark for Polish Language Understanding",
    author = "Rybak, Piotr and Mroczkowski, Robert and Tracz, Janusz and Gawlik, Ireneusz",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.111",
    pages = "1191--1201",
}

Authors

This code was created by the Allegro Machine Learning Research team.

You can contact us at: [email protected]
