LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Last update: Dec 8, 2022

Related tags

Overview

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️

Dataset Summary

Inspired by the recent widespread use of the GLUE multi-task benchmark NLP dataset (Wang et al., 2018), the subsequent more difficult SuperGLUE (Wang et al., 2109), other previous multi-task NLP benchmarks (Conneau and Kiela,2018; McCann et al., 2018), and similar initiatives in other domains (Peng et al., 2019), we introduce LexGLUE, a benchmark dataset to evaluate the performance of NLP methods in legal tasks. LexGLUE is based on seven existing legal NLP datasets, selected using criteria largely from SuperGLUE.

We anticipate that more datasets, tasks, and languages will be added in later versions of LexGLUE. As more legal NLP datasets become available, we also plan to favor datasets checked thoroughly for validity (scores reflecting real-life performance), annotation quality, statistical power,and social bias (Bowman and Dahl, 2021).

As in GLUE and SuperGLUE (Wang et al., 2109) one of our goals is to push towards generic (or foundation) models that can cope with multiple NLP tasks, in our case legal NLP tasks,possibly with limited task-specific fine-tuning. An-other goal is to provide a convenient and informative entry point for NLP researchers and practitioners wishing to explore or develop methods for legalNLP. Having these goals in mind, the datasets we include in LexGLUE and the tasks they address have been simplified in several ways, discussed below, to make it easier for newcomers and generic models to address all tasks. We provide PythonAPIs integrated with Hugging Face (Wolf et al.,2020; Lhoest et al., 2021) to easily import all the datasets, experiment with and evaluate their performance.

By unifying and facilitating the access to a set of law-related datasets and tasks, we hope to attract not only more NLP experts, but also more interdisciplinary researchers (e.g., law doctoral students willing to take NLP courses). More broadly, we hope LexGLUE will speed up the adoption and transparent evaluation of new legal NLP methods and approaches in the commercial sector too. Indeed, there have been many commercial press releases in legal-tech industry, but almost no independent evaluation of the veracity of the performance of various machine learning and NLP-based offerings. A standard publicly available benchmark would also allay concerns of undue influence in predictive models, including the use of metadata which the relevant law expressly disregards.

If you participate, use the LexGLUE benchmark, or our experimentation library, please cite:

Ilias Chalkidis, Abhik Jana, Dirk Hartung, Michael Bommarito, Ion Androutsopoulos, Daniel Martin Katz, and Nikolaos Aletras. LexGLUE: A Benchmark Dataset for Legal Language Understanding in English. 2021. arXiv: 2110.00976.

@article{chalkidis-etal-2021-lexglue,
        title={LexGLUE: A Benchmark Dataset for Legal Language Understanding in English}, 
        author={Chalkidis, Ilias and Jana, Abhik and Hartung, Dirk and
        Bommarito, Michael and Androutsopoulos, Ion and Katz, Daniel Martin and
        Aletras, Nikolaos},
        year={2021},
        eprint={2110.00976},
        archivePrefix={arXiv},
        primaryClass={cs.CL},
        note = {arXiv: 2110.00976},
}

Supported Tasks

Dataset	Source	Sub-domain	Task Type	Training/Dev/Test Instances	Classes
ECtHR (Task A)	Chalkidis et al. (2019)	ECHR	Multi-label classification	9,000/1,000/1,000	10+1
ECtHR (Task B)	Chalkidis et al. (2021a)	ECHR	Multi-label classification	9,000/1,000/1,000	10
SCOTUS	Spaeth et al. (2020)	US Law	Multi-class classification	5,000/1,400/1,400	14
EUR-LEX	Chalkidis et al. (2021b)	EU Law	Multi-label classification	55,000/5,000/5,000	100
LEDGAR	Tuggener et al. (2020)	Contracts	Multi-class classification	60,000/10,000/10,000	100
UNFAIR-ToS	Lippi et al. (2019)	Contracts	Multi-label classification	5,532/2,275/1,607	8
CaseHOLD	Zheng et al. (2021)	US Law	Multiple choice QA	45,000/3,900/3,900	n/a

ECtHR (Task A)

The European Court of Human Rights (ECtHR) hears allegations that a state has breached human rights provisions of the European Convention of Human Rights (ECHR). For each case, the dataset provides a list of factual paragraphs (facts) from the case description. Each case is mapped to articles of the ECHR that were violated (if any).

ECtHR (Task B)

SCOTUS

The US Supreme Court (SCOTUS) is the highest federal court in the United States of America and generally hears only the most controversial or otherwise complex cases which have not been sufficiently well solved by lower courts. This is a single-label multi-class classification task, where given a document (court opinion), the task is to predict the relevant issue areas. The 14 issue areas cluster 278 issues whose focus is on the subject matter of the controversy (dispute).

EUR-LEX

European Union (EU) legislation is published in EUR-Lex portal. All EU laws are annotated by EU's Publications Office with multiple concepts from the EuroVoc thesaurus, a multilingual thesaurus maintained by the Publications Office. The current version of EuroVoc contains more than 7k concepts referring to various activities of the EU and its Member States (e.g., economics, health-care, trade). Given a document, the task is to predict its EuroVoc labels (concepts).

LEDGAR

LEDGAR dataset aims contract provision (paragraph) classification. The contract provisions come from contracts obtained from the US Securities and Exchange Commission (SEC) filings, which are publicly available from EDGAR. Each label represents the single main topic (theme) of the corresponding contract provision.

UNFAIR-ToS

The UNFAIR-ToS dataset contains 50 Terms of Service (ToS) from on-line platforms (e.g., YouTube, Ebay, Facebook, etc.). The dataset has been annotated on the sentence-level with 8 types of unfair contractual terms (sentences), meaning terms that potentially violate user rights according to the European consumer law.

CaseHOLD

The CaseHOLD (Case Holdings on Legal Decisions) dataset includes multiple choice questions about holdings of US court cases from the Harvard Law Library case law corpus. Holdings are short summaries of legal rulings accompany referenced decisions relevant for the present case. The input consists of an excerpt (or prompt) from a court decision, containing a reference to a particular case, while the holding statement is masked out. The model must identify the correct (masked) holding statement from a selection of five choices.

Leaderboard

Dataset	ECtHR Task A	ECtHR Task B	SCOTUS	EUR-LEX	LEDGAR	UNFAIR-ToS	CaseHOLD
Model	μ-F1 / m-F1	μ-F1 / m-F1	μ-F1 / m-F1	μ-F1 / m-F1	μ-F1 / m-F1	μ-F1 / m-F1	μ-F1 / m-F1
BERT (Devlin et al., 2018)	71.4 / 64.0	87.6 / 77.8	70.5 / 60.9	71.6 / 55.6	87.7 / 82.2	87.5 / 81.0	70.7
RoBERTa (Liu et al., 2019)	69.5 / 60.7	87.2 / 77.3	70.8 / 61.2	71.8 / 57.5	87.9 / 82.1	87.7 / 81.5	71.7
DeBERTa (He et al., 2021)	69.1 / 61.2	87.4 / 77.3	70.0 / 60.0	72.3 / 57.2	87.9 / 82.0	87.2 / 78.8	72.1
Longformer (Beltagy et al., 2020)	69.6 / 62.4	88.0 / 77.8	72.2 / 62.5	71.9 / 56.7	87.7 / 82.3	87.7 / 80.1	72.0
BigBird (Zaheer et al., 2021)	70.5 / 63.8	88.1 / 76.6	71.7 / 61.4	71.8 / 56.6	87.7 / 82.1	87.7 / 80.2	70.4
Legal-BERT (Chalkidis et al., 2020)	71.2 / 64.6	88.0 / 77.2	76.2 / 65.8	72.2 / 56.2	88.1 / 82.7	88.6 / 82.3	75.1
CaseLaw-BERT (Zheng et al., 2021)	71.2 / 64.2	88.0 / 77.5	76.4 / 66.2	71.0 / 55.9	88.0 / 82.3	88.3 / 81.0	75.6

Frequently Asked Questions (FAQ)

Where are the datasets?

We provide access to LexGLUE on Hugging Face Datasets (Lhoest et al., 2021) at https://huggingface.co/datasets/lex_glue.

For example to load the SCOTUS Spaeth et al. (2020) dataset, you first simply install the datasets python library and then make the following call:

from datasets import load_dataset 
dataset = load_dataset("lex_glue", "scotus")

How to run experiments?

Furthermore, to make reproducing the results for the already examined models or future models even easier, we release our code in this repository. In folder /experiments, there are Python scripts, relying on the Hugging Face Transformers library, to run and evaluate any Transformer-based model (e.g., BERT, RoBERTa, LegalBERT, and their hierarchical variants, as well as, Longforrmer, and BigBird). We also provide bash scripts in folder /scripts to replicate the experiments for each dataset with 5 randoms seeds, as we did for the reported results for the original leaderboard.

For example to replicate the results for RoBERTa (Liu et al., 2019) on UNFAIR-ToS Lippi et al. (2019), you have to configure the relevant bash script (run_unfair_tos.sh):

> nano run_unfair_tos.sh
GPU_NUMBER=1
MODEL_NAME='roberta-base'
LOWER_CASE='False'
BATCH_SIZE=8
ACCUMULATION_STEPS=1
TASK='unfair_tos'

and then run it:

> sh run_unfair_tos.sh

How to participate?

We are currently still lacking some technical infrastructure, e.g., an integrated submission environment comprised of an automated evaluation and an automatically updated leaderboard. We plan to develop the necessary publicly available web infrastructure extend the public infrastructure of LexGLUE in the near future.

In the mean-time, we ask participants to re-use and expand our code to submit new results, if possible, and raise a new issue in our repository (https://github.com/coastalcph/lex-glue/issues/new) presenting their results, providing the auto-generated result logs and the relevant publication (or pre-print), if available, accompanied with a pull request including the code amendments that are needed to reproduce their experiments. Upon reviewing your results, we'll update the public leaderboard accordingly.

I still have open questions...

Please post your question on Discussions section or communicate with the corresponding author via e-mail.

Comments

Bias in TF-IDF + SVM results for SCOTUS

You probably have realised that the results with TF-IDF + SVM approach for SCOTUS are pretty high, well, I think they have a bias. I think that the testing metrics are being computed after a retraining of the Pipeline with both training and validation sets combined, while the other Language Models are only fine-tuned with the training set. This is because sklearn.model_selection.GridSearchCV has the parameter "refit" equal True as default, which ends up in a biased comparison.

Training with only the training set for the best hyperparameters found in the validation set the microf1 score is closer to 74.0 and the macrof1 to 64.4

Reference: https://github.com/coastalcph/lex-glue/blob/5109aebadbfafe8eb8860851559cdbd719b88aa6/models/tfidf_svm.py#L84

opened by danigoju 7
Fast tokenizer for CaseHOLD

FYI: https://github.com/reglab/casehold/issues/2

The bug with the fast tokenizer should be fixed now, so it is possible to use it.

https://github.com/coastalcph/lex-glue/blob/dfee272e8d6f9997173736317529ba2f3a8b881e/experiments/case_hold.py#L161-L166

opened by atreyasha 6

About loading ecthr_a

Hi,

I tried run_ecthr.sh, but it failed to load dataset.

The error is from line 236 in experiments/ecthr.py

train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", data_dir='data', cache_dir=model_args.cache_dir)

Error info:

Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 236, in main
    train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", data_dir='data', cache_dir=model_args.cache_dir)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1168, in dataset_module_factory
    return LocalDatasetModuleFactoryWithoutScript(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 691, in get_module
    else get_data_patterns_locally(base_path)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 451, in get_data_patterns_locally
    raise FileNotFoundError(f"The directory at {base_path} doesn't contain any data file") from None
FileNotFoundError: The directory at lex_glue/data doesn't contain any data file

If I delete data_dir='data', the error will turn to be:

08/02/2022 11:48:43 - INFO - datasets.data_files - Some files matched the pattern 'lex_glue/**[-._ 0-9/]train[-._ 0-9]*' at /workspace/MaxPlain/lex_glue but don't have valid data file extensions: [PosixPath('/workspace/MaxPlain/lex_glue/statistics/report_train_time.py')]
Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 236, in main
    train_dataset = load_dataset("lex_glue", name=data_args.task, split="train", cache_dir=model_args.cache_dir)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1723, in load_dataset
    builder_instance = load_dataset_builder(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1500, in load_dataset_builder
    dataset_module = dataset_module_factory(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 1168, in dataset_module_factory
    return LocalDatasetModuleFactoryWithoutScript(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/load.py", line 695, in get_module
    data_files = DataFilesDict.from_local_or_remote(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 786, in from_local_or_remote
    DataFilesList.from_local_or_remote(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 754, in from_local_or_remote
    data_files = resolve_patterns_locally_or_by_urls(base_path, patterns, allowed_extensions)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/datasets/data_files.py", line 359, in resolve_patterns_locally_or_by_urls
    raise FileNotFoundError(error_msg)
FileNotFoundError: Unable to resolve any data file that matches '['**[-._ 0-9/]train[-._ 0-9]*', 'train[-._ 0-9]*', '**[-._ 0-9/]training[-._ 0-9]*', 'training[-._ 0-9]*']' at /workspace/MaxPlain/lex_glue with any supported extension ['csv', 'tsv', 'json', 'jsonl', 'parquet', 'txt', 'blp', 'bmp', 'dib', 'bufr', 'cur', 'pcx', 'dcx', 'dds', 'ps', 'eps', 'fit', 'fits', 'fli', 'flc', 'ftc', 'ftu', 'gbr', 'gif', 'grib', 'h5', 'hdf', 'png', 'apng', 'jp2', 'j2k', 'jpc', 'jpf', 'jpx', 'j2c', 'icns', 'ico', 'im', 'iim', 'tif', 'tiff', 'jfif', 'jpe', 'jpg', 'jpeg', 'mpg', 'mpeg', 'msp', 'pcd', 'pxr', 'pbm', 'pgm', 'ppm', 'pnm', 'psd', 'bw', 'rgb', 'rgba', 'sgi', 'ras', 'tga', 'icb', 'vda', 'vst', 'webp', 'wmf', 'emf', 'xbm', 'xpm', 'BLP', 'BMP', 'DIB', 'BUFR', 'CUR', 'PCX', 'DCX', 'DDS', 'PS', 'EPS', 'FIT', 'FITS', 'FLI', 'FLC', 'FTC', 'FTU', 'GBR', 'GIF', 'GRIB', 'H5', 'HDF', 'PNG', 'APNG', 'JP2', 'J2K', 'JPC', 'JPF', 'JPX', 'J2C', 'ICNS', 'ICO', 'IM', 'IIM', 'TIF', 'TIFF', 'JFIF', 'JPE', 'JPG', 'JPEG', 'MPG', 'MPEG', 'MSP', 'PCD', 'PXR', 'PBM', 'PGM', 'PPM', 'PNM', 'PSD', 'BW', 'RGB', 'RGBA', 'SGI', 'RAS', 'TGA', 'ICB', 'VDA', 'VST', 'WEBP', 'WMF', 'EMF', 'XBM', 'XPM', 'zip']

Is there anything wrong in loading dataset?

opened by RichardHGL 4

Key Error while loading case_hold dataset

Hi,

I face with a problem while loading case_hold dataset. It reports Key Error: 'question'. Could you please solve it or give me some advice on loading that dataset?

Thank you very much!

dataset = load_dataset("lex_glue", "case_hold", revision="1.15.1")

Downloading and preparing dataset lex_glue/case_hold (download: 29.01 MiB, generated: 255.06 MiB, post-processed: Unknown size, total: 284.08 MiB) to C:\XXXXX Traceback (most recent call last): File "", line 1, in File "D:\Anaconda3\lib\site-packages\datasets\load.py", line 1632, in load_dataset builder_instance.download_and_prepare( File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 607, in download_and_prepare self._download_and_prepare( File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 697, in _download_and_prepare self._prepare_split(split_generator, **prepare_split_kwargs) File "D:\Anaconda3\lib\site-packages\datasets\builder.py", line 1103, in _prepare_split example = self.info.features.encode_example(record) File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 1033, in encode_example return encode_nested_example(self, example) File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 808, in encode_nested_example return { File "D:\Anaconda3\lib\site-packages\datasets\features\features.py", line 808, in return { File "D:\Anaconda3\lib\site-packages\datasets\utils\py_utils.py", line 108, in zip_dict yield key, tuple(d[key] for d in dicts) File "D:\Anaconda3\lib\site-packages\datasets\utils\py_utils.py", line 108, in yield key, tuple(d[key] for d in dicts) KeyError: 'question'

opened by HUIYINXUE 4

Script and results on eurlex

Hello! Thanks for this great repository. I have tried experiments on many of its subtasks and it works beautifully.

Now a problem is, when I am trying to reproduce the results on EUR-LEX, using run_eurlex.sh, it fails to give results similar to (or somewhere near) the ones in paper.

                       VALIDATION                                      | TEST
bert-base-uncased: MICRO-F1: 69.7      ± 0.1  MACRO-F1: 32.8   ± 0.4   | MICRO-F1: 63.1       MACRO-F1: 30.8

( I tried to change the model to legal-base-uncased, or change the number of epochs from 2 to 20, but these attempts failed too)

Can you help to have a look into this and give some suggestions?

A more detailed log for one of the 5 seeds are as follows:

...
[INFO|trainer.py:1419] 2022-06-27 05:09:06,003 >> ***** Running training *****
[INFO|trainer.py:1420] 2022-06-27 05:09:06,003 >>   Num examples = 55000
[INFO|trainer.py:1421] 2022-06-27 05:09:06,003 >>   Num Epochs = 2
[INFO|trainer.py:1422] 2022-06-27 05:09:06,003 >>   Instantaneous batch size per device = 8
[INFO|trainer.py:1423] 2022-06-27 05:09:06,003 >>   Total train batch size (w. parallel, distributed & accumulation) = 8
[INFO|trainer.py:1424] 2022-06-27 05:09:06,003 >>   Gradient Accumulation steps = 1
[INFO|trainer.py:1425] 2022-06-27 05:09:06,003 >>   Total optimization steps = 13750
{'loss': 0.1809, 'learning_rate': 2.890909090909091e-05, 'epoch': 0.07}
{'loss': 0.1112, 'learning_rate': 2.7818181818181818e-05, 'epoch': 0.15}
{'loss': 0.0966, 'learning_rate': 2.6727272727272728e-05, 'epoch': 0.22}
{'loss': 0.0857, 'learning_rate': 2.5636363636363635e-05, 'epoch': 0.29}
{'loss': 0.0784, 'learning_rate': 2.454545454545455e-05, 'epoch': 0.36}
{'loss': 0.072, 'learning_rate': 2.3454545454545456e-05, 'epoch': 0.44}
{'loss': 0.0676, 'learning_rate': 2.2363636363636366e-05, 'epoch': 0.51}
{'loss': 0.0663, 'learning_rate': 2.1272727272727273e-05, 'epoch': 0.58}
{'loss': 0.0632, 'learning_rate': 2.0181818181818183e-05, 'epoch': 0.65}
{'loss': 0.0603, 'learning_rate': 1.909090909090909e-05, 'epoch': 0.73}
{'loss': 0.0593, 'learning_rate': 1.8e-05, 'epoch': 0.8}
{'loss': 0.0571, 'learning_rate': 1.6909090909090907e-05, 'epoch': 0.87}
{'loss': 0.0551, 'learning_rate': 1.5818181818181818e-05, 'epoch': 0.95}
 50%|███████████████████████████████████████████████████████████████████                                                                   | 6875/13750 [14:19<14:12,  8.07it/s]
[INFO|trainer.py:622] 2022-06-27 05:23:25,910 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and
have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:23:25,913 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:23:25,914 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:23:25,914 >>   Batch size = 8
{'eval_loss': 0.06690910458564758, 'eval_macro-f1': 0.26581931249101237, 'eval_micro-f1': 0.6573569918647109, 'eval_runtime': 25.2148, 'eval_samples_per_second': 198.296, 'eval
_steps_per_second': 24.787, 'epoch': 1.0}
INFO|trainer.py:2340] 2022-06-27 05:23:51,131 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/se
ed_5/checkpoint-6875
[INFO|configuration_utils.py:446] 2022-06-27 05:23:51,134 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-un
cased/seed_5/checkpoint-6875/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:23:52,343 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncase
d/seed_5/checkpoint-6875/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:23:52,345 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex
/bert-base-uncased/seed_5/checkpoint-6875/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:23:52,346 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/b
ert-base-uncased/seed_5/checkpoint-6875/special_tokens_map.json
{'loss': 0.0546, 'learning_rate': 1.4727272727272728e-05, 'epoch': 1.02}
{'loss': 0.0531, 'learning_rate': 1.3636363636363637e-05, 'epoch': 1.09}
{'loss': 0.0518, 'learning_rate': 1.2545454545454545e-05, 'epoch': 1.16}
{'loss': 0.0521, 'learning_rate': 1.1454545454545455e-05, 'epoch': 1.24}
{'loss': 0.0497, 'learning_rate': 1.0363636363636364e-05, 'epoch': 1.31}
{'loss': 0.0481, 'learning_rate': 9.272727272727273e-06, 'epoch': 1.38}
{'loss': 0.0487, 'learning_rate': 8.181818181818181e-06, 'epoch': 1.45}
{'loss': 0.0488, 'learning_rate': 7.090909090909091e-06, 'epoch': 1.53}
{'loss': 0.0477, 'learning_rate': 6e-06, 'epoch': 1.6}
{'loss': 0.0476, 'learning_rate': 4.90909090909091e-06, 'epoch': 1.67}
{'loss': 0.047, 'learning_rate': 3.818181818181818e-06, 'epoch': 1.75}
{'loss': 0.0471, 'learning_rate': 2.7294545454545455e-06, 'epoch': 1.82}
{'loss': 0.0462, 'learning_rate': 1.6385454545454545e-06, 'epoch': 1.89}
{'loss': 0.0466, 'learning_rate': 5.476363636363636e-07, 'epoch': 1.96}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13750/13750 [29:11<00:00,  7.96it/s]
[INFO|trainer.py:622] 2022-06-27 05:38:17,987 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and
have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:38:17,989 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:38:17,989 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:38:17,989 >>   Batch size = 8
{'eval_loss': 0.06163998320698738, 'eval_macro-f1': 0.3223906812379972, 'eval_micro-f1': 0.6903704623792815, 'eval_runtime': 24.2671, 'eval_samples_per_second': 206.041, 'eval_
steps_per_second': 25.755, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13750/13750 [29:36<00:00,  7.96it/s[
INFO|trainer.py:2340] 2022-06-27 05:38:42,258 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/se
ed_5/checkpoint-13750
[INFO|configuration_utils.py:446] 2022-06-27 05:38:42,261 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-un
cased/seed_5/checkpoint-13750/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:38:43,511 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncase
d/seed_5/checkpoint-13750/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:38:43,513 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex
/bert-base-uncased/seed_5/checkpoint-13750/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:38:43,513 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/b
ert-base-uncased/seed_5/checkpoint-13750/special_tokens_map.json
[INFO|trainer.py:1662] 2022-06-27 05:38:46,057 >>

Training completed. Do not forget to share your model on huggingface.co/models =)


[INFO|trainer.py:1727] 2022-06-27 05:38:46,057 >> Loading best model from logs/062605_eurlex_original/eurlex/bert-base-uncased/seed
_5/checkpoint-13750 (score: 0.6903704623792815).
{'train_runtime': 1781.228, 'train_samples_per_second': 61.755, 'train_steps_per_second': 7.719, 'train_loss': 0.06421310944990678, 'epoch': 2.0}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 13750/13750 [29:41<00:00,  7.72it/s]
[INFO|trainer.py:2340] 2022-06-27 05:38:47,236 >> Saving model checkpoint to logs/062605_eurlex_original/eurlex/bert-base-uncased/s
eed_5
[INFO|configuration_utils.py:446] 2022-06-27 05:38:47,261 >> Configuration saved in logs/062605_eurlex_original/eurlex/bert-base-un
cased/seed_5/config.json
[INFO|modeling_utils.py:1542] 2022-06-27 05:38:48,560 >> Model weights saved in logs/062605_eurlex_original/eurlex/bert-base-uncase
d/seed_5/pytorch_model.bin
[INFO|tokenization_utils_base.py:2108] 2022-06-27 05:38:48,562 >> tokenizer config file saved in logs/062605_eurlex_original/eurlex
/bert-base-uncased/seed_5/tokenizer_config.json
[INFO|tokenization_utils_base.py:2114] 2022-06-27 05:38:48,563 >> Special tokens file saved in logs/062605_eurlex_original/eurlex/b
ert-base-uncased/seed_5/special_tokens_map.json
***** train metrics *****
  epoch                    =        2.0
  train_loss               =     0.0642
  train_runtime            = 0:29:41.22
  train_samples            =      55000
  train_samples_per_second =     61.755
  train_steps_per_second   =      7.719
06/27/2022 05:38:48 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:622] 2022-06-27 05:38:48,611 >> The following columns in the evaluation set don't have a corresponding argument in `BertForSequenceClassification.forward` and
have been ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:38:48,620 >> ***** Running Evaluation *****
[INFO|trainer.py:2592] 2022-06-27 05:38:48,620 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:38:48,620 >>   Batch size = 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [00:22<00:00, 27.85it/s]
***** eval metrics *****
  epoch                   =        2.0
  eval_loss               =     0.0616
  eval_macro-f1           =     0.3224
  eval_micro-f1           =     0.6904
  eval_runtime            = 0:00:22.48
  eval_samples            =       5000
  eval_samples_per_second =    222.372
  eval_steps_per_second   =     27.796
06/27/2022 05:39:11 - INFO - __main__ - *** Predict ***
[INFO|trainer.py:622] 2022-06-27 05:39:11,101 >> The following columns in the test set don't have a corresponding argument in `BertForSequenceClassification.forward` and have b
een ignored: text. If text are not expected by `BertForSequenceClassification.forward`,  you can safely ignore this message.
[INFO|trainer.py:2590] 2022-06-27 05:39:11,106 >> ***** Running Prediction *****
[INFO|trainer.py:2592] 2022-06-27 05:39:11,106 >>   Num examples = 5000
[INFO|trainer.py:2595] 2022-06-27 05:39:11,106 >>   Batch size = 8
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 625/625 [00:22<00:00, 27.64it/s]
***** predict metrics *****
  predict_loss               =     0.0712
  predict_macro-f1           =     0.2969
  predict_micro-f1           =     0.6196
  predict_runtime            = 0:00:22.44
  predict_samples            =       5000
  predict_samples_per_second =    222.741
  predict_steps_per_second   =     27.843
...

opened by Glaciohound 3

Number of Target Fields in the SCOTUS dataset on HuggingFace
The SCOTUS dataset available as part of the LexGlue corpus mentions 14 classes within the dataset. Upon verification over the HuggingFace SCOTUS dataset, we only get 13 classes through this method.

from datasets import load_dataset # !pip install datasets import numpy as np scotus = load_dataset('lex_glue', 'scotus') labels = list(scotus['train']['label']) classes = np.unique(labels) print(classes, len(classes)) scotus = load_dataset('lex_glue', 'scotus') labels = list(scotus['test']['label']) classes = np.unique(labels) print(classes, len(classes))

The results display on 13 unique classes instead of 14, as shown below.

Is there an issue in which we're extracting the data, if so we'd greatly appreciate any help.
opened by AmanPriyanshu 2

Bug about fp16 in experiment run_ecthr.sh

When I run the run_ecthr.sh script in experiments folder Such error occurs:

Traceback (most recent call last):
  File "main_ecthr.py", line 505, in <module>
    main()
  File "main_ecthr.py", line 454, in main
    train_result = trainer.train(resume_from_checkpoint=checkpoint)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 1498, in train
    return inner_training_loop(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 1832, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, model, trial, epoch, ignore_keys_for_eval)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2038, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2758, in evaluate
    output = eval_loop(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 2936, in evaluation_loop
    loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/trainer.py", line 3177, in prediction_step
    loss, outputs = self.compute_loss(model, inputs, return_outputs=True)
  File "/workspace/MaxPlain/lexglue/experiments/trainer.py", line 8, in compute_loss
    outputs = model(**inputs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/transformers/models/bert/modeling_bert.py", line 1556, in forward
    outputs = self.bert(
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/MaxPlain/lexglue/models/hierbert.py", line 100, in forward
    seg_encoder_outputs = self.seg_encoder(encoder_outputs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 238, in forward
    output = mod(output, src_mask=mask, src_key_padding_mask=src_key_padding_mask)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/envs/OmniXAI/lib/python3.8/site-packages/torch/nn/modules/transformer.py", line 437, in forward
    return torch._transformer_encoder_layer_fwd(
RuntimeError: expected scalar type Half but found Float

I try to debug it. And find that it maybe due to the trainer fail to put model to dtype=torch.fp16. I also tried the evaluation. It will fail and report the same error.

# Evaluation
    if training_args.do_eval:
        logger.info("*** Evaluate ***")
        metrics = trainer.evaluate(eval_dataset=eval_dataset)

        max_eval_samples = data_args.max_eval_samples if data_args.max_eval_samples is not None else len(eval_dataset)
        metrics["eval_samples"] = min(max_eval_samples, len(eval_dataset))

        trainer.log_metrics("eval", metrics)
        trainer.save_metrics("eval", metrics)

After I remove --fp16 --fp16_full_eval in the run_ecthr.sh, it works as expected.

opened by RichardHGL 2

Hyper-parameters of DeBERTa for EUR-LEX

Hi, my reproduced results for EUR-LEX are quite far from the reported ones. Could you provide the hyper-parameters of DeBERTa for EUR-LEX? And which version of DeBERTa is used, V2/V3, Base/Large?

Looking forward to your reply. Thanks!

opened by cooelf 2
scotus: ValueError: expected sequence of length 64 at dim 2 (got 128)

Hi, thanks for the awesome repo!

I have encountered an issue when running the scripts for scotus.

[INFO|trainer.py:1164] 2022-05-31 13:09:08,068 >> ***** Running training ***** [INFO|trainer.py:1165] 2022-05-31 13:09:08,068 >> Num examples = 100 [INFO|trainer.py:1166] 2022-05-31 13:09:08,068 >> Num Epochs = 10 [INFO|trainer.py:1167] 2022-05-31 13:09:08,068 >> Instantaneous batch size per device = 8 [INFO|trainer.py:1168] 2022-05-31 13:09:08,068 >> Total train batch size (w. parallel, distributed & accumulation) = 64 [INFO|trainer.py:1169] 2022-05-31 13:09:08,068 >> Gradient Accumulation steps = 1 [INFO|trainer.py:1170] 2022-05-31 13:09:08,068 >> Total optimization steps = 20 0%| | 0/20 [00:00<?, ?it/s]Traceback (most recent call last): File "/home/cooelf/lex-glue-main/scotus.py", line 490, in main() File "/home/cooelf/lex-glue-main/scotus.py", line 439, in main train_result = trainer.train(resume_from_checkpoint=checkpoint) File "/home/cooelf/.local/lib/python3.7/site-packages/transformers/trainer.py", line 1254, in train for step, inputs in enumerate(epoch_iterator): File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 521, in next data = self._next_data() File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 561, in _next_data data = self._dataset_fetcher.fetch(index) # may raise StopIteration File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 52, in fetch return self.collate_fn(data) File "/home/cooelf/.local/lib/python3.7/site-packages/transformers/data/data_collator.py", line 81, in default_data_collator batch[k] = torch.tensor([f[k] for f in features]) ValueError: expected sequence of length 64 at dim 2 (got 128) 0%| | 0/20 [00:00<?, ?it/s]

Process finished with exit code 1

It seems to be some problem about the data processing. I have checked the dimension of the features but failed to find anything strange.

Could you give some hints to solve it?

Thanks!

opened by cooelf 2
Loading Hierarchical models

Hi, i used the scripts and everything worked fine, i was able to train the models without any trouble. The results shown with the testing after training are also coherent.

But the issue (at the end of the message) occurred when i tried to load the model it order to test to predict other samples. It is not possible to load the model because there is a difference between the names of the layers expected and the layers in the file. As we can see in the error message (at the end), there are double occurences of "encoder" in some layer names of the saved file. When loading, the model does not use those layer names.

This problem happens with ECtHR (A & B) and Scotus tasks (maybe even others) with Bert models, it seems that the issue occurs when using hierarchical variant. When not using hierarchical, we dont have any problem to load the models after saving them. But the results are not as performant as they should be.

Do you have the same issue ? I am using Ubuntu 20.04 with python 3.8.

[WARNING|modeling_utils.py:1501] 2022-03-28 18:01:33,192 >> Some weights of the model checkpoint at /home/X/Xs/lex-glue/seed_1 were not used when initializing BertForSequenceClassification: ['bert.encoder.encoder.layer.4.attention.self.query.weight', 'bert.seg_encoder.layers.1.self_attn.out_proj.weight', 'bert.encoder.encoder.layer.8.attention.self.query.bias', 'bert.seg_encoder.layers.1.norm2.weight', 'bert.encoder.encoder.layer.10.output.dense.bias', 'bert.encoder.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.attention.self.key.weight', 'bert.encoder.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.seg_encoder.layers.1.self_attn.out_proj.bias', 'bert.seg_encoder.layers.0.norm1.bias', 'bert.encoder.encoder.layer.10.attention.self.query.bias', 'bert.encoder.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.value.weight', 'bert.seg_encoder.layers.1.norm2.bias', 'bert.encoder.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.embeddings.token_type_embeddings.weight', 'bert.encoder.embeddings.word_embeddings.weight', 'bert.encoder.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.encoder.layer.11.intermediate.dense.weight', 'bert.seg_encoder.layers.0.self_attn.out_proj.bias', 'bert.seg_encoder.layers.1.norm1.weight', 'bert.encoder.encoder.layer.10.output.dense.weight', 'bert.seg_encoder.layers.0.norm1.weight', 'bert.encoder.encoder.layer.8.attention.self.value.weight', 'bert.encoder.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.embeddings.LayerNorm.weight', 'bert.encoder.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.output.dense.weight', 'bert.encoder.encoder.layer.6.output.dense.weight', 'bert.encoder.encoder.layer.2.output.dense.weight', 'bert.encoder.encoder.layer.11.output.LayerNorm.bias', 'bert.seg_encoder.layers.1.norm1.bias', 'bert.encoder.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.encoder.layer.11.attention.self.key.bias', 'bert.encoder.encoder.layer.3.attention.output.dense.bias', 'bert.seg_encoder.layers.0.linear2.weight', 'bert.encoder.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.3.attention.self.query.weight', 'bert.encoder.encoder.layer.3.output.dense.weight', 'bert.seg_encoder.norm.weight', 'bert.encoder.encoder.layer.8.output.dense.bias', 'bert.seg_encoder.layers.1.linear2.weight', 'bert.encoder.embeddings.position_ids', 'bert.encoder.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.encoder.layer.4.attention.self.key.weight', 'bert.encoder.encoder.layer.3.attention.self.query.bias', 'bert.encoder.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.encoder.layer.10.attention.self.key.weight', 'bert.encoder.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.self.query.bias', 'bert.encoder.encoder.layer.10.attention.self.value.weight', 'bert.encoder.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.encoder.layer.5.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.encoder.layer.9.attention.self.key.weight', 'bert.encoder.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.encoder.layer.9.output.LayerNorm.weight', 'bert.encoder.encoder.layer.8.attention.self.key.bias', 'bert.encoder.encoder.layer.4.attention.self.value.weight', 'bert.encoder.encoder.layer.3.attention.self.value.bias', 'bert.encoder.encoder.layer.9.attention.self.value.bias', 'bert.encoder.encoder.layer.9.attention.self.key.bias', 'bert.encoder.encoder.layer.0.attention.self.value.weight', 'bert.encoder.encoder.layer.7.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.query.weight', 'bert.seg_encoder.layers.0.self_attn.in_proj_weight', 'bert.encoder.encoder.layer.6.attention.self.value.weight', 'bert.encoder.encoder.layer.11.attention.self.query.bias', 'bert.seg_encoder.layers.0.self_attn.out_proj.weight', 'bert.encoder.encoder.layer.2.output.dense.bias', 'bert.seg_encoder.layers.1.self_attn.in_proj_weight', 'bert.seg_encoder.layers.1.linear2.bias', 'bert.encoder.encoder.layer.0.attention.self.key.bias', 'bert.encoder.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.encoder.layer.4.attention.self.value.bias', 'bert.seg_encoder.layers.0.self_attn.in_proj_bias', 'bert.encoder.encoder.layer.6.attention.self.query.bias', 'bert.encoder.embeddings.position_embeddings.weight', 'bert.encoder.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.pooler.dense.weight', 'bert.encoder.encoder.layer.2.output.LayerNorm.bias', 'bert.encoder.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.5.attention.self.query.weight', 'bert.encoder.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.encoder.layer.1.output.dense.weight', 'bert.encoder.encoder.layer.0.output.dense.weight', 'bert.encoder.encoder.layer.3.attention.self.key.weight', 'bert.encoder.encoder.layer.2.attention.self.value.weight', 'bert.encoder.encoder.layer.5.attention.self.query.bias', 'bert.encoder.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.attention.self.query.bias', 'bert.encoder.encoder.layer.1.attention.self.key.bias', 'bert.encoder.encoder.layer.7.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.self.value.bias', 'bert.encoder.encoder.layer.1.attention.self.query.weight', 'bert.encoder.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.self.query.weight', 'bert.encoder.encoder.layer.5.output.dense.weight', 'bert.encoder.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.self.value.bias', 'bert.seg_encoder.layers.1.self_attn.in_proj_bias', 'bert.encoder.encoder.layer.3.attention.self.value.weight', 'bert.encoder.encoder.layer.11.output.dense.weight', 'bert.encoder.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.encoder.layer.0.output.LayerNorm.bias', 'bert.seg_encoder.layers.0.linear1.bias', 'bert.encoder.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.encoder.layer.0.attention.self.query.weight', 'bert.encoder.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.embeddings.LayerNorm.bias', 'bert.encoder.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.encoder.layer.6.attention.self.key.bias', 'bert.encoder.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.encoder.layer.8.attention.self.value.bias', 'bert.encoder.encoder.layer.11.output.dense.bias', 'bert.encoder.encoder.layer.11.intermediate.dense.bias', 'bert.seg_encoder.norm.bias', 'bert.encoder.encoder.layer.1.attention.self.value.weight', 'bert.encoder.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.encoder.layer.7.attention.self.query.bias', 'bert.encoder.encoder.layer.10.attention.self.query.weight', 'bert.encoder.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.seg_encoder.layers.1.linear1.weight', 'bert.encoder.encoder.layer.0.attention.self.value.bias', 'bert.encoder.encoder.layer.3.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.key.weight', 'bert.encoder.encoder.layer.6.output.LayerNorm.weight', 'bert.encoder.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.4.output.dense.weight', 'bert.encoder.encoder.layer.7.attention.self.value.bias', 'bert.encoder.encoder.layer.7.output.dense.bias', 'bert.encoder.encoder.layer.5.attention.self.value.bias', 'bert.encoder.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.encoder.layer.10.intermediate.dense.bias', 'bert.seg_encoder.layers.0.linear2.bias', 'bert.seg_encoder.layers.0.linear1.weight', 'bert.encoder.encoder.layer.11.attention.self.query.weight', 'bert.encoder.encoder.layer.2.attention.self.query.weight', 'bert.encoder.encoder.layer.5.attention.self.value.weight', 'bert.encoder.encoder.layer.4.output.dense.bias', 'bert.encoder.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.11.attention.self.value.weight', 'bert.encoder.encoder.layer.5.attention.self.key.bias', 'bert.encoder.encoder.layer.11.attention.self.key.weight', 'bert.encoder.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.encoder.layer.1.output.dense.bias', 'bert.encoder.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.7.attention.output.dense.weight', 'bert.encoder.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.encoder.layer.6.attention.self.key.weight', 'bert.encoder.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.2.attention.self.key.weight', 'bert.encoder.pooler.dense.bias', 'bert.encoder.encoder.layer.2.attention.self.query.bias', 'bert.encoder.encoder.layer.0.output.dense.bias', 'bert.encoder.encoder.layer.6.attention.self.query.weight', 'bert.encoder.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.encoder.layer.0.attention.self.query.bias', 'bert.encoder.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.encoder.layer.8.attention.self.query.weight', 'bert.encoder.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.encoder.layer.8.output.dense.weight', 'bert.encoder.encoder.layer.10.attention.self.value.bias', 'bert.encoder.encoder.layer.3.attention.output.dense.weight', 'bert.seg_encoder.layers.0.norm2.bias', 'bert.encoder.encoder.layer.9.attention.self.value.weight', 'bert.encoder.encoder.layer.8.attention.self.key.weight', 'bert.encoder.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.encoder.layer.9.output.dense.bias', 'bert.encoder.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.encoder.layer.6.output.dense.bias', 'bert.encoder.encoder.layer.1.attention.self.key.weight', 'bert.encoder.encoder.layer.5.attention.output.dense.bias', 'bert.seg_pos_embeddings.weight', 'bert.encoder.encoder.layer.2.attention.self.key.bias', 'bert.encoder.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.encoder.layer.2.intermediate.dense.bias', 'bert.encoder.encoder.layer.3.output.dense.bias', 'bert.encoder.encoder.layer.10.attention.self.key.bias', 'bert.encoder.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.encoder.layer.9.output.LayerNorm.bias', 'bert.seg_encoder.layers.0.norm2.weight', 'bert.encoder.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.encoder.layer.4.attention.self.query.bias', 'bert.encoder.encoder.layer.5.attention.self.key.weight', 'bert.encoder.encoder.layer.6.attention.self.value.bias', 'bert.seg_encoder.layers.1.linear1.bias', 'bert.encoder.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.encoder.layer.2.attention.self.value.bias', 'bert.encoder.encoder.layer.4.attention.self.key.bias', 'bert.encoder.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.encoder.layer.7.attention.output.LayerNorm.bias'] This IS expected if you are initializing BertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). This IS NOT expected if you are initializing BertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). [WARNING|modeling_utils.py:1512] 2022-03-28 18:01:33,192 >> Some weights of BertForSequenceClassification were not initialized from the model checkpoint at /home/X/X/lex-glue/seed_1 and are newly initialized: ['bert.encoder.layer.0.output.LayerNorm.weight', 'bert.encoder.layer.4.output.dense.weight', 'bert.embeddings.LayerNorm.bias', 'bert.encoder.layer.11.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.key.bias', 'bert.encoder.layer.4.output.LayerNorm.weight', 'bert.encoder.layer.11.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.value.bias', 'bert.encoder.layer.8.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.key.bias', 'bert.encoder.layer.6.attention.self.query.weight', 'bert.encoder.layer.3.attention.self.key.weight', 'bert.encoder.layer.2.output.dense.bias', 'bert.encoder.layer.11.output.dense.weight', 'bert.encoder.layer.6.output.dense.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.attention.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.query.bias', 'bert.encoder.layer.6.attention.self.key.bias', 'bert.encoder.layer.3.attention.self.key.bias', 'bert.encoder.layer.6.attention.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.key.bias', 'bert.encoder.layer.10.attention.output.LayerNorm.bias', 'bert.encoder.layer.7.intermediate.dense.bias', 'bert.encoder.layer.2.intermediate.dense.weight', 'bert.encoder.layer.3.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.weight', 'bert.encoder.layer.4.intermediate.dense.weight', 'bert.encoder.layer.0.intermediate.dense.bias', 'bert.encoder.layer.9.attention.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.output.dense.weight', 'bert.encoder.layer.2.attention.output.dense.bias', 'bert.encoder.layer.3.intermediate.dense.bias', 'bert.encoder.layer.0.attention.self.key.weight', 'bert.encoder.layer.6.output.LayerNorm.bias', 'bert.encoder.layer.4.intermediate.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.bias', 'bert.encoder.layer.7.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.output.dense.weight', 'bert.encoder.layer.8.attention.self.value.weight', 'bert.encoder.layer.5.output.dense.weight', 'bert.encoder.layer.4.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.attention.self.query.bias', 'bert.encoder.layer.9.output.dense.weight', 'bert.encoder.layer.1.attention.output.dense.weight', 'bert.encoder.layer.6.intermediate.dense.bias', 'bert.encoder.layer.0.attention.output.dense.weight', 'bert.encoder.layer.10.attention.output.LayerNorm.weight', 'bert.encoder.layer.3.output.dense.bias', 'bert.encoder.layer.5.output.LayerNorm.weight', 'bert.encoder.layer.1.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.intermediate.dense.weight', 'bert.encoder.layer.2.attention.output.dense.weight', 'bert.encoder.layer.7.attention.self.key.bias', 'bert.encoder.layer.1.attention.self.value.weight', 'bert.encoder.layer.11.attention.self.key.weight', 'bert.encoder.layer.11.attention.self.value.bias', 'bert.encoder.layer.8.attention.self.value.bias', 'bert.encoder.layer.9.intermediate.dense.bias', 'bert.encoder.layer.10.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.weight', 'bert.encoder.layer.0.output.LayerNorm.bias', 'bert.encoder.layer.11.intermediate.dense.weight', 'bert.encoder.layer.5.intermediate.dense.weight', 'bert.encoder.layer.5.attention.self.value.bias', 'bert.encoder.layer.11.attention.output.dense.bias', 'bert.encoder.layer.5.intermediate.dense.bias', 'bert.encoder.layer.3.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.value.weight', 'bert.encoder.layer.10.attention.output.dense.weight', 'bert.encoder.layer.2.output.dense.weight', 'bert.encoder.layer.2.attention.self.query.bias', 'bert.encoder.layer.7.output.LayerNorm.weight', 'bert.encoder.layer.2.attention.self.query.weight', 'bert.encoder.layer.0.output.dense.weight', 'bert.encoder.layer.11.attention.self.query.weight', 'bert.encoder.layer.11.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.bias', 'bert.encoder.layer.8.output.dense.weight', 'bert.encoder.layer.7.attention.output.dense.weight', 'bert.pooler.dense.weight', 'bert.encoder.layer.6.attention.output.dense.bias', 'bert.encoder.layer.1.intermediate.dense.bias', 'bert.encoder.layer.5.attention.self.query.weight', 'bert.encoder.layer.1.output.dense.weight', 'bert.encoder.layer.7.attention.self.key.weight', 'bert.encoder.layer.4.attention.self.value.bias', 'bert.encoder.layer.0.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.query.weight', 'bert.encoder.layer.2.attention.self.value.bias', 'bert.encoder.layer.6.attention.self.key.weight', 'bert.encoder.layer.10.intermediate.dense.bias', 'bert.encoder.layer.2.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.query.weight', 'bert.encoder.layer.1.attention.self.key.bias', 'bert.encoder.layer.1.output.LayerNorm.bias', 'bert.encoder.layer.9.output.LayerNorm.bias', 'bert.encoder.layer.3.attention.output.dense.bias', 'bert.encoder.layer.6.attention.self.value.bias', 'bert.encoder.layer.7.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.attention.self.query.weight', 'bert.encoder.layer.9.attention.self.value.weight', 'bert.encoder.layer.11.intermediate.dense.bias', 'bert.encoder.layer.8.intermediate.dense.weight', 'bert.encoder.layer.5.attention.output.dense.bias', 'bert.encoder.layer.0.intermediate.dense.weight', 'bert.encoder.layer.10.attention.output.dense.bias', 'bert.encoder.layer.4.attention.self.query.weight', 'bert.encoder.layer.10.output.dense.weight', 'bert.encoder.layer.1.attention.self.query.bias', 'bert.encoder.layer.5.attention.self.value.weight', 'bert.embeddings.position_embeddings.weight', 'bert.encoder.layer.7.output.LayerNorm.bias', 'bert.encoder.layer.8.intermediate.dense.bias', 'bert.encoder.layer.10.attention.self.key.weight', 'bert.encoder.layer.0.output.dense.bias', 'bert.encoder.layer.8.output.LayerNorm.bias', 'bert.encoder.layer.11.attention.self.value.weight', 'bert.encoder.layer.5.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.bias', 'bert.encoder.layer.10.intermediate.dense.weight', 'bert.encoder.layer.0.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.key.bias', 'bert.encoder.layer.4.attention.self.value.weight', 'bert.encoder.layer.4.attention.output.dense.bias', 'bert.encoder.layer.10.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.key.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.weight', 'bert.encoder.layer.1.output.dense.bias', 'bert.encoder.layer.1.attention.output.LayerNorm.weight', 'bert.encoder.layer.4.output.LayerNorm.bias', 'bert.encoder.layer.5.attention.self.query.bias', 'bert.encoder.layer.1.attention.output.dense.bias', 'bert.encoder.layer.3.attention.self.value.bias', 'bert.encoder.layer.9.attention.output.dense.bias', 'bert.encoder.layer.8.attention.self.key.weight', 'bert.encoder.layer.2.attention.output.LayerNorm.weight', 'bert.encoder.layer.9.attention.self.query.weight', 'bert.encoder.layer.8.attention.self.key.bias', 'bert.encoder.layer.9.output.LayerNorm.weight', 'bert.embeddings.token_type_embeddings.weight', 'bert.encoder.layer.4.output.dense.bias', 'bert.encoder.layer.6.intermediate.dense.weight', 'bert.encoder.layer.6.output.dense.weight', 'bert.encoder.layer.8.attention.output.LayerNorm.bias', 'bert.encoder.layer.8.output.dense.bias', 'bert.encoder.layer.9.attention.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.value.bias', 'bert.encoder.layer.11.attention.self.query.bias', 'bert.encoder.layer.10.output.LayerNorm.weight', 'bert.encoder.layer.0.attention.self.query.weight', 'bert.encoder.layer.5.attention.self.key.weight', 'bert.encoder.layer.3.intermediate.dense.weight', 'bert.encoder.layer.11.attention.output.LayerNorm.bias', 'bert.encoder.layer.9.output.dense.bias', 'bert.encoder.layer.7.attention.output.dense.bias', 'bert.encoder.layer.8.attention.output.dense.weight', 'bert.encoder.layer.2.attention.self.key.weight', 'bert.encoder.layer.3.attention.self.query.bias', 'bert.encoder.layer.1.attention.self.value.bias', 'bert.encoder.layer.5.output.dense.bias', 'bert.encoder.layer.7.output.dense.bias', 'bert.encoder.layer.2.intermediate.dense.bias', 'bert.embeddings.word_embeddings.weight', 'bert.encoder.layer.9.attention.output.LayerNorm.weight', 'bert.encoder.layer.7.intermediate.dense.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.bias', 'bert.encoder.layer.6.attention.self.query.bias', 'bert.encoder.layer.2.output.LayerNorm.bias', 'bert.embeddings.LayerNorm.weight', 'bert.encoder.layer.6.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.query.bias', 'bert.encoder.layer.10.output.LayerNorm.bias', 'bert.encoder.layer.10.attention.self.value.weight', 'bert.encoder.layer.2.attention.self.value.weight', 'bert.encoder.layer.3.attention.output.dense.weight', 'bert.encoder.layer.9.attention.self.value.bias', 'bert.encoder.layer.3.attention.self.query.weight', 'bert.encoder.layer.7.output.dense.weight', 'bert.encoder.layer.1.attention.output.LayerNorm.bias', 'bert.encoder.layer.1.output.LayerNorm.weight', 'bert.encoder.layer.8.attention.output.dense.bias', 'bert.encoder.layer.7.attention.self.query.bias', 'bert.encoder.layer.2.output.LayerNorm.weight', 'bert.encoder.layer.4.attention.self.key.weight', 'bert.encoder.layer.0.attention.output.LayerNorm.weight', 'bert.encoder.layer.5.attention.self.key.bias', 'bert.encoder.layer.7.attention.self.value.weight', 'bert.encoder.layer.11.attention.output.dense.weight', 'bert.encoder.layer.3.attention.self.value.weight', 'bert.encoder.layer.9.attention.self.key.weight', 'bert.encoder.layer.9.attention.self.key.bias', 'bert.encoder.layer.6.output.LayerNorm.weight', 'bert.pooler.dense.bias'] <

opened by Just-Strato 2
Shell script for `ecthr_b`

Hello all, thank you for releasing this repository!

I am currently working on reproducing some of the results in this repository. In the readme, benchmarked results are presented for all tasks including ECtHR Task A and ECtHR Task B. However, the shell script run_ecthr.sh only encodes one task; namely ecthr_a:

https://github.com/coastalcph/lex-glue/blob/dfee272e8d6f9997173736317529ba2f3a8b881e/scripts/run_ecthr.sh#L6

Is there a reason for this, or is it implied to run this script another time after changing the TASK variable to ecthr_b?

opened by atreyasha 2

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English

Related tags

Overview

LexGLUE: A Benchmark Dataset for Legal Language Understanding in English ⚖️ 🏆 🧑‍🎓 👩‍⚖️

Dataset Summary

Supported Tasks

ECtHR (Task A)

ECtHR (Task B)

SCOTUS

EUR-LEX

LEDGAR

UNFAIR-ToS

CaseHOLD

Leaderboard

Frequently Asked Questions (FAQ)

Where are the datasets?

How to run experiments?

How to participate?

I still have open questions...

Comments

Owner

Source code and dataset for ACL2021 paper: "ERICA: Improving Entity and Relation Understanding for Pre-trained Language Models via Contrastive Learning".

LoveDA: A Remote Sensing Land-Cover Dataset for Domain Adaptive Semantic Segmentation (NeurIPS2021 Benchmark and Dataset Track)

OpenMMLab's Next Generation Video Understanding Toolbox and Benchmark

[CVPR2021] UAV-Human: A Large Benchmark for Human Behavior Understanding with Unmanned Aerial Vehicles

DUE: End-to-End Document Understanding Benchmark

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

LTR_CrossEncoder: Legal Text Retrieval Zalo AI Challenge 2021

PyTorch code for BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation

The source code for Generating Training Data with Language Models: Towards Zero-Shot Language Understanding.

Hypersim: A Photorealistic Synthetic Dataset for Holistic Indoor Scene Understanding

The source code for the Cutoff data augmentation approach proposed in this paper: "A Simple but Tough-to-Beat Data Augmentation Approach for Natural Language Understanding and Generation".

Pytorch version of VidLanKD: Improving Language Understanding viaVideo-Distilled Knowledge Transfer

[ICML 2021] Towards Understanding and Mitigating Social Biases in Language Models

CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation

YouRefIt: Embodied Reference Understanding with Language and Gesture

The dataset and source code for our paper: "Did You Ask a Good Question? A Cross-Domain Question IntentionClassification Benchmark for Text-to-SQL"

HDR Video Reconstruction: A Coarse-to-fine Network and A Real-world Benchmark Dataset (ICCV 2021)

A novel benchmark dataset for Monocular Layout prediction