⚖️ A Statutory Article Retrieval Dataset in French.

Maastricht Law & Tech Lab

Last update: Nov 17, 2022

Related tags

Overview

A Statutory Article Retrieval Dataset in French

This repository contains the Belgian Statutory Article Retrieval Dataset (BSARD), as well as the code to reproduce the experimental results from the associated paper by A. Louis, G. Spanakis, and G. Van Dijck.

Abstract. Statutory article retrieval is the task of automatically retrieving law articles relevant to a legal question. While recent advances in natural language processing have sparked considerable interest in many legal tasks, statutory article retrieval remains primarily untouched due to the scarcity of large-scale and high-quality annotated datasets. To address this bottleneck, we introduce the Belgian Statutory Article Retrieval Dataset (BSARD), which consists of 1,100+ French native legal questions labeled by experienced jurists with relevant articles from a corpus of 22,600+ Belgian law articles. Using BSARD, we benchmark several unsupervised information retrieval methods based on term weighting and pooled embeddings. Our best performing baseline achieves 50.8% R@100, which is promising for the feasibility of the task and indicates that there is still substantial room for improvement. By the specificity of the data domain and addressed task, BSARD presents a unique challenge problem for future research on legal information retrieval.

Documentation

Detailed documentation on the dataset and how to reproduce the main experimental results can be found here.

Citation

For attribution in academic contexts, please cite this work as:

@article{louis2021statutory,
  title = {A Statutory Article Retrieval Dataset in French},
  author = {Louis, Antoine and Spanakis, Gerasimos and Van Dijck, Gijs},
  journal = {arXiv preprint arXiv:2108.11792},
  year = {2021},
}

License

This repository is licensed under the terms of the CC BY-NC-SA 4.0 license.

You might also like...

🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

26 Apr 29, 2021

Code for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021

This repo provides the code of the following papers: (GAR) "Generation-Augmented Retrieval for Open-domain Question Answering", ACL 2021 (RIDER) "Read

49 Dec 26, 2022

Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

CIRPLANT This repository contains the code and pre-trained models for Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT) For d

29 Nov 17, 2022

[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Instance-level Image Retrieval using Reranking Transformers Fuwen Tan, Jiangbo Yuan, Vicente Ordonez, ICCV 2021. Abstract Instance-level image retriev

86 Dec 28, 2022

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation Official Code Repository for the paper "Unsupervised Documen

2 Oct 26, 2021

Question and answer retrieval in Turkish with BERT

trfaq Google supported this work by providing Google Cloud credit. Thank you Google for supporting the open source! 🎉 What is this? At this repo, I'm

13 Oct 10, 2022

Legal text retrieval for python

legal-text-retrieval Overview This system contains 2 steps: generate training data containing negative sample found by mixture score of cosine(tfidf)

22 Dec 6, 2022

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Memorizing Transformers - Pytorch Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memori

364 Jan 6, 2023

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

textgenrnn Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code, or quickly tr

4.8k Dec 30, 2022

Comments

transformers library version should be specified

Hi,

As specified in the documentation, I made the environment from the yml file: time conda env create -n bsard -f environment.yml

However, I run into an error when doing the grid search:

Traceback (most recent call last):                                                                                                                                                   
  File "scripts/search_hyperparameters.py", line 11, in <module>
    from retriever import Word2vecRetriever, FasttextRetriever, BERTRetriever
  File "/u/salaunol/Documents/_2021_automne/bsard/scripts/retriever.py", line 16, in <module>
    from transformers import (CamembertModel, CamembertTokenizer,
ImportError: cannot import name 'CamembertModel' from 'transformers' (/u/salaunol/anaconda3/envs/bsard/lib/python3.8/site-packages/transformers/__init__.py)

The transformers version was the following: transformers 2.1.1 pyhd3eb1b0_0

I fixed it with pip install transformers==2.5.0 but it would be preferable to specify the version for each library in environment.yml

Edit: issue submitted too fast

opened by oliviersalaun 1

Releases(v1.0)

v1.0(Aug 26, 2021)

The Belgian Statutory Article Retrieval Dataset (BSARD) v1.0 is a French native corpus for studying statutory article retrieval. BSARD consists of more than 22,600 statutory articles from Belgian law and about 1,100 legal questions posed by Belgian citizens and labeled by experienced jurists with relevant articles from the corpus.
Source code(tar.gz)
Source code(zip)

Owner

Maastricht Law & Tech Lab

The Lab aims to offer innovative education and to build a creative community of researchers at the intersections of law, technology and data science.

GitHub

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

475 Jan 4, 2023

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Coreferee Author: Richard Paul Hudson, Explosion AI 1. Introduction 1.1 The basic idea 1.2 Getting started 1.2.1 English 1.2.2 French 1.2.3 German 1.2

70 Dec 12, 2022

Simple GUI where you can enter an article and get a crisp summarized version.

Text-Summarization-using-TextRank-BART Simple GUI where you can enter an article and get a crisp summarized version. How to run: Clone the repo Instal

4 Sep 28, 2022

The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

Data and code for EMNLP 2021 paper "FinQA: A Dataset of Numerical Reasoning over Financial Data"

114 Dec 29, 2022

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

740 Dec 24, 2022

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

Delta Reading Comprehension Dataset 台達閱讀理解資料集 Delta Reading Comprehension Dataset (DRCD) 屬於通用領域繁體中文機器閱讀理解資料集。本資料集期望成為適用於遷移學習之標準中文閱讀理解資料集。本資料集從2,108篇

272 Dec 15, 2022

CCKS-Title-based-large-scale-commodity-entity-retrieval-top1

- 基于标题的大规模商品实体检索top1 一、任务介绍 CCKS 2020：基于标题的大规模商品实体检索，任务为对于给定的一个商品标题，参赛系统需要匹配到该标题在给定商品库中的对应商品实体。输入：输入文件包括若干行商品标题。输出：输出文本每一行包括此标题对应的商品实体，即给定知识库中商品 ID，

43 Nov 11, 2022

Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning This is the PyTorch companion code for the paper: A

69 Jan 3, 2023

Scene Text Retrieval via Joint Text Detection and Similarity Learning

This is the code of "Scene Text Retrieval via Joint Text Detection and Similarity Learning". For more details, please refer to our CVPR2021 paper.

79 Nov 29, 2022

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"

The implementation of paper CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval. CLIP4Clip is a video-text retrieval model based

456 Jan 6, 2023

⚖️ A Statutory Article Retrieval Dataset in French.

Related tags

Overview

A Statutory Article Retrieval Dataset in French

Documentation

Citation

License

You might also like...

🏆 The 1st Place Submission to AICity Challenge 2021 Natural Language-Based Vehicle Retrieval Track (Alibaba-UTS submission)

Code for papers "Generation-Augmented Retrieval for Open-Domain Question Answering" and "Reader-Guided Passage Reranking for Open-Domain Question Answering", ACL 2021

Composed Image Retrieval using Pretrained LANguage Transformers (CIRPLANT)

[ICCV 2021] Instance-level Image Retrieval using Reranking Transformers

Unsupervised Document Expansion for Information Retrieval with Stochastic Text Generation

Question and answer retrieval in Turkish with BERT

Legal text retrieval for python

Implementation of Memorizing Transformers (ICLR 2022), attention net augmented with indexing and retrieval of memories using approximate nearest neighbors, in Pytorch

Easily train your own text-generating neural network of any size and complexity on any text dataset with a few lines of code.

Comments

transformers library version should be specified

Releases(v1.0)

v1.0(Aug 26, 2021)

Owner

Maastricht Law & Tech Lab

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

Coreference resolution for English, French, German and Polish, optimised for limited training data and easily extensible for further languages

Simple GUI where you can enter an article and get a crisp summarized version.

The FinQA dataset from paper: FinQA: A Dataset of Numerical Reasoning over Financial Data

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

A 30000+ Chinese MRC dataset - Delta Reading Comprehension Dataset

CCKS-Title-based-large-scale-commodity-entity-retrieval-top1

Code for CVPR 2021 paper: Revamping Cross-Modal Recipe Retrieval with Hierarchical Transformers and Self-supervised Learning

Scene Text Retrieval via Joint Text Detection and Similarity Learning

An official implementation for "CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval"