Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources (NAACL-2021).

Sapienza NLP group

Last update: Sep 9, 2022

Description

This is the repository for the paper Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources, presented at NAACL 2021 by Simone Conia, Andrea Bacciu and Roberto Navigli.

Abstract

While cross-lingual techniques are finding increasing success in a wide range of Natural Language Processing tasks, their application to Semantic Role Labeling (SRL) has been strongly limited by the fact that each language adopts its own linguistic formalism, from PropBank for English to AnCora for Spanish and PDT-Vallex for Czech, inter alia. In this work, we address this issue and present a unified model to perform cross-lingual SRL over heterogeneous linguistic resources. Our model implicitly learns a high-quality mapping for different formalisms across diverse languages without resorting to word alignment and/or translation techniques. We find that, not only is our cross-lingual system competitive with the current state of the art but that it is also robust to low-data scenarios. Most interestingly, our unified model is able to annotate a sentence in a single forward pass with all the inventories it was trained with, providing a tool for the analysis and comparison of linguistic theories across different languages.

Download

You can download a copy of all the files in this repository by cloning the git repository:

git clone https://github.com/SapienzaNLP/unify-srl.git

or download a zip archive.

Model Checkpoint

Link to Drive

To install

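To install the dependencies, create a conda environment from the provided environment.yml (conda env create -f environment.yml).
To use the model with NVIDIA CUDA, remember to install the torch-scatter build that matches your CUDA version, as described in its documentation.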
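
After setting up the environment, a quick sanity check can confirm that the packages mentioned above are importable. The snippet below is a minimal sketch, not part of the repository: the helper name check_packages is hypothetical, and it assumes torch-scatter is imported under the module name torch_scatter.

```python
import importlib.util


def check_packages(names):
    """Return {name: bool} telling whether each top-level package can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumption: torch-scatter installs the importable module "torch_scatter".
    status = check_packages(["torch", "torch_scatter"])
    for name, found in status.items():
        print(f"{name}: {'found' if found else 'MISSING'}")

    if status["torch"]:
        import torch

        # False here means PyTorch will run on CPU only.
        print("CUDA available:", torch.cuda.is_available())
```

If torch_scatter is reported as missing, or CUDA is unavailable even though a GPU is present, the installed builds probably do not match the local CUDA version.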

Cite this work

@inproceedings{conia-etal-2021-unify-srl,
    title = "Unifying Cross-Lingual Semantic Role Labeling with Heterogeneous Linguistic Resources",
    author = "Conia, Simone  and
      Bacciu, Andrea  and
      Navigli, Roberto",
    booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
    month = jun,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2021.naacl-main.31",
    pages = "338--351",
}

Comments

Clarifying question about the model architecture

Hello,

I would like to clarify the model architecture being used here. I thought this code performed predicate identification, predicate sense disambiguation, argument identification and argument classification in a single forward pass, but it looks like the "forward" function takes the indices of the already-identified predicates as input (see line 94 of srl/models/model.py).

I was hoping to emulate this model architecture for a similar task, but can't figure out how you go about predicate identification. Is the predicate identified before you encode and decode the sense and predicate-argument representations? If so, how?

Many thanks

opened by e-spaulding 2

1-shot data preprocessing

I was trying to replicate the results for the one-shot setting, but I am not sure which sentences are used in the training set. Can you release the data preprocessing script for the 1-shot setting? Or does it simply follow the order of the dataset and remove the sentences with repeated predicate senses? Thank you.
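
The procedure guessed at in this question could be sketched as follows. This is only an illustration of that guess, not the authors' confirmed preprocessing: the function name select_one_shot and the sentence format (a dict with a "senses" list) are hypothetical.

```python
def select_one_shot(sentences):
    """Keep, in dataset order, each sentence that introduces an unseen predicate sense."""
    seen = set()
    kept = []
    for sentence in sentences:
        new_senses = set(sentence["senses"]) - seen
        if new_senses:
            kept.append(sentence)
            seen.update(sentence["senses"])
    return kept


# Toy data: each sentence lists the senses of its predicates (hypothetical format).
toy = [
    {"id": 0, "senses": ["eat.01"]},
    {"id": 1, "senses": ["eat.01"]},
    {"id": 2, "senses": ["eat.01", "run.02"]},
    {"id": 3, "senses": ["run.02"]},
]
print([s["id"] for s in select_one_shot(toy)])  # prints [0, 2]
```

Note that sentence 2 is kept for run.02 but also repeats eat.01, so filtering whole sentences does not guarantee exactly one example per sense. A strict 1-shot setting would also have to decide how to handle such sentences.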

opened by edchengg 1

evaluation script for cz

I was trying to run the evaluation script for cz and I got this error.

python evaluate_conll2009.py --scorer scorer_conll2009.pl --processor xlmr_ft_full_all/processor_config.json --model xlmr_ft_full_all/checkpoint_epoch=026-val_f1=0.9028.ckpt --config ../config/full/cz_config.json --output_dir xlmr_ft_full_all/

Evaluation on CZ

Traceback (most recent call last):
  File "evaluate_conll2009.py", line 114, in <module>
    if output_senses[i] != '_':
IndexError: list index out of range

Other languages work fine. I am not sure what happened with this one.
opened by edchengg 1

ResolvePackageNotfound

I was trying to create a new conda env with the command conda env create -f environment.yml, but I am getting ResolvePackageNotfound for some of the packages. I searched for the error but didn't find a concrete solution. Please help me resolve the issue.

opened by saithrinath 1
