GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

Xinyan Zhao

Last update: Dec 26, 2022

Overview

This repository is the code release of the paper GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition, which was accepted at EACL 2021.

This work aims to improve weakly supervised named entity recognition systems by automatically finding new rules that help identify entities in data. The idea, as shown in the following figure, is that if we know rule1: associated with->Disease is an accurate rule and it is semantically related to rule2: cause of->Disease, we should be able to use rule2 as another accurate rule for identifying Disease entities.
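A minimal sketch of that intuition, assuming each rule is represented by an embedding vector and that "semantically related" is measured by cosine similarity. The three-dimensional vectors and the extra rule "located in" are made up for illustration; they are not taken from the repository.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Toy embeddings (hypothetical values, for illustration only).
rule_embeddings = {
    "associated with": [0.9, 0.1, 0.3],
    "cause of": [0.8, 0.2, 0.4],
    "located in": [0.1, 0.9, 0.0],
}

seed = "associated with"  # a rule already known to be accurate for Disease
scores = {
    rule: cosine(rule_embeddings[seed], vec)
    for rule, vec in rule_embeddings.items()
    if rule != seed
}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

With these toy vectors, the candidate closest to the seed is "cause of", which is the kind of rule the method would promote.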

The overall workflow is illustrated below. For a specific type of rule, we first extract a large set of possible rule candidates from unlabeled data. The rule candidates are then assembled into a graph in which each node represents a candidate and edges are built from the semantic similarities of node pairs. Next, after manually identifying a small set of nodes as seeding rules, we use a graph-based neural network to find new rules by propagating the labeling confidence from the seeding rules to the other candidates. With the newly learned rules, we follow the weak supervision approach: we build a labeling matrix on unlabeled data and train a generative model to create a weakly labeled dataset. Finally, we train our final NER system with a discriminative model.
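The graph construction step can be sketched as follows. This assumes edges connect each candidate to its k most similar candidates by cosine similarity; the function name build_knn_edges, the value of k, and the toy vectors are hypothetical, and the repository may build edges differently.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_knn_edges(vectors, k=1):
    """Return undirected edges linking each node to its k most similar nodes."""
    edges = set()
    for i, u in enumerate(vectors):
        sims = [(cosine(u, v), j) for j, v in enumerate(vectors) if j != i]
        for _, j in sorted(sims, reverse=True)[:k]:
            edges.add((min(i, j), max(i, j)))  # store each edge once
    return sorted(edges)

# Four toy candidate embeddings forming two obvious clusters.
candidates = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
edges = build_knn_edges(candidates, k=1)
print(edges)
```

An edge list of this shape is what graph libraries such as pytorch-geometric typically consume after conversion to an index tensor.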

Installation

Install required libraries

Install LinkedHMM [1] by running pip install -r requirements.txt on the command line, or from the official repo: https://github.com/BatsResearch/safranchik-aaai20-code.
Install PyTorch from https://pytorch.org/
Install Transformers at https://huggingface.co/transformers/installation.html
Install pytorch-geometric at https://pytorch-geometric.readthedocs.io/en/latest/notes/installation.html
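A quick way to confirm the libraries are importable before running anything, sketched under the assumption that the import names are torch, transformers, and torch_geometric. LinkedHMM is left out because its import name is not stated here.

```python
import importlib.util

# Import names assumed for PyTorch, Transformers, and pytorch-geometric.
REQUIRED = ["torch", "transformers", "torch_geometric"]

def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```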

Download dataset
- Once LinkedHMM is successfully installed, move all the files in the "data" folder under the LinkedHMM directory to the "datasets" folder in the current directory.
- Download the pretrained SciBERT embeddings from https://huggingface.co/allenai/scibert_scivocab_uncased, and move them to the folder pretrained-model.

To save time when reading data, we cache all datasets as pickled objects: python cache_datasets.py
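The caching pattern can be sketched as below. This is not the code in cache_datasets.py; the helper load_or_build, the file name, and the sentence layout (a dict with tokens and tags) are assumptions made for illustration.

```python
import os
import pickle
import tempfile

def load_or_build(cache_path, build_fn):
    """Load a pickled object if it exists; otherwise build it and cache it."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)
    data = build_fn()
    with open(cache_path, "wb") as f:
        pickle.dump(data, f)
    return data

def parse_toy_dataset():
    # Stand-in for an expensive parsing step (hypothetical sentence format).
    return [{"tokens": ["lung", "cancer"], "tags": ["B", "I"]}]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_train.p")
    built = load_or_build(path, parse_toy_dataset)   # parses and writes the cache
    cached = load_or_build(path, parse_toy_dataset)  # reads the cache
    print(built == cached)
```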

Run experiments

The experiments on the three datasets are conducted independently. To run experiments for one task (e.g., NCBI), go to the folder code-NCBI. For the other datasets, BC5CDR and LaptopReview, go to the folders code-BC5CDR and code-LaptopReview and run the same commands.

Extract candidate rules for each type and cache embeddings, edges, seeds, etc.

Run python prepare_candidates_and_embeddings.py --dataset NCBI --rule_type SurfaceForm to cache candidate rules, embeddings, edges, etc., for the SurfaceForm rule type.
Other rule types are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
All cached data will be saved into the folder cached_seeds_and_embeddings.
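To cover every rule type without retyping the command, the invocations can be generated in a loop. The sketch below only builds and prints the command lines; the script name, flags, and rule types come from the text above, while the helper build_commands is hypothetical.

```python
RULE_TYPES = [
    "SurfaceForm", "Suffix", "Prefix",
    "InclusivePreNgram", "ExclusivePreNgram",
    "InclusivePostNgram", "ExclusivePostNgram",
    "Dependency",
]

def build_commands(script, dataset):
    """Build one command (as an argument list) per rule type."""
    return [
        ["python", script, "--dataset", dataset, "--rule_type", rule_type]
        for rule_type in RULE_TYPES
    ]

commands = build_commands("prepare_candidates_and_embeddings.py", "NCBI")
for cmd in commands:
    print(" ".join(cmd))
```

Each argument list can be passed to subprocess.run(cmd, check=True) from inside the matching code folder. The same helper works for propagate.py, which takes the same two flags.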

Train propagation and find new rules.

Run python propagate.py --dataset NCBI --rule_type SurfaceForm to learn SurfaceForm rules.
Other rule types are Suffix, Prefix, InclusivePreNgram, ExclusivePreNgram, InclusivePostNgram, ExclusivePostNgram, and Dependency.
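To illustrate what propagating confidence from seeding rules means, here is a minimal sketch that uses plain iterative label propagation (neighbor averaging with clamped seeds). It is a simplified stand-in for the graph-based neural network that propagate.py trains; the function, the toy graph, and the 0.9 threshold are all assumptions.

```python
def propagate_confidence(edges, seeds, num_nodes, steps=10):
    """Spread seed confidence over an undirected graph by neighbor averaging."""
    neighbors = {i: [] for i in range(num_nodes)}
    for i, j in edges:
        neighbors[i].append(j)
        neighbors[j].append(i)

    scores = [seeds.get(i, 0.0) for i in range(num_nodes)]
    for _ in range(steps):
        updated = []
        for i in range(num_nodes):
            if i in seeds:
                updated.append(seeds[i])  # seeds keep their known confidence
            elif neighbors[i]:
                total = sum(scores[j] for j in neighbors[i])
                updated.append(total / len(neighbors[i]))
            else:
                updated.append(0.0)
        scores = updated
    return scores

# Nodes 0-2 form a chain containing the seed; nodes 3-4 are disconnected from it.
edges = [(0, 1), (1, 2), (3, 4)]
seeds = {0: 1.0}
scores = propagate_confidence(edges, seeds, num_nodes=5)
new_rules = [i for i, s in enumerate(scores) if i not in seeds and s >= 0.9]
print(new_rules)
```

Candidates connected to the seed accumulate confidence, while the disconnected pair stays at zero and is never selected.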

Train LinkedHMM generative model

Run python train_generative_model.py --dataset NCBI --use_SurfaceForm --use_Suffix --use_Prefix --use_InclusivePostNgram --use_Dependency.
The argument --use_[TYPE] activates a specific type of rules.
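The labeling matrix that feeds the generative model can be pictured as one row per token and one column per rule, where each cell is a vote or an abstention. The sketch below resolves votes by simple majority, which is a much cruder stand-in for the LinkedHMM generative model; the three rules and the example sentence are invented for illustration.

```python
from collections import Counter

ABSTAIN = None

def surface_form_rule(token):
    # Votes Disease for tokens in a tiny, made-up dictionary.
    return "Disease" if token.lower() in {"melanoma"} else ABSTAIN

def suffix_rule(token):
    # Votes Disease for tokens ending in "oma".
    return "Disease" if token.lower().endswith("oma") else ABSTAIN

def stopword_rule(token):
    # Votes O (not an entity) for a few function words.
    return "O" if token.lower() in {"with", "or"} else ABSTAIN

RULES = [surface_form_rule, suffix_rule, stopword_rule]

def labeling_matrix(tokens):
    """One row per token, one column per rule."""
    return [[rule(token) for rule in RULES] for token in tokens]

def majority_vote(matrix, default="O"):
    """Pick the most common non-abstaining vote per token."""
    labels = []
    for row in matrix:
        votes = Counter(v for v in row if v is not ABSTAIN)
        labels.append(votes.most_common(1)[0][0] if votes else default)
    return labels

tokens = ["Patients", "with", "melanoma", "or", "lymphoma"]
matrix = labeling_matrix(tokens)
weak_labels = majority_vote(matrix)
print(weak_labels)
```

A generative model differs from this vote by estimating how accurate each rule is and weighting its votes accordingly.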

Train discriminative model

Run create_dataset_for_bert_tagger.py to prepare the dataset for training the tagging model. (Make sure to change the dataset and data_name variables in the file first.)
Run train_discriminative_model.py.
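Tagging models are commonly trained on BIO-encoded sequences. As a rough illustration of that preparation step, the sketch below converts token-level entity labels into BIO tags. It assumes a BIO scheme and is not the logic of create_dataset_for_bert_tagger.py; the function to_bio and the example sentence are hypothetical.

```python
def to_bio(labels, outside="O"):
    """Convert per-token entity labels into BIO tags."""
    tags = []
    previous = outside
    for label in labels:
        if label == outside:
            tags.append(outside)
        elif label == previous:
            tags.append("I-" + label)  # continues the current entity
        else:
            tags.append("B-" + label)  # starts a new entity
        previous = label
    return tags

tokens = ["Patients", "with", "lung", "cancer", "or", "melanoma"]
labels = ["O", "O", "Disease", "Disease", "O", "Disease"]
bio_tags = to_bio(labels)
for token, tag in zip(tokens, bio_tags):
    print(token, tag)
```

Note that this simple conversion merges two adjacent entities of the same type into one span, which is a known limitation of deriving BIO tags from token-level labels alone.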

References

[1] Esteban Safranchik, Shiying Luo, Stephen H. Bach. Weakly Supervised Sequence Tagging from Noisy Rules. AAAI 2020.

Comments

TypeError when reading input files

Hi, how do you convert the NCBI txt files to pickle? Can you please share the code to create the pickle files? Thanks in advance.

GLaRA/code-NCBI$ python prepare_candidates_and_embeddings.py --dataset LaptopReview --rule_type SurfaceForm
Train: 14472 Dev: 3901 Test: 3901
Running on the GPU
Traceback (most recent call last):
  File "prepare_candidates_and_embeddings.py", line 81, in <module>
    pos_cnt += collect_POS(sent, label='I')
  File "/home/sophie/Documents/Thesis/My_Expe/NER/GLaRA/code-NCBI/utils_prepare_data.py", line 34, in collect_POS
    tokens, tags = sentence['tokens'], sentence['tags']
TypeError: string indices must be integers

opened by JiahuiSophieHU 1
