The source code for 'Noisy-Labeled NER with Confidence Estimation' accepted by NAACL 2021

Last update: Nov 12, 2022

Related tags

Deep Learning Noisy-NER-Confidence-Estimation

Overview

Kun Liu*, Yao Fu*, Chuanqi Tan, Mosha Chen, Ningyu Zhang, Songfang Huang, Sheng Gao. Noisy-Labeled NER with Confidence Estimation. NAACL 2021. [arxiv]

Requirements

pip install -r requirements.txt

Data

The format of datasets includes three columns, the first column is word, the second column is noisy labels and the third column is gold labels. For datasets without golden labels, you could set the third column the same as the second column. We provide the CoNLL 2003 English with recall 0.5 and precision 0.9 in './data/eng_r0.5p0.9'

Confidence Estimation Strategies

Local Strategy

python confidence_estimation_local.py --dataset eng_r0.5p0.9 --embedding_file ${PATH_TO_EMBEDDING} --embedding_dim ${DIM_OF_EMBEDDING} --neg_noise_rate ${NOISE_RATE_OF_NEGATIVES} --pos_noise_rate ${NOISE_RATE_OF_POSITIVES}

For '--neg_noise_rate' and '--pos_noise_rate', you can set them as -1.0 to use golden noise rate (experiment 12 in Table 1 For En), or you can set them as other values (i.e., --neg_noise_rate 0.09 --pos_noise_rate 0.14 for experiment 10, En)

Global Strategy

python confidence_estimation_global.py --dataset eng_r0.5p0.9 --embedding_file ${PATH_TO_EMBEDDING} --embedding_dim ${DIM_OF_EMBEDDING} --neg_noise_rate ${NOISE_RATE_OF_NEGATIVES} --pos_noise_rate ${NOISE_RATE_OF_POSITIVES}

For 'neg_noise_rate' and 'pos_noise_rate', you can set them as -1.0 to use golden noise rate (experiment 13 in Table 1 for En), or you can set them as other values (i.e., --neg_noise_rate 0.1 --pos_noise_rate 0.13 for experiment 11, En)

Key Implementation

equation (3) is implemented in ./model/linear_partial_crf_inferencer.py, line 79-85.

equation (4) is implemented in ./model/neuralcrf_small_loss_constrain_local.py, line 139.

equation (5) is implemented in ./confidence_estimation_local.py, line 74-87 or ./confidence_estimation_global.py, line 75-85.

equation (6) and (7) are implemented in ./model/neuralcrf_small_loss_constrain_global.py, line 188-194 or ./model/neuralcrf_small_loss_constrain_local.py, line 188-197.

For global strategy, equation (8) is implemented in ./model/neuralcrf_small_loss_constrain_global.py, line 195-214 and ./model/linear_partial_crf_inferencer.py, line 36-48. For local strategy, equation (8) is implemented in ./model/neuralcrf_small_loss_constrain_local.py, line 198-215 and ./model/linear_crf_inferencer.py, line 36-48.

the code for paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration"

EOW-Softmax This code is for the paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration". Accepted by ICCV21. Usage Commnd exa

36 Dec 2, 2022

Code for "The Box Size Confidence Bias Harms Your Object Detector"

The Box Size Confidence Bias Harms Your Object Detector - Code Disclaimer: This repository is for research purposes only. It is designed to maintain r

24 Dec 7, 2022

[CVPR2021] DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

DoDNet This repo holds the pytorch implementation of DoDNet: DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datase

116 Dec 12, 2022

The toolkit to generate auto labeled datasets

Ozeu Ozeu is the toolkit to autolabal dataset for instance segmentation. You can generate datasets labaled with segmentation mask and bounding box fro

28 Mar 28, 2022

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

DatasetGAN This is the official code and data release for: DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort Yuxuan Zhang*, Huan Li

302 Jan 5, 2023

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

Published by SpaceML • About SpaceML • Quick Colab Example Self-Supervised Learner The Self-Supervised Learner can be used to train a classifier with

92 Nov 30, 2022

Comments

Source Code

Hello, your work is very interesting, but I still can't understand some parts of the article. I was wondering if you could share the code for us to learn, thank you.

opened by Jeremy-07 2
About using pretrained LM.

Hello, I read your paper and so interesting. I am interested in the results of using the pretrained LM for your method. However, the result was not listed. (Ours with Pretrained LM in the general noise in the table1.) Is it possible to commit the code that works with pretrained LM as well as Bi-LSTM?

opened by yhstam 0

The source code for 'Noisy-Labeled NER with Confidence Estimation' accepted by NAACL 2021

Related tags

Overview

Requirements

Data

Confidence Estimation Strategies

Local Strategy

Global Strategy

Key Implementation

You might also like...

the code for paper "Energy-Based Open-World Uncertainty Modeling for Confidence Calibration"

Code for "The Box Size Confidence Bias Harms Your Object Detector"

[CVPR2021] DoDNet: Learning to segment multi-organ and tumors from multiple partially labeled datasets

The toolkit to generate auto labeled datasets

DatasetGAN: Efficient Labeled Data Factory with Minimal Human Effort

The Self-Supervised Learner can be used to train a classifier with fewer labeled examples needed using self-supervised learning.

BESS: Balanced Evolutionary Semi-Stacking for Disease Detection via Partially Labeled Imbalanced Tongue Data

Implementation for "Domain-Specific Bias Filtering for Single Labeled Domain Generalization"

Codes for NAACL 2021 Paper "Unsupervised Multi-hop Question Answering by Question Generation"

Comments

Source Code

About using pretrained LM.

Owner

Source code of NeurIPS 2021 Paper ''Be Confident! Towards Trustworthy Graph Neural Networks via Confidence Calibration''

source code and pre-trained/fine-tuned checkpoint for NAACL 2021 paper LightningDOT

Source code for NAACL 2021 paper "TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference"

Python and C++ implementation of "MarkerPose: Robust real-time planar target tracking for accurate stereo pose estimation". Accepted at LXCV @ CVPR 2021.

Source code for paper "ATP: AMRize Than Parse! Enhancing AMR Parsing with PseudoAMRs" @NAACL-2022

Re-implementation of the Noise Contrastive Estimation algorithm for pyTorch, following "Noise-contrastive estimation: A new estimation principle for unnormalized statistical models." (Gutmann and Hyvarinen, AISTATS 2010)

Official code of our work, Unified Pre-training for Program Understanding and Generation [NAACL 2021].

Code for NAACL 2021 full paper "Efficient Attentions for Long Document Summarization"

Official repository with code and data accompanying the NAACL 2021 paper "Hurdles to Progress in Long-form Question Answering" (https://arxiv.org/abs/2103.06332).

Source codes for the paper "Local Additivity Based Data Augmentation for Semi-supervised NER"