Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning

ASAPP Research

Last update: Dec 27, 2022


Overview

structshot

Code and data for the paper "Simple and Effective Few-Shot Named Entity Recognition with Structured Nearest Neighbor Learning" by Yi Yang and Arzoo Katiyar, EMNLP 2020.

Data

For licensing reasons, we are only able to release the full CoNLL 2003 and WNUT 2017 datasets. We also release the support sets that we sampled from the CoNLL/WNUT/I2B2 dev sets so that our evaluation results can be reproduced.

CoNLL 2003

The CoNLL 2003 NER train/dev/test datasets are data/train.txt, data/dev.txt, and data/test.txt respectively. The labels are available in data/labels.txt.
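As a minimal sketch of how such files could be loaded, the reader below assumes the usual CoNLL-style layout: one token per line with the label in the last whitespace-separated column, and a blank line between sentences. That layout, the function name `read_conll`, and the sample lines are assumptions for illustration; check the actual files in data/ before relying on it.

```python
from typing import Iterable, List, Tuple

Sentence = List[Tuple[str, str]]  # list of (token, label) pairs


def read_conll(lines: Iterable[str]) -> List[Sentence]:
    """Group CoNLL-style lines into sentences (assumed format, see above)."""
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in lines:
        line = raw.strip()
        # A blank line or a document marker closes the current sentence.
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        parts = line.split()
        # First column is the token, last column is the label.
        current.append((parts[0], parts[-1]))
    if current:
        sentences.append(current)
    return sentences


# Hypothetical sample in the assumed format.
sample = ["EU B-ORG", "rejects O", "German B-MISC", "call O", "",
          "Peter B-PER", "Blackburn I-PER"]
sentences = read_conll(sample)
```

To read a real file, pass an open file handle, for example `read_conll(open("data/train.txt", encoding="utf-8"))`.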

WNUT 2017

The WNUT 2017 NER dev/test datasets are data/dev-wnut.txt and data/test-wnut.txt respectively. The labels are available in data/labels-wnut.txt.

Support sets for CoNLL 2003, WNUT 2017, and I2B2 2014

The one-shot and five-shot support sets used in the paper are available in data/support-* folders.

Usage

Because of data license limitations, we show how to do five-shot transfer learning from the CoNLL 2003 dataset to the WNUT 2017 dataset, instead of transferring from the OntoNotes 5 dataset as presented in our paper.

The first step is to install the package and cd into the structshot directory:

pip install -e .
cd structshot

Pretrain BERT-NER model

The majority of the code is copied from the HuggingFace transformers repo and is used to pretrain a BERT-NER model:

# Pretrain a conventional BERT-NER model on CoNLL 2003 
bash run_pl.sh

In our paper, we actually merged B- and I- tags together for pretraining as well.
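To illustrate what merging B- and I- tags means, here is a small sketch that maps a BIO tag sequence to an IO one. The helper name `bio_to_io` and the choice to keep an `I-` prefix on every entity tag are assumptions for illustration, not necessarily the exact label naming used in the paper.

```python
from typing import List


def bio_to_io(tags: List[str]) -> List[str]:
    """Collapse B-X and I-X into a single I-X tag; leave O untouched."""
    return ["I-" + t[2:] if t.startswith(("B-", "I-")) else t for t in tags]


io_tags = bio_to_io(["B-PER", "I-PER", "O", "B-LOC"])
```

After this mapping, two adjacent entities of the same type can no longer be told apart, which is the trade-off the IO scheme accepts for a smaller label set.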

Few-shot NER with NNShot

Given the pretrained model located at output-model/checkpointepoch=2.ckpt, we can now perform five-shot NER transfer on the WNUT test set:

# Five-shot NER with NNShot
bash run_pred.sh output-model/checkpointepoch=2.ckpt NNShot

We use the IO tagging scheme rather than the BIO tagging scheme because it is simpler and performs better. I obtained an F1 score of 22.8.
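The idea behind token-level nearest neighbor classification can be sketched in a few lines: each test token takes the label of the closest support token in embedding space. The sketch below uses squared Euclidean distance on toy 2-d vectors; the function names, the distance choice, and the toy data are assumptions for illustration and do not reproduce what run_pred.sh does with BERT representations.

```python
from typing import List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nn_predict(queries: List[Sequence[float]],
               support: List[Sequence[float]],
               support_labels: List[str]) -> List[str]:
    """Label each query token with the label of its nearest support token."""
    predictions = []
    for q in queries:
        nearest = min(range(len(support)), key=lambda j: squared_l2(q, support[j]))
        predictions.append(support_labels[nearest])
    return predictions


# Toy stand-ins for contextual token embeddings (hypothetical values).
support = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
support_labels = ["O", "O", "I-PER"]
queries = [[0.1, 0.2], [4.8, 5.1]]
predicted = nn_predict(queries, support, support_labels)
```

Each token is labeled independently here, so nothing prevents an implausible label sequence.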

Few-shot NER with StructShot

Given the same pretrained model, simply run:

# Five-shot NER with StructShot
bash run_pred.sh output-model/checkpointepoch=2.ckpt StructShot

I obtained an F1 score of 29.5. You can tune the parameter tau in the run_pred.sh script based on dev set performance.
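For intuition about structured decoding, the sketch below runs Viterbi over per-token label scores combined with label transition probabilities. It applies `tau` as an exponent on the transition probabilities (a multiplier in log space), which is one plausible reading of a temperature-like parameter; whether run_pred.sh uses tau in exactly this way is an assumption, and all names and numbers here are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str],
            tau: float = 1.0) -> List[str]:
    """emissions: per-token log-scores; transitions: (prev, cur) -> probability."""
    def trans(prev: str, cur: str) -> float:
        p = transitions.get((prev, cur), 0.0)
        return tau * math.log(p) if p > 0 else float("-inf")

    scores = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        cur_scores, cur_back = {}, {}
        for y in labels:
            # Best previous label for ending at y on token t.
            best = max(labels, key=lambda p: scores[-1][p] + trans(p, y))
            cur_scores[y] = scores[-1][best] + trans(best, y) + emissions[t][y]
            cur_back[y] = best
        scores.append(cur_scores)
        back.append(cur_back)
    # Follow back-pointers from the best final label.
    path = [max(labels, key=lambda y: scores[-1][y])]
    for cur_back in reversed(back):
        path.append(cur_back[path[-1]])
    return path[::-1]


labels = ["O", "I-PER"]
emissions = [{"O": math.log(0.6), "I-PER": math.log(0.4)},
             {"O": math.log(0.45), "I-PER": math.log(0.55)},
             {"O": math.log(0.6), "I-PER": math.log(0.4)}]
transitions = {("O", "O"): 0.9, ("O", "I-PER"): 0.1,
               ("I-PER", "I-PER"): 0.6, ("I-PER", "O"): 0.4}
structured = viterbi(emissions, transitions, labels, tau=1.0)
unstructured = viterbi(emissions, transitions, labels, tau=0.0)
```

With `tau=0.0` the transitions are ignored and the middle token keeps its locally best label `I-PER`; with `tau=1.0` the unlikely `O` to `I-PER` transition outweighs the small emission advantage and the whole sequence is decoded as `O`.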

Notes

There are a few differences between this implementation and the one reported in the paper, mainly for data licensing reasons:

This implementation pretrains the BERT-NER model with the BIO tagging scheme, while in our paper we use the IO tagging scheme.
This implementation performs five-shot transfer learning from CoNLL 2003 to WNUT 2017, while in our paper we perform five-shot transfer learning from OntoNotes 5 to CoNLL'03/WNUT'17/I2B2'14.

If you can access OntoNotes 5 and I2B2'14, reproducing the results of the paper should be trivial.

Comments

Do I need GPU to run this code?

I get an exception when I run this code: pytorch_lightning.utilities.exceptions.MisconfigurationException: You requested GPUs: [0] But your machine only has: []

opened by CaptionwWaterfall 1
Can't load the model
$ ./run_pl.sh
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForTokenClassification: ['cls.predictions.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.dense.bias', 'cls.predictions.decoder.weight', 'cls.seq_relationship.weight', 'cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.transform.LayerNorm.bias']

This IS expected if you are initializing BertForTokenClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPretraining model).

This IS NOT expected if you are initializing BertForTokenClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Some weights of BertForTokenClassification were not initialized from the model checkpoint at bert-base-cased and are newly initialized: ['classifier.weight', 'classifier.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
opened by CaptionwWaterfall 1
self.token_classification_task: TokenClassificationTask = token_classification_task_clazz()？

I ran run_pl_ner.py and got an error in: File "run_pl_ner.py", line 34

self.token_classification_task: TokenClassificationTask = token_classification_task_clazz()

SyntaxError: invalid syntax

How should I deal with it?

opened by CaptionwWaterfall 1
The development set in tag set extension experiment

As described in the paper, the training set and test set are modified in the tag set extension experiment. But what about the development set? If it is masked the same as the training set, then the support set cannot be sampled when testing. If it is masked the same as the test set, then how to pick up checkpoints during training? I am confused with this setting, could you please give me some advice and I would be very grateful.

opened by taonlp 0
Inference on arbitrary texts

Hi! Thank you for this repository.

I have a question: is it possible to use this code to predict (extract) entities in unlabeled data? I'm building an application and I would like to use few-shot NER to extract entities based on a small support set. However, since it's a real-world app, the "test" data won't have any annotations.

If you could give me some guidance and examples about this possibility I would really appreciate it.

Thank you!

opened by fillipefbr 0
Code associated with Figure 3 in the paper

From the original paper

We project token-level representations obtained from the BERT embedders onto a 2-dimensional space using t-SNE.

And the paper claims that Figure 3 shows the usefulness of pretraining on OntoNotes by showing more compact clusters. However, as the word embeddings returned by transformer model are contextualized, I am wondering how you get the embeddings of individual tokens in the test set and then apply the t-SNE technique. Do you obtain all of the embeddings and then do the average?

Additionally, I could not find the associated code for visualizing embeddings. Would it be possible to provide the code used to produce Figure 3?

opened by guanqun-yang 0
Reproduce performance of structshot

Hi,

Thank you for publishing the source code. I have access to OntoNotes and try to reproduce the results in your paper.

In the case of 1-shot learning with NNshot, I could obtain the same average score, i.e., 33.2, although the scores on each dataset are different from yours. However, I'm struggling with Structshot. My scores on CoNLL are much lower than yours, resulting in a lower average score on the three datasets.

In the case of 5-shot learning, I couldn't obtain scores as high as yours with either NNshot or Structshot. On average, Structshot performed worse than NNshot.

I wonder if you use a different set of parameters than the one you publish here. I hope you can help me to shed some light here.

Thank you very much for your time!

Best regards, Nhung

opened by nguyennth 2

