ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts

Anastasia Zhukova

Last update: Oct 7, 2022


Overview

ANEA

The goal of Automated (Named) Entity Annotation (ANEA) is to create a small annotated dataset for named entity recognition (NER) from German domain-specific texts.

Installation and execution

Requirements: Python 3.8, approx. 8 GB of disk space, and 16 GB of RAM.

Download "numberbatch_voc.txt" from https://drive.google.com/file/d/1Ag3gQUBtmqB-WAGXk67nJwUvMiZ1DdQG/view?usp=sharing and place it in

resources/numberbatch
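A quick way to confirm that the file landed in the right place is a small check like the one below. This is only a sketch and not part of ANEA: the helper names numberbatch_path and is_numberbatch_ready are hypothetical, and the script assumes it is run from the repository root. Only the folder and file name come from the instructions above.

```python
from pathlib import Path

# Folder and file name as given in the instructions above.
RESOURCE_DIR = Path("resources") / "numberbatch"
RESOURCE_FILE = "numberbatch_voc.txt"


def numberbatch_path(repo_root="."):
    """Return the expected location of the vocabulary file (hypothetical helper)."""
    return Path(repo_root) / RESOURCE_DIR / RESOURCE_FILE


def is_numberbatch_ready(repo_root="."):
    """Return True if the vocabulary file exists and is not empty."""
    path = numberbatch_path(repo_root)
    return path.is_file() and path.stat().st_size > 0


if __name__ == "__main__":
    target = numberbatch_path()
    if is_numberbatch_ready():
        print(f"Found {target}")
    else:
        print(f"Missing or empty: {target}")
```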

You can either use your own documents, stored as a list of strings in a JSON file, or use a keyword to search Wikipedia for articles to annotate. If you use your own documents, place the file into the data folder.
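For the first option, the input is just a JSON array of strings. The sketch below shows one way to produce such a file. The function save_documents and the file name my_documents.json are assumptions made for illustration, and the two German sentences are invented sample data. The instructions only state that the file holds a list of strings and goes into the data folder.

```python
import json
from pathlib import Path


def save_documents(documents, path):
    """Write a list of strings to a UTF-8 encoded JSON file (hypothetical helper)."""
    documents = list(documents)
    if not all(isinstance(doc, str) for doc in documents):
        raise TypeError("every document must be a string")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        # ensure_ascii=False keeps German umlauts readable in the file.
        json.dump(documents, f, ensure_ascii=False, indent=2)
    return path


if __name__ == "__main__":
    docs = [
        "Die Fräsmaschine bearbeitet das Werkstück mit einem rotierenden Werkzeug.",
        "Ein Getriebe überträgt das Drehmoment zwischen zwei Wellen.",
    ]
    # The file name is made up; only the "data" folder comes from the text above.
    print(save_documents(docs, Path("data") / "my_documents.json"))
```

Each string is treated here as one document; whether ANEA expects whole articles or shorter passages per string is not stated above.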

Then execute

pip install -r requirements.txt

python -m spacy download de_core_news_sm

python run_anea.py

Follow the instructions to choose a folder with your topic to annotate.
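If the script fails early, a preflight check along the following lines can help narrow down what is missing. It is a sketch under stated assumptions, not part of ANEA: check_environment is a hypothetical helper, and it only looks at the Python version and the spaCy model named in the steps above. The instructions do not say whether Python versions other than 3.8 work, so a different version is reported as a note rather than treated as a hard failure.

```python
import sys
from importlib.util import find_spec

REQUIRED_PYTHON = (3, 8)         # version named in the requirements above
SPACY_MODEL = "de_core_news_sm"  # model installed by the spacy download step


def check_environment():
    """Return a list of human-readable notes; an empty list means nothing was found."""
    notes = []
    found = sys.version_info[:2]
    if found != REQUIRED_PYTHON:
        notes.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]} is listed as the "
            f"requirement, found {found[0]}.{found[1]}"
        )
    if find_spec("spacy") is None:
        notes.append("spaCy is not installed (run: pip install -r requirements.txt)")
    # spaCy models installed via "spacy download" are regular importable packages.
    if find_spec(SPACY_MODEL) is None:
        notes.append(
            f"spaCy model {SPACY_MODEL} is not installed "
            f"(run: python -m spacy download {SPACY_MODEL})"
        )
    return notes


if __name__ == "__main__":
    for note in check_environment():
        print("NOTE:", note)
```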

You might also like...

[EMNLP 2021] Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training

RoSTER The source code used for Distantly-Supervised Named Entity Recognition with Noise-Robust Learning and Language Model Augmented Self-Training, p

60 Dec 30, 2022

GLaRA: Graph-based Labeling Rule Augmentation for Weakly Supervised Named Entity Recognition

29 Dec 26, 2022

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker

Example Of Fine-Tuning BERT For Named-Entity Recognition Task And Preparing For Cloud Deployment Using Flask, React, And Docker This repository contai

12 Dec 14, 2022

Chinese named entity recognization with BiLSTM using Keras

Chinese named entity recognization (Bilstm with Keras) Project Structure ./ ├── README.md ├── data │ ├── README.md │ ├── data 数据集 │ │ ├─

1 Dec 17, 2021

An elaborate and exhaustive paper list for Named Entity Recognition (NER)

Named-Entity-Recognition-NER-Papers by Pengfei Liu, Jinlan Fu and other contributors. An elaborate and exhaustive paper list for Named Entity Recognit

388 Dec 18, 2022

[EMNLP 2021] MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity Representations

MuVER This repo contains the code and pre-trained model for our EMNLP 2021 paper: MuVER: Improving First-Stage Entity Retrieval with Multi-View Entity

24 May 30, 2022

PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning

PClean: A Domain-Specific Probabilistic Programming Language for Bayesian Data Cleaning Warning: This is a rapidly evolving research prototype.

190 Dec 27, 2022

TorchGeo is a PyTorch domain library, similar to torchvision, that provides datasets, transforms, samplers, and pre-trained models specific to geospatial data.

1.3k Dec 30, 2022

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos

PyKale is a PyTorch library for multimodal learning and transfer learning as well as deep learning and dimensionality reduction on graphs, images, texts, and videos. By adopting a unified pipeline-based API design, PyKale enforces standardization and minimalism, via reusing existing resources, reducing repetitions and redundancy, and recycling learning models across areas.

370 Dec 27, 2022

Comments

Error: The word vector model is being loaded.

Hi,

I wanted to try the model but an error occurred:

The word vector model is being loaded.
Traceback (most recent call last):
  File "run_anea.py", line 141, in <module>
    run()
  File "run_anea.py", line 124, in run
    graph = Graph(noun_terms, topN_terms=topN)
  File "/content/ANEA/category_identificator/ANEA_annotator/graph/graph.py", line 35, in __init__
    self._domain_types_identif()
  File "/content/ANEA/category_identificator/ANEA_annotator/graph/graph.py", line 247, in _domain_types_identif
    self.model = get_model()
  File "/content/ANEA/utils/wordvectors.py", line 39, in get_model
    model = WordEmbeddings()
  File "/content/ANEA/utils/wordvectors.py", line 17, in __init__
    self._model = load_facebook_model(wordvectors[we_name])
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/fasttext.py", line 1142, in load_facebook_model
    return _load_fasttext_format(path, encoding=encoding, full_model=True)
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/fasttext.py", line 1222, in _load_fasttext_format
    m = gensim.models._fasttext_bin.load(fin, encoding=encoding, full_model=full_model)
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/_fasttext_bin.py", line 341, in load
    raw_vocab, vocab_size, nwords, ntokens = _load_vocab(fin, new_format, encoding=encoding)
  File "/usr/local/lib/python3.7/dist-packages/gensim/models/_fasttext_bin.py", line 194, in _load_vocab
    raise NotImplementedError("Supervised fastText models are not supported")
NotImplementedError: Supervised fastText models are not supported

opened by arossbach10 1

Owner

Anastasia Zhukova

Doctoral Researcher at the Data & Knowledge Exploration Group

GitHub

