DaCy: The State of the Art Danish NLP pipeline using SpaCy

Kenneth Enevoldsen

Last update: Jan 6, 2023

Overview

DaCy: A SpaCy NLP Pipeline for Danish

DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing, it has achieved state-of-the-art performance on all benchmark tasks for Danish. This repository contains code for reproducing DaCy. To download the models, use the DaNLP package (request pending), SpaCy (request pending), or download the project directly here.

Reproduction

The folder DaCy contains a SpaCy project that allows the results to be reproduced. This folder also includes the evaluation metrics on DaNE.

Usage

To load the model from the direct download, simply place the downloaded "packages" folder in your working directory and load the model using SpaCy:

import spacy
nlp = spacy.load("da_dacy_large_tft-0.0.0")

More explicitly, the path from the unpacked folder is:

nlp = spacy.load("da_dacy_large_tft-0.0.0/da_dacy_large_tft/da_dacy_large_tft-0.0.0")

If you get an error, you may be loading from the outer folder named da_dacy_large_tft-0.0.0 rather than the inner one.
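The outer/inner folder mix-up can be avoided by searching for the loadable directory instead of typing the nested path by hand. The following is a minimal sketch, not part of DaCy: the helper name `find_model_dir` is hypothetical, and it assumes that a loadable spaCy v3 model directory is the one containing both `meta.json` and `config.cfg`.

```python
from pathlib import Path
from typing import Optional


def find_model_dir(root: str) -> Optional[Path]:
    """Return the first directory at or below `root` that looks loadable.

    Assumption: a loadable spaCy v3 model directory holds both
    `meta.json` and `config.cfg`. The outer package folder does not
    hold `config.cfg`, so it is skipped.
    """
    root_path = Path(root)
    # Check the root itself first, then every sub-directory in sorted order.
    candidates = [root_path] + sorted(p for p in root_path.rglob("*") if p.is_dir())
    for candidate in candidates:
        has_meta = (candidate / "meta.json").is_file()
        has_config = (candidate / "config.cfg").is_file()
        if has_meta and has_config:
            return candidate
    return None


# Usage (requires spaCy and the downloaded model, so it is left commented out):
# import spacy
# model_dir = find_model_dir("da_dacy_large_tft-0.0.0")
# nlp = spacy.load(model_dir)
```

If the helper returns `None`, the download is probably incomplete or was unpacked somewhere else.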

To obtain SOTA performance in lemmatization as well, you should also add this lemmatization pipeline:

import lemmy.pipe

pipe = lemmy.pipe.load('da')

# Add the component to the spaCy pipeline.
nlp.add_pipe(pipe, after='tagger')

# Lemmas can now be accessed using the `._.lemmas` attribute on the tokens.
nlp("akvariernes")[0]._.lemmas

This requires that you install the package beforehand, which is easily done using:

pip install lemmy

Performance and Training

The following table shows the performance on DaNE compared to other models. The highest scores are highlighted in bold and the second highest are underlined.

To learn more about how the model was trained, check out this blog post.

Issues and Usage Q&A

To ask questions, report issues or request features 🤔, please use the GitHub Issue Tracker. Questions related to SpaCy should be directed to the SpaCy GitHub or forum.

Acknowledgements

This is really an acknowledgement of great open-source software and contributors. This wouldn't have been possible without the work of the SpaCy team, which developed and integrated the software; Huggingface, for developing Transformers and making model sharing convenient; BotXO, for training and sharing the Danish BERT model; and Malte Bertelsen, for making it easily available. DaNLP has made it extremely easy to get access to Danish resources to train on, even supplied some of the tagged data themselves, and does a great job of actually developing these datasets.

References

If you use this library in your research, please kindly cite:

@inproceedings{enevoldsen2020dacy,
    title={DaCy: A SpaCy NLP Pipeline for Danish},
    author={Enevoldsen, Kenneth},
    year={2021}
}

LICENSE

DaCy is released under the Apache License, Version 2.0. See the LICENSE file for more details.

Comments

Make cache dir configurable

I would like to make the default cache dir configurable with an environmental variable. This is a simple PR to allow one to do that with the variable DACY_CACHE_DIR.
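A minimal sketch of how such an override could work is shown below. Only the variable name `DACY_CACHE_DIR` comes from the description above; the function name `resolve_cache_dir` and the fallback location `~/.dacy` are assumptions made for illustration and may not match what the PR actually implements.

```python
import os
from pathlib import Path


def resolve_cache_dir(env_var: str = "DACY_CACHE_DIR") -> Path:
    """Return the cache directory, preferring the environment variable.

    Assumption: when the variable is unset or empty, fall back to a
    hidden folder in the user's home directory (`~/.dacy`).
    """
    custom = os.environ.get(env_var)
    if custom:
        # Expand "~" so that values such as "~/models" work as expected.
        return Path(custom).expanduser()
    return Path.home() / ".dacy"


print(resolve_cache_dir())
```

Reading the variable at call time, rather than once at import time, lets users change it within a running session.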

opened by dhpollack 9
Remove protobuf dependency

dacy has a very tight version bound on some auxiliary libraries like protobuf. It's not apparent why this is required as it does not appear to be a library used internally, but it could of course be intentional. But the version is lagging enough that it is starting to cause compatibility problems with other libraries, so if it can be relaxed that would be very helpful.
enhancement

opened by Bonnevie 4
Add Tutorials: "Extracting text statistics and readability metrics using DaCy and Textdescriptives"

After removing readability, it would be nice to have a tutorial on: "Extracting text statistics and readability metrics using DaCy and Textdescriptives"

Potentially, the tutorial could use the packages to examine the difference in language complexity between conversational data and legal documents in DAGW, or to solve a similar task using a publicly available dataset.
enhancement

opened by KennethEnevoldsen 4
loosen requirements

The requirements of this package are unnecessarily strict. Specifically, I am having issues with tqdm. I give a more in-depth explanation in the issue I created, centre-for-humanities-computing/DaCy#75. There are also a few optimizations to your setup.py file. I noticed that the requirements.txt file is not used, which could cause a mismatch between pip install -r requirements.txt and pip install .
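One common way to avoid that mismatch is to have setup.py read its dependencies from requirements.txt, so both install routes share one list. The sketch below illustrates the idea only; the helper name `read_requirements` is hypothetical and this is not necessarily how the PR solves it.

```python
from pathlib import Path
from typing import List


def read_requirements(path: str = "requirements.txt") -> List[str]:
    """Return the requirement specifiers listed in `path`.

    Blank lines and comment lines (starting with "#") are skipped.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


# In setup.py the list would then be passed on, for example:
# from setuptools import setup
# setup(name="dacy", install_requires=read_requirements())
```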

opened by dhpollack 4
[pre-commit.ci] pre-commit autoupdate
updates:

github.com/asottile/pyupgrade: v2.38.0 → v3.3.1

github.com/asottile/add-trailing-comma: v2.3.0 → v2.4.0

github.com/PyCQA/docformatter: v1.5.0 → v1.5.1

github.com/psf/black: 22.8.0 → 22.12.0

github.com/charliermarsh/ruff-pre-commit: v0.0.194 → v0.0.206
opened by pre-commit-ci[bot] 3
ContextualVersionConflict Traceback (most recent call last)
Moved from #133, originally posted by @EaLindhardt

I've tried to download dacy through anaconda, both with pip and conda install and the different ways of installing: https://centre-for-humanities-computing.github.io/DaCy/installation.html

when running

import dacy

I get the following:

---------------------------------------------------------------------------
ContextualVersionConflict                 Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 import dacy

File ~\AppData\Roaming\Python\Python39\site-packages\dacy\__init__.py:4, in <module>
      1 from dacy.hate_speech import make_offensive_transformer  # noqa
      2 from dacy.sentiment import make_emotion_transformer  # noqa
----> 4 from .about import __download_url__, __title__, __version__  # noqa
      5 from .download import download_model  # noqa
      6 from .load import load, models, where_is_my_dacy

File ~\AppData\Roaming\Python\Python39\site-packages\dacy\about.py:3, in <module>
      1 import pkg_resources
----> 3 __version__ = pkg_resources.get_distribution("dacy").version
      4 __title__ = "dacy"
      5 __download_url__ = "https://github.com/centre-for-humanities-computing/DaCy"

File ~\Anaconda3\lib\site-packages\pkg_resources\__init__.py:477, in get_distribution(dist)
    475     dist = Requirement.parse(dist)
    476 if isinstance(dist, Requirement):
--> 477     dist = get_provider(dist)
    478 if not isinstance(dist, Distribution):
    479     raise TypeError("Expected string, Requirement, or Distribution", dist)

File ~\Anaconda3\lib\site-packages\pkg_resources\__init__.py:353, in get_provider(moduleOrReq)
    351 """Return an IResourceProvider for the named module or requirement"""
    352 if isinstance(moduleOrReq, Requirement):
--> 353     return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
    354 try:
    355     module = sys.modules[moduleOrReq]

File ~\Anaconda3\lib\site-packages\pkg_resources\__init__.py:897, in WorkingSet.require(self, *requirements)
    888 def require(self, *requirements):
    889     """Ensure that distributions matching requirements are activated
    890
    891     requirements must be a string or a (possibly-nested) sequence
    (...)
    895     included, even if they were already activated in this working set.
    896     """
--> 897     needed = self.resolve(parse_requirements(requirements))
    899     for dist in needed:
    900         self.add(dist)

File ~\Anaconda3\lib\site-packages\pkg_resources\__init__.py:788, in WorkingSet.resolve(self, requirements, env, installer, replace_conflicting, extras)
    785 if dist not in req:
    786     # Oops, the "best" so far conflicts with a dependency
    787     dependent_req = required_by[req]
--> 788     raise VersionConflict(dist, req).with_context(dependent_req)
    790 # push the new requirements onto the stack
    791 new_requirements = dist.requires(req.extras)[::-1]

ContextualVersionConflict: (spacy 3.3.1 (c:\users\au576018\anaconda3\lib\site-packages), Requirement.parse('spacy<3.3.0,>=3.2.0'), {'dacy'})

How do I solve this?
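The last line of the traceback says that the installed spaCy (3.3.1) falls outside the range the installed DaCy declares (`spacy<3.3.0,>=3.2.0`). The sketch below reproduces that check with the standard library only. It is a simplified illustration: the function names are hypothetical, it handles only plain numeric versions, and it does not implement the full version-specifier rules that pip and pkg_resources follow.

```python
import operator
import re
from typing import Tuple

# Comparison operators supported by this simplified checker.
_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn "3.3.1" into (3, 3, 1). Only plain numeric versions are handled."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, spec: str) -> bool:
    """Return True if `version` meets every comma-separated clause in `spec`."""
    installed = parse_version(version)
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|!=|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"Unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](installed, parse_version(bound)):
            return False
    return True


print(satisfies("3.3.1", "<3.3.0,>=3.2.0"))  # False: the conflict reported above
print(satisfies("3.2.4", "<3.3.0,>=3.2.0"))  # True: a version inside the range
```

In practice this means either installing a spaCy version inside the declared range or upgrading DaCy to a release that accepts the installed spaCy.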

@EaLindhardt will you please add the following information:

DaCy Version Used:

Operating System:

Python Version Used:

spaCy Version Used:

Environment Information:

You can also run python -m spacy info --markdown and paste the result here, along with the DaCy version, which you can get using python -c "import dacy; print(dacy.__version__)"
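The same information can be gathered in one go from Python. This is a small sketch using only the standard library; the function name `collect_environment` is hypothetical, and it reports a package as "not installed" rather than failing when it cannot be found.

```python
import platform
from importlib import metadata
from typing import Dict, Iterable


def collect_environment(packages: Iterable[str] = ("dacy", "spacy")) -> Dict[str, str]:
    """Collect Python, OS and package versions for a bug report."""
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


for key, value in collect_environment().items():
    print(f"{key}: {value}")
```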
bug Stale
opened by KennethEnevoldsen 3
Update WandbLogger in configs to v2

Update WandbLogger in configs to v2. This version has the same experiment tracking features as v1 but also has model checkpointing and dataset versioning possibilities.

opened by scottire 3
Augmentation
[x] Entity augmentation

[x] Gender augmentation (awareness of gender)

[x] Second order person augmentation (Lastname, Firstname)

[ ] Usernames (autogenerates e.g. WhiteTruffle101 or Kenneth Enevoldsen -> KennethEnevoldsen)

[ ] Misspelling augmentations, see e.g. this repo

[x] Keystroke error based on keyboard distance

[ ] Historic augmentations

[x] æ->ae, å -> aa (and a), ø->oe

[ ] uppercasing of nouns

[ ] Social media

[ ] Adding hashtags augmentation

[ ] Others, potentially see this tweet or this kaggle summary
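Two of the items above are easy to illustrate in plain Python: the historic spelling substitution (æ->ae, å->aa, ø->oe) and the "Lastname, Firstname" reordering. The sketch below is a hypothetical illustration of the idea and not the augmenter API shipped with DaCy; the function names are made up, and the å->a variant mentioned in the list is left out for brevity.

```python
# Mapping for the historic spelling variant, covering upper and lower case.
HISTORIC_SPELLINGS = {
    "æ": "ae", "ø": "oe", "å": "aa",
    "Æ": "Ae", "Ø": "Oe", "Å": "Aa",
}


def historic_spelling(text: str) -> str:
    """Replace æ, ø and å with their two-letter historic spellings."""
    return "".join(HISTORIC_SPELLINGS.get(char, char) for char in text)


def lastname_first(full_name: str) -> str:
    """Rewrite "Firstname Lastname" as "Lastname, Firstname"."""
    parts = full_name.split()
    if len(parts) < 2:
        # A single name has nothing to reorder.
        return full_name
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


print(historic_spelling("Århus på en blå ø"))  # Aarhus paa en blaa oe
print(lastname_first("Kenneth Enevoldsen"))    # Enevoldsen, Kenneth
```

When such augmentations are applied to annotated training data, the entity offsets have to be shifted as well, since the replacements change the length of the text.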

enhancement
opened by KennethEnevoldsen 3
:arrow_up: Update sphinxext-opengraph requirement from <0.7.0,>=0.6.3 to >=0.6.3,<0.8.0
Updates the requirements on sphinxext-opengraph to permit the latest version.

Release notes

Sourced from sphinxext-opengraph's releases.

v0.7.4

What's Changed

Use Sphinx Builder method to get page url by @attakei in wpilibsuite/sphinxext-opengraph#89

New Contributors

@attakei made their first contribution in wpilibsuite/sphinxext-opengraph#89

Full Changelog: https://github.com/wpilibsuite/sphinxext-opengraph/compare/v0.7.3...v0.7.4

Commits

5558484 Use Sphinx Builder method to get page url (#89)

7469f2b Take default og:site_name from sphinx project config value (#83)

fc41303 Allow dirhtml builder without ogp_site_url (#84)

82c3eb4 docs: fix override names (#85)

b9f1dea Also publish sdist to PyPI (#82)

94518c9 Don't run CI on tag and push (#80)

d6eec0d Create wheel with version number not "main" (#79)

c47439c Do not append index with dirhtml (#78)

a2d9acc Add support for meta description (#72)

93148a6 ci: Pin PyPI publish action to v1 (#75)

Additional commits viewable in compare view


dependencies python
opened by dependabot[bot] 2
[pre-commit.ci] pre-commit autoupdate
updates:

github.com/asottile/pyupgrade: v2.34.0 → v2.37.3

github.com/myint/docformatter: v1.3.1 → v1.5.0

github.com/psf/black: 22.3.0 → 22.8.0

github.com/PyCQA/flake8: 4.0.1 → 5.0.4
opened by pre-commit-ci[bot] 2
:arrow_up: Bump schneegans/dynamic-badges-action from 1.2.0 to 1.3.0
Bumps schneegans/dynamic-badges-action from 1.2.0 to 1.3.0.

Release notes

Sourced from schneegans/dynamic-badges-action's releases.

Dynamic Badges v1.3.0

This release adds the possibility to auto-generate the badge color. You can read the full changelog.

Changelog

Sourced from schneegans/dynamic-badges-action's changelog.

Dynamic Badges Action 1.3.0

Release Date: 2022-04-18

Changes

Added the possibility to generate the badge color automatically between red and green based on a numerical value and its bounds. Thanks to @LucasWolfgang for this contribution!

Commits

a6775a6 :memo: Add changelog entry

7ce4e74 :wrench: USe color range for example badge

a3f7e7f :memo: Improve documentation

6511e52 :memo: Tweak documentation

e43bdee :sparkles: Tweak formatting of the code

3dd7c22 :sparkles: Apply clang-format

ee32073 :wrench: Fix typo

9bce11b Merge pull request #11 from LucasWolfgang/master

53c821a :tada: Added saturation and lightness parameters

6363528 :tada: Added saturation and lightness parameters

Additional commits viewable in compare view


dependencies github_actions
opened by dependabot[bot] 2

Address cuda warnings and spaCy version warning.

When running:

import dacy

for model in dacy.models():
    print(model)

dacy_nlp = dacy.load('medium')

doc = dacy_nlp("DaCy er en hurtig og effektiv pipeline til dansk sprogprocessering bygget i SpaCy.")

print('hej')

I get the following warning:


da_dacy_small_tft-0.0.0
da_dacy_medium_tft-0.0.0
da_dacy_large_tft-0.0.0
da_dacy_small_trf-0.1.0
da_dacy_medium_trf-0.1.0
da_dacy_large_trf-0.1.0
/venv/lib/python3.9/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'da_dacy_medium_trf' (0.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
/venv/lib/python3.9/site-packages/spacy/util.py:833: UserWarning: [W095] Model 'da_dacy_small_trf' (0.1.0) was trained with spaCy v3.1 and may not be 100% compatible with the current version (3.2.4). If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spaCy version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
/venv/lib/python3.9/site-packages/spacy_transformers/pipeline_component.py:406: UserWarning: Automatically converting a transformer component from spacy-transformers v1.0 to v1.1+. If you see errors or degraded performance, download a newer compatible model or retrain your custom model with the current spacy-transformers version. For more details and available updates, run: python -m spacy validate
  warnings.warn(warn_msg)
/venv/lib/python3.9/site-packages/torch/amp/autocast_mode.py:198: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
  warnings.warn('User provided device_type of \'cuda\', but CUDA is not available. Disabling')
/venv/lib/python3.9/site-packages/spacy/pipeline/attributeruler.py:150: UserWarning: [W036] The component 'matcher' does not have any patterns defined.
  matches = self.matcher(doc, allow_missing=True, as_spans=False)
hej

Notably, this includes three warnings: the SpaCy version, the cuda device and the matcher object (see also #72).

The original version was sent to me by mail.

Note: While these are warnings, DaCy still works as intended. The version of spaCy does not influence model performance.
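Until the warnings are addressed in the package itself, they can be hidden on the user side with Python's standard `warnings` module. The sketch below is a workaround under stated assumptions, not the fix proposed in this issue: the function name is hypothetical, and the patterns are matched against the beginning of the messages shown in the output above, so they would need updating if the wording changes.

```python
import warnings


def silence_known_warnings() -> None:
    """Ignore the three warnings listed above; all others still show."""
    # spaCy version mismatch between the trained model and the installed spaCy.
    warnings.filterwarnings("ignore", message=r"\[W095\]", category=UserWarning)
    # Attribute ruler whose matcher has no patterns defined.
    warnings.filterwarnings("ignore", message=r"\[W036\]", category=UserWarning)
    # torch autocast requested for CUDA on a machine without CUDA.
    warnings.filterwarnings(
        "ignore",
        message=r"User provided device_type of 'cuda'",
        category=UserWarning,
    )


# Call this before loading the model, for example:
# silence_known_warnings()
# import dacy
# nlp = dacy.load("medium")
```

The spacy-transformers conversion warning from the output above is deliberately not filtered here, so it stays visible.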

opened by KennethEnevoldsen 2

Releases(v2.3.1)

v2.3.1(Jan 9, 2023)
Fix

Update about.py (#178) (e548c82)

Documentation

Fixed markdown link in .rst file (fe424be)

Added news (d648ac1)

Source code(tar.gz)
Source code(zip)
dacy-2.3.1-py3-none-any.whl(53.23 KB)
dacy-2.3.1.tar.gz(4.58 MB)
v2.3.0(Jan 5, 2023)
Feature

Added scandi-ner model to dacy (dc2a514)

Added scandi-ner model to dacy (a2bec3b)

Documentation

Added news (f5b64da)

Added news (ed991b8)

Model was trained by dan nielsen (be4471b)

Remove rubbish from changelog (35adc3b)

Fix minor (b363a4e)

Source code(tar.gz)
Source code(zip)
dacy-2.3.0.tar.gz(4.52 MB)
dacy-2.3.0-py3-none-any.whl(53.22 KB)
v2.2.9(Jan 3, 2023)
Fix

Update huggingface name for wrapped models (cbb3f3b)

Source code(tar.gz)
Source code(zip)
dacy-2.2.9.tar.gz(4.52 MB)
dacy-2.2.9-py3-none-any.whl(52.34 KB)
v2.2.8(Jan 3, 2023)
Fix

Cleaning up ci for final test of semantic release (9ac1320)

Source code(tar.gz)
Source code(zip)
dacy-2.2.8.tar.gz(4.52 MB)
dacy-2.2.8-py3-none-any.whl(52.33 KB)
v2.2.7(Jan 3, 2023)
Fix

Semantic release (1b35ee4)

Source code(tar.gz)
Source code(zip)
dacy-2.2.7.tar.gz(4.52 MB)
dacy-2.2.7-py3-none-any.whl(52.33 KB)
v2.0.0(Aug 8, 2022)
2.0.0

Added models for hate-speech detection and classification

A large part of DaCy is now moved to separate packages to allow for more versatility:

Now uses spacy-wrap (https://github.com/KennethEnevoldsen/spacy-wrap) for including existing models in DaCy. Fixes #71

Removed augmenters; they are now available through the external package, augmenty. Fixes #59

Removed the rule-based sentiment pipeline; we recommend using asent instead. Fixes #60

Removed support for multiple installs, so pip install dacy[all] and dacy[large] are no longer required. This should simplify installation and lead to fewer errors.

Documentation

New tutorial on using the sentiment models, including emotion detection, subjectivity detection and polarity classification.

New tutorial on using the hate speech classification and detection models.

Multiple updates to function and package documentation

Multiple bugfixes

Source code(tar.gz)
Source code(zip)
v1.0.0(Jul 10, 2021)
DaCy version 1.0.0 is released as the first version on PyPI! 📦

Including a series of augmenters with a few specifically designed for Danish

Code for behavioral tests of NLP pipelines

A new tutorial for both 📖

The first paper on DaCy; check it out as a preprint and code for reproducing it here! 🌟

A new beautiful hand-drawn logo 🤩

A test for biases and robustness in Danish NLP pipelines 🧐

DaCy is now officially supported by the Centre for Humanities Computing at Aarhus University

And more

Source code(tar.gz)
Source code(zip)
v0.0.0(Mar 15, 2021)

The initial release of DaCy. There will be many more to come.
Source code(tar.gz)
Source code(zip)
da_dacy_large_tft-0.0.0.zip(1734.38 MB)
da_dacy_medium_tft-0.0.0.tar.gz(393.73 MB)

Owner

Kenneth Enevoldsen

Student and Instructor at Cognitive Science, Aarhus University; Student Programmer at CHCAA; Junior Waste Management Consultant at JHN Processor

GitHub

