SpikeX - SpaCy Pipes for Knowledge Extraction
SpikeX is a collection of pipes ready to be plugged into a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.
What's new in SpikeX 0.5.0
WikiGraph has never been so lightning fast:
- 🌕 Performance mooning, thanks to the adoption of a sparse adjacency matrix to handle the pages graph, instead of using igraph
- 🚀 Memory optimization, with consumption cut by ~40% and compressed size cut by ~20%, introducing new bidirectional dictionaries to manage data
- 📖 New APIs for faster and easier usage and interaction
- 🛠 Overall fixes, for a better graph and better page matching
Pipes
- WikiPageX links Wikipedia pages to chunks in text
- ClusterX picks noun chunks in a text and clusters them based on a revisiting of the Ball Mapper algorithm, Radial Ball Mapper
- AbbrX detects abbreviations and acronyms, linking them to their long forms. It is based on scispacy's abbreviation detector, with improvements
- LabelX takes labelings of pattern matching expressions and catches them in a text, solving overlaps, abbreviations and acronyms
- PhraseX creates a Doc's underscore extension based on a custom attribute name and phrase patterns. Examples are NounPhraseX and VerbPhraseX, which extract noun phrases and verb phrases, respectively
- SentX detects sentences in a text, based on Splitta with refinements
Tools
- WikiGraph with pages as leaves linked to categories as nodes
- Matcher that inherits its interface from spaCy's one, but is built on a RegEx engine that boosts its performance
Install SpikeX
Some requirements are inherited from spaCy:
- spaCy version: 2.3+
- Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
- Python version: Python 3.6+ (only 64 bit)
- Package managers: pip
Some dependencies use Cython, so it needs to be installed before SpikeX:
pip install cython
Remember that a virtual environment is always recommended, in order to avoid modifying system state.
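For example, a minimal sketch using Python's built-in venv module (the .env directory name is just an illustration; on Windows, activate with .env\Scripts\activate instead):
python -m venv .env
source .env/bin/activate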
pip
At this point, installing SpikeX via pip is a one line command:
pip install spikex
Usage
Prerequisites
SpikeX pipes work with spaCy, hence a model needs to be installed. Follow the official instructions here. The brand new spaCy 3.0 is supported!
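For instance, the snippets below use the small English model, which can be installed with spaCy's own download command:
python -m spacy download en_core_web_sm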
WikiGraph
A WikiGraph is built starting from some key components of Wikipedia: pages, categories and relations between them.
Auto
Creating a WikiGraph can take time, depending on how large its Wikipedia dump is. For this reason, we provide wikigraphs ready to be used:
Date | WikiGraph | Lang | Size (compressed) | Size (memory)
---|---|---|---|---
2021-04-01 | enwiki_core | EN | 1.1GB | 5.9GB
2021-04-01 | simplewiki_core | EN | 19MB | 120MB
2021-04-01 | itwiki_core | IT | 189MB | 1.1GB
More coming... | | | |
SpikeX provides a command to shortcut downloading and installing a WikiGraph (Linux or macOS, Windows not supported yet):
spikex download-wikigraph simplewiki_core
Manual
A WikiGraph can be created from the command line, specifying which Wikipedia dump to take and where to save it:
spikex create-wikigraph \
<YOUR-OUTPUT-PATH> \
--wiki <WIKI-NAME, default: en> \
--version <DUMP-VERSION, default: latest> \
--dumps-path <DUMPS-BACKUP-PATH>
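Purely as an illustration, a run over the Simple English Wikipedia dump could look like this (assuming simple is a valid wiki name; the output and dumps paths are placeholders):
spikex create-wikigraph wikigraphs/ \
--wiki simple \
--version latest \
--dumps-path dumps/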
Then it needs to be packed and installed:
spikex package-wikigraph \
<WIKIGRAPH-RAW-PATH> \
<YOUR-OUTPUT-PATH>
Follow the instructions at the end of the packing process and install the distribution package in your virtual environment. Now you are ready to use your WikiGraph as you wish:
from spikex.wikigraph import load as wg_load
wg = wg_load("enwiki_core")
page = "Natural_language_processing"
categories = wg.get_categories(page, distance=1)
for category in categories:
    print(category)
>>> Category:Speech_recognition
>>> Category:Artificial_intelligence
>>> Category:Natural_language_processing
>>> Category:Computational_linguistics
Matcher
The Matcher is identical to spaCy's one, but faster when it comes to handling many patterns at once (on the order of thousands), so follow the official usage instructions here.
A trivial example:
from spikex.matcher import Matcher
from spacy import load as spacy_load
nlp = spacy_load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
matcher.add("TEST", [[{"LOWER": "nlp"}]])
doc = nlp("I love NLP")
for _, s, e in matcher(doc):
    print(doc[s:e])
>>> NLP
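The speedup only becomes visible with a large pattern set, so here is a minimal sketch of bulk-adding patterns; the terms list is a made-up stand-in for thousands of entries:
from spacy import load as spacy_load
from spikex.matcher import Matcher
nlp = spacy_load("en_core_web_sm")
matcher = Matcher(nlp.vocab)
terms = ["nlp", "ner", "pos"]  # stand-in for a very large list of terms
for term in terms:
    matcher.add(term.upper(), [[{"LOWER": term}]])
doc = nlp("NLP loves POS tags")
for match_id, s, e in matcher(doc):
    print(nlp.vocab.strings[match_id], doc[s:e])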
WikiPageX
The WikiPageX pipe uses a WikiGraph in order to find chunks in a text that match Wikipedia page titles.
from spacy import load as spacy_load
from spikex.wikigraph import load as wg_load
from spikex.pipes import WikiPageX
nlp = spacy_load("en_core_web_sm")
doc = nlp("An apple a day keeps the doctor away")
wg = wg_load("simplewiki_core")
wpx = WikiPageX(wg)
doc = wpx(doc)
for span in doc._.wiki_spans:
    print(span._.wiki_pages)
>>> ['An']
>>> ['Apple', 'Apple_(disambiguation)', 'Apple_(company)', 'Apple_(tree)']
>>> ['A', 'A_(musical_note)', 'A_(New_York_City_Subway_service)', 'A_(disambiguation)', 'A_(Cyrillic)']
>>> ['Day']
>>> ['The_Doctor', 'The_Doctor_(Doctor_Who)', 'The_Doctor_(Star_Trek)', 'The_Doctor_(disambiguation)']
>>> ['The']
>>> ['Doctor_(Doctor_Who)', 'Doctor_(Star_Trek)', 'Doctor', 'Doctor_(title)', 'Doctor_(disambiguation)']
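Instead of calling the pipe directly, it can also be plugged into the pipeline; a sketch for spaCy 3, following the same factory pattern shown below for SentX (the wikipagex component name is arbitrary):
from spacy import load as spacy_load
from spacy.language import Language
from spikex.pipes import WikiPageX
from spikex.wikigraph import load as wg_load
@Language.factory("wikipagex")
def create_wikipagex(nlp, name):
    # wrap the same simplewiki_core graph used in the example above
    return WikiPageX(wg_load("simplewiki_core"))
nlp = spacy_load("en_core_web_sm")
nlp.add_pipe("wikipagex")
doc = nlp("An apple a day keeps the doctor away")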
ClusterX
The ClusterX pipe takes noun chunks in a text and clusters them using a Radial Ball Mapper algorithm.
from spacy import load as spacy_load
from spikex.pipes import ClusterX
nlp = spacy_load("en_core_web_sm")
doc = nlp("Grab this juicy orange and watch a dog chasing a cat.")
clusterx = ClusterX(min_score=0.65)
doc = clusterx(doc)
for cluster in doc._.cluster_chunks:
    print(cluster)
>>> [this juicy orange]
>>> [a cat, a dog]
AbbrX
The AbbrX pipe finds abbreviations and acronyms in the text, linking short and long forms together:
from spacy import load as spacy_load
from spikex.pipes import AbbrX
nlp = spacy_load("en_core_web_sm")
doc = nlp("a little snippet with an abbreviation (abbr)")
abbrx = AbbrX(nlp.vocab)
doc = abbrx(doc)
for abbr in doc._.abbrs:
    print(abbr, "->", abbr._.long_form)
>>> abbr -> abbreviation
LabelX
The LabelX pipe matches and labels patterns in a text, solving overlaps, abbreviations and acronyms.
from spacy import load as spacy_load
from spikex.pipes import LabelX
nlp = spacy_load("en_core_web_sm")
doc = nlp("looking for a computer system engineer")
patterns = [
[{"LOWER": "computer"}, {"LOWER": "system"}],
[{"LOWER": "system"}, {"LOWER": "engineer"}],
]
labelx = LabelX(nlp.vocab, ("TEST", patterns), validate=True, only_longest=True)
doc = labelx(doc)
for labeling in doc._.labelings:
    print(labeling, f"[{labeling.label_}]")
>>> computer system engineer [TEST]
PhraseX
The PhraseX pipe creates a custom Doc underscore extension which is filled with matches from phrase patterns.
from spacy import load as spacy_load
from spikex.pipes import PhraseX
nlp = spacy_load("en_core_web_sm")
doc = nlp("I have Melrose and McIntosh apples, or Williams pears")
patterns = [
[{"LOWER": "mcintosh"}],
[{"LOWER": "melrose"}],
]
phrasex = PhraseX(nlp.vocab, "apples", patterns)
doc = phrasex(doc)
for apple in doc._.apples:
    print(apple)
>>> Melrose
>>> McIntosh
SentX
The SentX pipe splits sentences in a text. It modifies the tokens' is_sent_start attribute, so it's mandatory to add it before the parser pipe in the spaCy pipeline:
from spacy import load as spacy_load
from spikex.pipes import SentX
from spikex.defaults import spacy_version
if spacy_version >= 3:
    from spacy.language import Language

    @Language.factory("sentx")
    def create_sentx(nlp, name):
        return SentX()
nlp = spacy_load("en_core_web_sm")
sentx_pipe = SentX() if spacy_version < 3 else "sentx"
nlp.add_pipe(sentx_pipe, before="parser")
doc = nlp("A little sentence. Followed by another one.")
for sent in doc.sents:
    print(sent)
>>> A little sentence.
>>> Followed by another one.
That's all folks
Feel free to contribute and have fun!