spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines
spaCy-wrap is a minimal library for wrapping fine-tuned transformers from the Hugging Face model hub in your spaCy pipeline, allowing you to include existing models in your spaCy workflows.
As far as possible, it follows an API similar to spacy-transformers.
Installation
Installing spacy-wrap is simple using pip:
pip install spacy_wrap
There is no need to install from GitHub, as the version on PyPI should always be the same as the one on GitHub.
Example
The following shows a simple example of how you can quickly add a fine-tuned transformer model from the Hugging Face model hub. In this example we will use the sentiment model by Barbieri et al. (2020) to classify whether a tweet is positive, negative, or neutral. We will add this model to a blank English pipeline:
import spacy
import spacy_wrap
nlp = spacy.blank("en")
config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "sentiment",  # document extension for the prediction
    "labels": ["negative", "neutral", "positive"],
    "model": {
        "name": "cardiffnlp/twitter-roberta-base-sentiment",  # the model name or path of the Hugging Face model
    },
}
transformer = nlp.add_pipe("classification_transformer", config=config)
transformer.model.initialize()
doc = nlp("spaCy is a wonderful tool")
print(doc._.clf_trf_data)
# TransformerData(wordpieces=...
print(doc._.sentiment)
# 'positive'
print(doc._.sentiment_prob)
# {'prob': array([0.004, 0.028, 0.969], dtype=float32), 'labels': ['negative', 'neutral', 'positive']}
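Since the predictions are stored as document extensions, they are easy to work with directly. For instance, here is a minimal sketch (continuing from the example above) of recovering the most probable label from the raw probabilities stored in doc._.sentiment_prob:

import numpy as np

# doc._.sentiment_prob holds the class probabilities together with the label names
probs = doc._.sentiment_prob
best = int(np.argmax(probs["prob"]))
print(probs["labels"][best])
# 'positive'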
These pipelines can also easily be applied to multiple documents using nlp.pipe, as one would expect from a spaCy component:
docs = nlp.pipe(
    [
        "I hate wrapping my own models",
        "Isn't there a tool for this?",
        "spacy-wrap is great for wrapping models",
    ]
)

for doc in docs:
    print(doc._.sentiment)
# 'negative'
# 'neutral'
# 'positive'
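For larger collections of texts, nlp.pipe also accepts a batch_size argument controlling how many texts are processed at a time. A small sketch (the batch size of 32 is only illustrative):

texts = ["I hate wrapping my own models", "spacy-wrap is great for wrapping models"] * 500

# process the texts in batches and collect the predicted sentiment for each one
sentiments = [doc._.sentiment for doc in nlp.pipe(texts, batch_size=32)]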
More Examples
It is always nice to have more than one example. Here is another one, where we add a Danish hate speech detection model to a blank Danish pipeline:
import spacy
import spacy_wrap
nlp = spacy.blank("da")
config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "hate_speech",  # document extension for the prediction
    "labels": ["Not hate Speech", "Hate speech"],
    "model": {
        "name": "DaNLP/da-bert-hatespeech-detection",  # the model name or path of the Hugging Face model
    },
}
transformer = nlp.add_pipe("classification_transformer", config=config)
transformer.model.initialize()
doc = nlp("Senile gamle idiot") # old senile idiot
print(doc._.clf_trf_data)
# TransformerData(wordpieces=...
print(doc._.hate_speech)
# "Hate speech"
print(doc._.hate_speech_prob)
# {'prob': array([0.013, 0.987], dtype=float32), 'labels': ['Not hate Speech', 'Hate speech']}
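Because the prediction is stored directly on the Doc, the pipeline can, for example, be used to filter a collection of texts. A minimal sketch (the second sentence, meaning "what a lovely day", is only an illustration):

texts = ["Senile gamle idiot", "Sikke en dejlig dag"]

# keep only the texts that the model labels as hate speech
flagged = [doc.text for doc in nlp.pipe(texts) if doc._.hate_speech == "Hate speech"]
print(flagged)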
📖 Documentation

| Documentation      |                                              |
| ------------------ | -------------------------------------------- |
| Installation       | Installation instructions for spacy-wrap.    |
| News and changelog | New additions, changes and version history.  |
| API reference      | The reference for spacy-wrap's API.          |
💬 Where to ask questions

| Type               |                        |
| ------------------ | ---------------------- |
| FAQ                | FAQ                    |
| Bug reports        | GitHub Issue Tracker   |
| Feature requests   | GitHub Issue Tracker   |
| Usage questions    | GitHub Discussions     |
| General discussion | GitHub Discussions     |