Snips Python library to extract meaning from text

Overview

Snips NLU

Snips NLU (Natural Language Understanding) is a Python library that allows you to extract structured information from sentences written in natural language.

What is Snips NLU about?

Behind every chatbot and voice assistant lies a common piece of technology: Natural Language Understanding (NLU). Anytime a user interacts with an AI using natural language, their words need to be translated into a machine-readable description of what they meant.

The NLU engine first detects what the intention of the user is (a.k.a. intent), then extracts the parameters (called slots) of the query. The developer can then use this to determine the appropriate action or response.

Let’s take an example to illustrate this, and consider the following sentence:

"What will be the weather in paris at 9pm?"

Properly trained, the Snips NLU engine will be able to extract structured data such as:

{
   "intent": {
      "intentName": "searchWeatherForecast",
      "probability": 0.95
   },
   "slots": [
      {
         "value": "paris",
         "entity": "locality",
         "slotName": "forecast_locality"
      },
      {
         "value": {
            "kind": "InstantTime",
            "value": "2018-02-08 20:00:00 +00:00"
         },
         "entity": "snips/datetime",
         "slotName": "forecast_start_datetime"
      }
   ]
}

In this case, the identified intent is searchWeatherForecast and two slots were extracted: a locality and a datetime. As you can see, Snips NLU does an extra step on top of extracting entities: it resolves them. The extracted datetime value has indeed been converted into a handy ISO format.
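
For illustration, here is a minimal sketch of how such a parsing result might be consumed in Python. The field names come from the example above; the handler itself is hypothetical:

def handle_parsing(parsing):
    # Route on the detected intent, then read the resolved slot values
    if parsing["intent"]["intentName"] == "searchWeatherForecast":
        slots = {s["slotName"]: s["value"] for s in parsing["slots"]}
        locality = slots.get("forecast_locality")        # e.g. "paris"
        start = slots.get("forecast_start_datetime")     # resolved snips/datetime value
        return "Fetching the weather for {} at {}".format(locality, start["value"])
    return "Sorry, I did not understand."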

Check out our blog post to get more details about why we built Snips NLU and how it works under the hood. We also published a paper on arXiv, presenting the machine learning architecture of the Snips Voice Platform.

Getting Started

System requirements

  • Python 2.7 or Python >= 3.5
  • RAM: Snips NLU will typically use between 100MB and 200MB of RAM, depending on the language and the size of the dataset.

Installation

pip install snips-nlu

We currently have pre-built binaries (wheels) for snips-nlu and its dependencies for macOS (10.11 and later), Linux x86_64 and Windows.

For any other architecture/OS, snips-nlu can be installed from the source distribution. To do so, Rust and setuptools_rust must be installed before running the pip install snips-nlu command.
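
As a rough sketch on a Unix-like system, assuming rustup is used to install Rust, the source installation could look like this:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
pip install setuptools-rust
pip install snips-nlu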

Language resources

Snips NLU relies on external language resources that must be downloaded before the library can be used. You can fetch resources for a specific language by running the following command:

python -m snips_nlu download en

Or simply:

snips-nlu download en

The list of supported languages is available at this address.
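
If you need resources for several languages, the CLI also exposes a bulk download command (referenced in the 0.16.2 release notes further down):

snips-nlu download-all-languages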

API Usage

Command Line Interface

The easiest way to test the abilities of this library is through the command line interface.

First, start by training the NLU with one of the sample datasets:

snips-nlu train path/to/dataset.json path/to/output_trained_engine

Where path/to/dataset.json is the path to the dataset which will be used during training, and path/to/output_trained_engine is the location where the trained engine should be persisted once the training is done.

After that, you can start parsing sentences interactively by running:

snips-nlu parse path/to/trained_engine

Where path/to/trained_engine corresponds to the location where you have stored the trained engine during the previous step.
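
If you prefer to load the trained engine from Python instead of using the interactive parser, a minimal sketch (assuming the engine directory produced by the train command above) looks like this:

from snips_nlu import SnipsNLUEngine

# Load the engine persisted by `snips-nlu train` and parse a sentence
nlu_engine = SnipsNLUEngine.from_path("path/to/output_trained_engine")
parsing = nlu_engine.parse("What will be the weather in paris at 9pm?")
print(parsing["intent"]["intentName"])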

Sample code

Here is some sample code that you can run on your machine after having installed snips-nlu, fetched the English resources, and downloaded one of the sample datasets:

>>> from __future__ import unicode_literals, print_function
>>> import io
>>> import json
>>> from snips_nlu import SnipsNLUEngine
>>> from snips_nlu.default_configs import CONFIG_EN
>>> with io.open("sample_datasets/lights_dataset.json") as f:
...     sample_dataset = json.load(f)
>>> nlu_engine = SnipsNLUEngine(config=CONFIG_EN)
>>> nlu_engine = nlu_engine.fit(sample_dataset)
>>> text = "Please turn the light on in the kitchen"
>>> parsing = nlu_engine.parse(text)
>>> parsing["intent"]["intentName"]
'turnLightOn'

This code trains an NLU engine on the sample lights dataset and then parses a smart lights query.
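
To inspect the full parsing output, or to persist the fitted engine so it can be reloaded later without retraining, something along these lines should work (the output directory is arbitrary):

>>> print(json.dumps(parsing, indent=2))
>>> nlu_engine.persist("path/to/persisted_engine")
>>> loaded_engine = SnipsNLUEngine.from_path("path/to/persisted_engine")
>>> loaded_engine.parse(text)["intent"]["intentName"]
'turnLightOn'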

Sample datasets

Here is a list of some datasets that can be used to train a Snips NLU engine:

  • Lights dataset: "Turn on the lights in the kitchen", "Set the light to red in the bedroom"
  • Beverage dataset: "Prepare two cups of cappuccino", "Make me a cup of tea"
  • Flights dataset: "Book me a flight to go to boston this weekend", "book me some tickets from istanbul to moscow in three days"
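
For reference, a dataset is a JSON document that lists annotated example utterances for each intent and declares the custom entities. The abridged sketch below is illustrative only (hypothetical intent and entity names); refer to the documentation for the exact format:

{
   "language": "en",
   "intents": {
      "turnLightOn": {
         "utterances": [
            {
               "data": [
                  {"text": "Turn on the lights in the "},
                  {"text": "kitchen", "entity": "room", "slot_name": "room"}
               ]
            }
         ]
      }
   },
   "entities": {
      "room": {
         "data": [{"value": "kitchen", "synonyms": ["cooking area"]}],
         "use_synonyms": true,
         "automatically_extensible": true,
         "matching_strictness": 1.0
      }
   }
}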

Benchmarks

In January 2018, we reproduced an academic benchmark published during the summer of 2017. In this article, the authors assessed the performance of API.ai (now Dialogflow, Google), Luis.ai (Microsoft), IBM Watson, and Rasa NLU. For fairness, we used an updated version of Rasa NLU and compared it to the latest version of Snips NLU (both in dark blue).

[Figure: .img/benchmarks.png — benchmark results]

In the figure above, F1 scores for both intent classification and slot filling were computed for several NLU providers, and averaged across the three datasets used in the academic benchmark mentioned before. All the underlying results can be found here.

Documentation

To find out how to use Snips NLU, please refer to the package documentation; it provides a step-by-step guide on how to set up and use this library.

Citing Snips NLU

Please cite the following paper when using Snips NLU:

@article{coucke2018snips,
  title   = {Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces},
  author  = {Coucke, Alice and Saade, Alaa and Ball, Adrien and Bluche, Th{\'e}odore and Caulier, Alexandre and Leroy, David and Doumouro, Cl{\'e}ment and Gisselbrecht, Thibault and Caltagirone, Francesco and Lavril, Thibaut and others},
  journal = {arXiv preprint arXiv:1805.10190},
  pages   = {12--16},
  year    = {2018}
}

FAQ & Community

Please join the forum to ask your questions and get feedback from the community.

How do I contribute?

Please see the Contribution Guidelines.

Licence

This library is provided by Snips as Open Source software. See LICENSE for more information.

Geonames Licence

The snips/city, snips/country and snips/region builtin entities rely on software from Geonames, which is made available under a Creative Commons Attribution 4.0 International license. For the license and warranties for Geonames please refer to: https://creativecommons.org/licenses/by/4.0/legalcode.

Comments
  • problem installing snips on windows

    After installing Visual Studio 2017 and Rust, I finally ran pip install snips-nlu and the following error was displayed:

    error[E0425]: cannot find function parse_crate in module syn
      --> C:\Users\jhg\.cargo\registry\src\github.com-1ecc6299db9ec823\cbindgen-0.4.3\src\bindgen\parser.rs:167:30

    What can I do?

    opened by hvaneylen 14
  • [Windows] Generating custom dataset fails

    I'm following the tutorial, so I have three intent files, setTemperature.txt, turnLightsOn.txt, and turnLightsOff.txt, and one entity file, rooms.txt. Running

    generate-dataset --language en --intent-files turnLightOn.txt turnLightOff.txt setTemperature.txt --entity-files room.txt > dataset.json

    generates a dataset, but when I do

     with io.open("dataset.json") as f:
        dataset = json.load(f)
    engine.fit(dataset)
    
    

    I get the following error

    Traceback (most recent call last):
      File "C:\Users\J90779\.spyder-py3\testSnipIdle.py", line 12, in <module>
        engine.fit(dataset)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\nlu_engine\nlu_engine.py", line 95, in fit
        recycled_parser.fit(dataset, force_retrain)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\intent_parser\probabilistic_intent_parser.py", line 84, in fit
        self.slot_fillers[intent_name].fit(dataset, intent_name)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\utils.py", line 259, in wrapped
        res = fn(*args, **kwargs)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 133, in fit
        for sample in crf_samples]
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 133, in <listcomp>
        for sample in crf_samples]
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py", line 203, in compute_features
        value = feature.compute(i, cache)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\feature.py", line 59, in compute
        value = self.function(tokens, token_index + self.offset)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\slot_filler\feature_factory.py", line 498, in builtin_entity_match
        text, self.language, scope=[builtin_entity], use_cache=True)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\builtin_entities.py", line 46, in get_builtin_entities
        return parser.parse(text, scope=scope, use_cache=use_cache)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu\builtin_entities.py", line 26, in parse
        parser_result = self.parser.parse(text, scope)
      File "C:\Users\J90779\AppData\Local\Programs\Python\Python36\lib\site-packages\snips_nlu_ontology\builtin_entities.py", line 159, in parse
        self._parser, text.encode("utf8"), scope, byref(ptr))
    OSError: [WinError -529697949] Windows Error 0xe06d7363
    

    Everything is in the same folder, and fitting the sample_dataset.json works just fine. I also tried to manually type out a dataset following the formatting in sample_dataset, but I get the same error. Is this an issue on my side or something else?

    Also, I'm running snips version 0.14.0.

    bug 
    opened by Brannonj96 13
  • different & low scores in snips-nlu

    Hello team, I installed on Red Hat Linux and I am able to train the data; I just want intent classification. Say we have trained the model with an utterance like -> "Unable to edit the bill for my request, which is submitted for approval". After training I asked in a different manner -> "submitted for approval but unable to edit the bill". Now I get a very good confidence score (0.88) if my number of intents is very small, say 2-3. When the number of intents is larger, say close to 30, and I ask the same question -> "submitted for approval but cannot to edit it's bills", it gives a very low score, close to 0.45.

    I even tried to tweak the values max_iter and random_state=0, which improved the score to 0.51, but it is still low compared with the first score. Can you tell me whether I need to tweak anything else in the config file, and if yes, are there any rules based on which we should calibrate those parameters?

    opened by deepankar27 13
  • cache gazetteer stemmed words to load models that use NgramFactory features faster

    Hi team,

    In a project that I'm working on, we have a dynamic number of models, and we can't cache them all on each python process.

    So we have to load them dynamically, often. When loading a sample model which has 6 NgramFactory features, most of the load time (290ms out of 300ms total) is spent stemming common_words_gazetteer_name from the gazetteer (using from_dict / to_dict).

    So if we cache the stemmed words, loading models becomes extremely fast. In our case this pull request removes 60K calls to stem(lang, word), which take 90% of the load time for our model.

    Is this something you'd be interested in having in core?

    ps:

    1. I didn't even create the model and don't know how snips works, just profiled and fixed.
    2. The code sucks, I'll make it better if you are interested.

    Regards, Dorian

    opened by ddorian 12
  • Snips doesn't work with pyinstaller

    My team just spent the past week getting snips to work with pyinstaller. Pyinstaller lets you wrap your python code up into a convenient single file for distribution. The modifications to Snips were minor:

    snips_nlu_parsers/utils.py: add the import for sys, and change the assignment of PACKAGE_PATH to the following

    try:
        PACKAGE_PATH = Path(sys._MEIPASS)
    except AttributeError:
        PACKAGE_PATH = Path(__file__).absolute().parent
    
    

    _MEIPASS is a variable set by pyinstaller that records where the installation is in the environment it creates. The call to load the library a few lines down will break without this.

    Add the hook-snips_nlu_parsers.py file:

    from PyInstaller.compat import modname_tkinter, is_win
    import os
    
    hiddenimports = ['sklearn', 'sklearn.neighbors.typedefs', 'sklearn.neighbors.quad_tree', 'sklearn.tree._utils', 'snips_nlu', 'multiprocessing.get_context', 'sklearn.utils', 'pycrfsuite._dumpparser', 'pycrfsuite._logparser']
    
    if is_win:
        binaries=[(os.environ['USERPROFILE'] + '\\AppData\\Local\\Programs\\python\\python36\\lib\\site-packages\\snips_nlu_parsers\\dylib\\libsnips_nlu_parsers_rs.cp36-win_amd64.pyd', 'dylib')]
    else:
        binaries=[('/usr/local/lib/python3.7/site-packages/snips_nlu_parsers/dylib/libsnips_nlu_parsers_rs.cpython-37m-darwin.so', 'dylib')]
    
    

    Documentation for pyinstaller hooks: https://pyinstaller.readthedocs.io/en/stable/hooks.html?highlight=collect_submodules

    wontfix 
    opened by Shotgun167 10
  • Incremental Training

    Is it possible to do incremental training? I build training sets that have between 10-20K training examples and training takes a long time. Would like to be able to add a new training example to a model incrementally without having to wait for hours to retrain the set. Are there any thoughts on how to approach this?

    opened by timtutt 10
  • make proc_intent() fast again (on big model)

    Hi,

    I have a big model for which parsing text is 3x slower than with my normal model. With the changes in this PR, it's only 1.5x slower. The normal model should be faster too; I just didn't test it.

    The code isn't as nice as it could be, but if you agree with the changes we can fix it.

    Attached are cProfile screenshots of a single proc_intent() call, before and after.

    Makes sense?

    opened by ddorian 9
  • error loading a saved engine

    I did the tutorial and saved the engine state. When I try to load it, I obtain: snips_nlu.resources.MissingResource: Language resource 'en' not found. This may be solved by running 'snips-nlu download en'

    But the language resource was found for the save done just before, using snips.load_resources('snips_nlu_en') <-- this works, but not with load_resources(u"en"). During the language installation I got this message: Creating a shortcut link for 'snips_nlu_en' didn't work, but you can still load the resources via its full package name: snips_nlu.load_resources('snips_nlu_en')

    I guess that the engine-loading code uses the second instruction to load the resource.

    How can I solve this problem? I need to be able to save / load trained engines.

    Thanks

    opened by hvaneylen 8
  • [Windows] error while fitting engine on custom dataset with multiple entites

    Hi, I'm using snips on Windows 10 with Anaconda. I followed the tutorial but tried to generate my own dataset with multiple entities using the command given, and while fitting the data I got this error:

    with io.open("dataset45.json") as f: dataset = json.load(f) engine.fit(dataset)

    OSError                                   Traceback (most recent call last)
    in ()
          1 with io.open("dataset45.json") as f:
          2     dataset = json.load(f)
    ----> 3 engine.fit(dataset)

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
    --> 256     res = fn(*args, **kwargs)

    ~\Anaconda3\lib\site-packages\snips_nlu\nlu_engine\nlu_engine.py in fit(self, dataset, force_retrain)
    ---> 95     recycled_parser.fit(dataset, force_retrain)

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
    --> 256     res = fn(*args, **kwargs)

    ~\Anaconda3\lib\site-packages\snips_nlu\intent_parser\probabilistic_intent_parser.py in fit(self, dataset, force_retrain)
    ---> 84     self.slot_fillers[intent_name].fit(dataset, intent_name)

    ~\Anaconda3\lib\site-packages\snips_nlu\utils.py in wrapped(*args, **kwargs)
    --> 256     res = fn(*args, **kwargs)

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in fit(self, dataset, intent)
    --> 132     for sample in crf_samples]

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in <listcomp>(.0)
    --> 132     for sample in crf_samples]

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\crf_slot_filler.py in compute_features(self, tokens, drop_out)
    --> 202     value = feature.compute(i, cache)

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\feature.py in compute(self, token_index, cache)
    ---> 59     value = self.function(tokens, token_index + self.offset)

    ~\Anaconda3\lib\site-packages\snips_nlu\slot_filler\feature_factory.py in builtin_entity_match(tokens, token_index)
    --> 496     text, self.language, scope=[builtin_entity], use_cache=True)

    ~\Anaconda3\lib\site-packages\snips_nlu\builtin_entities.py in get_builtin_entities(text, language, scope, use_cache)
    ---> 46     return parser.parse(text, scope=scope, use_cache=use_cache)

    ~\Anaconda3\lib\site-packages\snips_nlu\builtin_entities.py in parse(self, text, scope, use_cache)
    ---> 26     parser_result = self.parser.parse(text, scope)

    ~\Anaconda3\lib\site-packages\snips_nlu_ontology\builtin_entities.py in parse(self, text, scope)
    --> 159     self._parser, text.encode("utf8"), scope, byref(ptr))

    OSError: [WinError -1073741795] Windows Error 0xc000001d

    snips-nlu generate-dataset en intent_Lan.txt intent_ot.txt intent_reset.txt entity_LanID.txt entity_otp.txt entity_PasswordReset.txt > dataset.json

    My .txt files were: intent_ot.txt, intent_reset.txt, entity_ResetPassword.txt, intent_Lan.txt, entity_LanID.txt, entity_otp.txt

    opened by rohan-dot 8
  • [INSTALL] Setup.py does not install enum34 module

    For some reason enum34 is not installed when I run python setup.py install even though it seems to be specified correctly in the setup.py:

    setup(name="snips_nlu",
          version="0.0.1",
          description="",
          author="Clement Doumouro",
          author_email="[email protected]",
          url="",
          download_url="",
          license="MIT",
          install_requires=["enum34"],
          packages=["snips_nlu",
                    "snips_nlu.entity_extractor",
                    "snips_nlu.nlu_engine"],
          cmdclass={"install": SnipsNLUInstall},
          entry_points={},
          include_package_data=False,
          zip_safe=False)
    
    bug 
    opened by adrienball 8
  • Inconsistencies in intent classification

    Hi,

    I have been working on the sample dataset and sample code posted in the https://snips-nlu.readthedocs.io/en/latest/quickstart.html.

    I have also added a new intent "sampleTurnOffLight" to the same sample_dataset.json, which looks like below:

    sample_dataset.json.zip

    For a text - "turn lights in basement" I'm getting different classification every time. Note - I retrain(fit) every time before I call the parse I expect it to behave consistently with each re-train Could you please confirm the behavior? Run 1- { "input": "turn lights in basement", "slots": [], "intent": { "intentName": "sampleTurnOffLight", "probability": 0.6660875805168223 } } Run 2- { "input": "turn lights in basement", "slots": [], "intent": { "intentName": "sampleTurnOnLight", "probability": 0.6430405901353275 } }

    question 
    opened by satnam2012 7
  • Is it possible to adapt the stemming and stop words file for a language?

    For my use case, I need to remove some words from the stop words file and add some to the stemming list. Changing it locally in the Python dependency works, but I need to share it and don't want it to get overwritten.

    So, is there a way to import adapted language resource files?

    question 
    opened by Corasonn 0
  • snips-nlu not getting installed on docker.

    Describe the bug: I am installing snips in Docker using step 1 from the To Reproduce section. I end up with a No module named 'distutils.msvccompiler' error. I am running everything on a Linux-based system. Is there an alternative way to install snips-nlu in Docker?

    To Reproduce

    1. Created the following installSnips.sh file for installing Rust and setuptools-rust:
    pip3 install numpy
    pip3 install scipy
    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y
    pip3 install setuptools-rust
    source ~/.cargo/env
    pip3 install snips-nlu
    
    2. While running installSnips.sh, pip3 install snips-nlu runs into the following error:
     Building wheel for scikit-learn (setup.py) ... error
      error: subprocess-exited-with-error
      
      × python setup.py bdist_wheel did not run successfully.
      │ exit code: 1
      ╰─> [26 lines of output]
          Partial import of sklearn during the build process.
          /tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py:123: DeprecationWarning:
          
            `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
            of the deprecation of `distutils` itself. It will be removed for
            Python >= 3.12. For older Python versions it will remain present.
            It is recommended to use `setuptools < 60.0` for those Python versions.
            For more details, see:
              https://numpy.org/devdocs/reference/distutils_status_migration.html
          
          
            from numpy.distutils.command.build_ext import build_ext  # noqa
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 303, in <module>
              setup_package()
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 295, in setup_package
              from numpy.distutils.core import setup
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/core.py", line 24, in <module>
              from numpy.distutils.command import config, config_compiler, \
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/command/config.py", line 19, in <module>
              from numpy.distutils.mingw32ccompiler import generate_manifest
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/mingw32ccompiler.py", line 28, in <module>
              from distutils.msvccompiler import get_build_version as get_build_msvc_version
          ModuleNotFoundError: No module named 'distutils.msvccompiler'
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      ERROR: Failed building wheel for scikit-learn
      Running setup.py clean for scikit-learn
    Failed to build scikit-learn
    Installing collected packages: requests, pyaml, packaging, scikit-learn, deprecation
      Attempting uninstall: scikit-learn
        Found existing installation: scikit-learn 1.1.2
        Uninstalling scikit-learn-1.1.2:
          Successfully uninstalled scikit-learn-1.1.2
      Running setup.py install for scikit-learn ... error
      error: subprocess-exited-with-error
      
      × Running setup.py install for scikit-learn did not run successfully.
      │ exit code: 1
      ╰─> [26 lines of output]
          Partial import of sklearn during the build process.
          /tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py:123: DeprecationWarning:
          
            `numpy.distutils` is deprecated since NumPy 1.23.0, as a result
            of the deprecation of `distutils` itself. It will be removed for
            Python >= 3.12. For older Python versions it will remain present.
            It is recommended to use `setuptools < 60.0` for those Python versions.
            For more details, see:
              https://numpy.org/devdocs/reference/distutils_status_migration.html
          
          
            from numpy.distutils.command.build_ext import build_ext  # noqa
          Traceback (most recent call last):
            File "<string>", line 2, in <module>
            File "<pip-setuptools-caller>", line 34, in <module>
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 303, in <module>
              setup_package()
            File "/tmp/pip-install-h9opcxf9/scikit-learn_fc42ebd0e5804549ad6f611dced79620/setup.py", line 295, in setup_package
              from numpy.distutils.core import setup
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/core.py", line 24, in <module>
              from numpy.distutils.command import config, config_compiler, \
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/command/config.py", line 19, in <module>
              from numpy.distutils.mingw32ccompiler import generate_manifest
            File "/home/drjslab/.local/lib/python3.10/site-packages/numpy/distutils/mingw32ccompiler.py", line 28, in <module>
              from distutils.msvccompiler import get_build_version as get_build_msvc_version
          ModuleNotFoundError: No module named 'distutils.msvccompiler'
          [end of output]
      
      note: This error originates from a subprocess, and is likely not a problem with pip.
      Rolling back uninstall of scikit-learn
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/scikit_learn-1.1.2.dist-info/
       from /home/drjslab/.local/lib/python3.10/site-packages/~cikit_learn-1.1.2.dist-info
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/scikit_learn.libs/
       from /home/drjslab/.local/lib/python3.10/site-packages/~cikit_learn.libs
      Moving to /home/drjslab/.local/lib/python3.10/site-packages/sklearn/
       from /home/drjslab/.local/lib/python3.10/site-packages/~klearn
    error: legacy-install-failure
    
    × Encountered error while trying to install package.
    ╰─> scikit-learn
    
    note: This is an issue with the package mentioned above, not pip.
    

    Environment:

    • Base OS: Ubuntu 20.04
    • Base Python version: 3.8
    • snips-nlu version: Latest
    • Docker OS: Ubuntu 20.04
    • Docker Python: 3.10.4
    bug 
    opened by jig4physics 0
  • SSLError "Bad Handshake" error during python -m snips_nlu download-language-entities en

    The Bug: I seem to be getting a bad handshake error while trying to download the built-in entities: SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)"),))

    To Reproduce: The error comes up when trying to parse input, and then an error occurs saying "FileNotFoundError: No data found for the 'snips/city' builtin entity in language 'en'. You must download the corresponding resources by running 'python -m snips_nlu download-entity snips/city en' before you can use this built-in entity."

    Then, when trying to download, the SSL Bad Handshake error occurs.

    If successful, the command should download and link the built-in entities, but I see no documentation about it and very little help online.

    Environment:

    • OS: Mac OSX
    • Python version: 3.6.10 :: Anaconda, Inc.
    • snips-nlu version: 0.20.2
    bug 
    opened by SamChadri 4
  • Problem with Installation on Windows

    Hi, people!

    I have python 3.8.6 and pip 22.0 installed on my Windows 10 machine. When I try to install Snips NLU via pip with command pip install snips-nlu, the following error occurs:

      Running `C:\Users\diego\AppData\Local\Temp\pip-install-krsta91s\snips-nlu-parsers_ccf393e870dc4dd38853696b80385053\ffi\target\release\build\rustling-ontology-514a7dc119c55141\build-script-build`
      error: failed to run custom build command for `rustling-ontology v0.19.3 (https://github.com/snipsco/rustling-ontology?tag=0.19.3#3bb1313d)`
    
      Caused by:
        process didn't exit successfully: `C:\Users\diego\AppData\Local\Temp\pip-install-krsta91s\snips-nlu-parsers_ccf393e870dc4dd38853696b80385053\ffi\target\release\build\rustling-ontology-514a7dc119c55141\build-script-build` (exit code: 101)
        --- stdout
        cargo:rerun-if-changed=grammar/de/src/
    
        --- stderr
        thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: ErrorMessage { msg: "example: \"letzten 2 jahren\" matched no rule" }', C:\Users\diego\.cargo\git\checkouts\rustling-ontology-a5f364cfd4d376e4\3bb1313\build.rs:45:86
        note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
      error: cargo failed with code: 101
    
      [end of output]
    

    Can anyone help me with this? Thanks in advance!

    question 
    opened by diegostefano 0
  • Configuring the probabilistic parser to ignore stop words?

    Question: Hi friends, in the deterministic and lookup intent parsers we can specify that we want to ignore stop words. Is it possible to do the same for the probabilistic parser? Regards, Hicham

    question 
    opened by hicham17 0
Releases(0.20.2)
  • 0.20.2(Jan 15, 2020)

  • 0.20.1(Sep 5, 2019)

    Added

    • Allow to bypass the model version check #830
    • Persist CustomEntityParser license when needed #832
    • Document metrics CLI #839
    • Allow to fit SnipsNLUEngine with a Dataset object #840

    Changed

    • Update snips-nlu-parsers dependency upper bound to 0.5 #850

    Fixed

    • Invalidate importlib caches after dynamically installing module #838
    • Automatically generate documentation for supported languages and builtin entities #841
    • Fix issue when cleaning up crfsuite files #843
    • Fix filemode of persisted crfsuite files #844
  • 0.20.0(Jul 16, 2019)

    Added

    • Add new intent parser: LookupIntentParser #759

    Changed

    • Replace DeterministicIntentParser by LookupIntentParser in default configs #829
    • Bumped snips-nlu-parsers to 0.3.x introducing new builtin entities:
      • snips/time
      • snips/timePeriod
      • snips/date
      • snips/datePeriod
      • snips/city
      • snips/country
      • snips/region
  • 0.19.8(Jul 10, 2019)

    Added

    • Add filter for entity match feature #814
    • Add noise re-weight factor in LogRegIntentClassifier #815
    • Add warning logs and improve errors #821
    • Add random seed parameter in training CLI #819

    Fixed

    • Fix non-deterministic behavior #817
    • Import modules lazily to speed up CLI startup time #819
    • Removed dependency on semantic_version to accept "subpatches" number #825
  • 0.19.7.1(Jul 10, 2019)

  • 0.19.7(Jun 20, 2019)

    Changed

    • Re-score ambiguous DeterministicIntentParser results based on slots #791
    • Accept ambiguous results from DeterministicIntentParser when confidence score is above 0.5 #797
    • Avoid generating number variations when not needed #799
    • Moved the NLU random state from the config to the shared resources #801
    • Reduce custom entity parser footprint in training time #804
    • Bumped scikit-learn to >=0.21,<0.22 for python>=3.5 and >=0.20,<0.21 for python<3.5 #801
    • Update dependencies #811

    Fixed

    • Fixed a couple of bugs in the data augmentation which were making the NLU training non-deterministic #801
    • Remove deprecated code in dataset generation #803
    • Fix possible override of entity values when generating variations #808
  • 0.19.6(Apr 26, 2019)

  • 0.19.5(Apr 10, 2019)

    Added

    • Advanced inference logging in the CRFSlotFiller #776
    • Improved failed linking error message after download of resources #774
    • Improve handling of ambiguous utterances in DeterministicIntentParser #773

    Changed

    • Remove normalization of confidence scores in intent classification #782

    Fixed

    • Fixed a crash due to missing resources when refitting the CRFSlotFiller #771
    • Fixed issue with egg fragments in download cli #769
    • Fixed an issue causing the None intent to be ignored when using the parse API in conjunction with intents and top_n #781
  • 0.19.4(Mar 6, 2019)

    Added

    • Support for Portuguese: "pt_pt" and "pt_br"

    Changed

    • Enhancement: leverage entity scopes of each intent in deterministic intent parser
  • 0.19.3(Mar 5, 2019)

    Fixed

    • Issue with intent classification reducing classification accuracy
    • Issue resulting in a mutation of the CRFSlotFillerConfig
    • Wrong required resources of the DeterministicIntentParser
    • Issue with non ASCII characters when using the parsing CLI with Python2
  • 0.19.2(Feb 11, 2019)

  • 0.19.1(Feb 4, 2019)

  • 0.19.0(Feb 4, 2019)

    Added

    • Support for Python3.7
    • get_intents(text) API in SnipsNLUEngine to get the probabilities of all the intents
    • get_slots(text, intent) API in SnipsNLUEngine to extract slots when the intent is known
    • The DeterministicIntentParser can now ignore stop words through the new ignore_stop_words configuration parameter
    • Co-occurrence features can now be used in the LogRegIntentClassifier

    Changed

    • The None intent is now handled as a regular intent in the parsing output, which means that:
    {
        "input": "foo bar",
        "intent": None,
        "slots": None
    }
    

    is replaced with:

    {
        "input": "foo bar",
        "intent": {
            "intentName": None,
            "probability": 0.552122
        },
        "slots": []
    }
    
    • Patterns of the DeterministicIntentParser are now deduplicated across intents in order to reduce ambiguity
    • Improve the use of custom ProcessingUnit through the use of Registrable pattern
    • Improve the use of default processing unit configurations
    • Improve logging
    • Replace snips-nlu-ontology with snips-nlu-parsers

    Fixed

    • Issue when persisting resources
    • Issue when resolving custom entities
    • Issue with whitespaces when generating dataset from YAML and text files
    • Issue with unicode when using the CLI (Python 2)
  • 0.18.0(Nov 26, 2018)

  • 0.17.4(Nov 20, 2018)

    Added

    • Add a --config argument in the metrics CLI

    Changed

    • Replace "parser_threshold" by "matching_strictness" in dataset format
    • Optimize loading and inference runtime
    • Disable stemming for intent classification in default configs
  • 0.17.3(Oct 18, 2018)

  • 0.17.2(Oct 17, 2018)

  • 0.17.1(Oct 9, 2018)

  • 0.17.0(Oct 5, 2018)

    Added

    • Support for 3 new builtin entities in French: snips/musicAlbum, snips/musicArtist and snips/musicTrack
    • Minimal support for Italian

    Changed

    • model version 0.16.0 => 0.17.0

    Fixed

    • Bug with entity feature name in intent classification
  • 0.16.5(Sep 17, 2018)

  • 0.16.4(Sep 17, 2018)

  • 0.16.3(Aug 22, 2018)

  • 0.16.2(Aug 8, 2018)

    Added

    • automatically_extensible flag in dataset generation tool
    • System requirements
    • Reference to chatito tool in documentation

    Changed

    • Bump snips-nlu-ontology to 0.57.3
    • versions of dependencies are now defined more loosely

    Fixed

    • Issue with synonyms mapping
    • Issue with snips-nlu download-all-languages CLI command
  • 0.16.1(Jul 23, 2018)

  • 0.16.0(Jul 17, 2018)

    Changed

    • The SnipsNLUEngine object is now persisted to (and loaded from) a directory, instead of a single json file.
    • The language resources are now persisted along with the SnipsNLUEngine, removing the need to download and load the resources when loading a trained engine.
    • The format of language resources has been optimized.

    Added

    • Stemmed gazetteers, computed beforehand. It removes the need to stem gazetteers on the fly.
    • API to persist (and load) a SnipsNLUEngine object as a bytearray

    Fixed

    • Issue in the DeterministicIntentParser when the same slot name was used in multiple intents while referring to different entities
  • 0.15.1(Jul 9, 2018)

  • 0.15.0(Jun 21, 2018)

    Changed

    • Language resources are now packaged separately from the Snips NLU core library, and can be fetched using snips-nlu download <language>.
    • The CLI tool now consists of a single entry point, snips-nlu, which exposes several commands.

    Added

    • CLI command to parse a query
  • 0.14.0(Jun 8, 2018)

    Fixed

    • Issue due to caching of builtin entities at inference time

    Changed

    • Improve builtin entities handling during intent classification
    • Improve builtin entities handling in DeterministicIntentParser
    • Reduce size of regex patterns in trained model file
    • Update model version to 0.15.0
  • 0.13.5(May 23, 2018)

    Fixed

    • Fixed synonyms matching by using the normalized version of the tagged value
    • Fixed dataset augmentation by keeping stripped values of entities
    • Fixed the string variations functions not to generate too many variations
  • 0.13.4(May 18, 2018)

    Added

    • Documentation for the None intent

    Changed

    • Improve calibration of intent classification probabilities
    • Update snips-nlu-ontology version to 0.55.0

    Fixed

    • DeterministicIntentParser: Fix bug when deduplicating regexes
    • DeterministicIntentParser: Fix issue with incorrect ranges when parsing sentences with both builtin and custom slots
    • DeterministicIntentParser: Fix issue with builtin entities placeholders causing mismatches
    • Fix issue with engine-inference CLI script not loading resources correctly