Repository for Project Insight: NLP as a Service

Abhishek Kumar Mishra

Last update: Dec 6, 2022

Related tags

Text Data & NLP nlp docker machine-learning natural-language-processing microservice transformer fastapi huggingface streamlit huggingface-transformer streamlit-webapp transformers-models

Overview

Project Insight

NLP as a Service

Introduction
- Features
Installation
- Setup and Documentation
Project Details
License

Introduction

Project Insight provides NLP as a service, with a code base for both a front-end GUI (Streamlit) and a back-end server (FastAPI) that serve transformer models on various downstream NLP tasks.

The downstream NLP tasks covered:

News Classification
Entity Recognition
Sentiment Analysis
Summarization
Information Extraction (to do)

Users can select a model from the drop-down to run inference.

Users can also call the FastAPI backend server directly for command-line inference.

Features of the solution

Python Code Base: Built with FastAPI and Streamlit, so the complete code base is in Python.
Expandable: The backend is designed so that it can be extended with more Transformer-based models, which then become available in the front-end app automatically.
Micro-Services: The backend follows a microservices architecture, with a Dockerfile for each service and Nginx as a reverse proxy in front of each independently running service.
- This makes it easy to update, maintain, start, and stop individual NLP services.

Installation

Clone the Repo.
Run Docker Compose to spin up the FastAPI-based backend service.
Run the Streamlit app with the streamlit run command.

Setup and Documentation

Download the models
- Download the models from here
- Save them in the specific model folders inside the src_fastapi folder.
Running the backend service.
- Go to the src_fastapi folder
- Run the Docker Compose command
```
$ cd src_fastapi
src_fastapi:~$ sudo docker-compose up -d
```
Running the frontend app.
- Go to the src_streamlit folder
- Run the app with the streamlit run command
```
$ cd src_streamlit
src_streamlit:~$ streamlit run NLPfily.py
```
Access to FastAPI Documentation: Since this is a microservice-based design, every NLP task has its own separate documentation
- News Classification: http://localhost:8080/api/v1/classification/docs
- Sentiment Analysis: http://localhost:8080/api/v1/sentiment/docs
- NER: http://localhost:8080/api/v1/ner/docs
- Summarization: http://localhost:8080/api/v1/summary/docs
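Once the backend is up, the documentation URLs above can be probed from Python to check that each service answers behind the reverse proxy. This is a minimal sketch using only the standard library; the helper names `docs_url` and `is_reachable` are illustrative and not part of the repository, and the base URL assumes the default `localhost:8080` shown above.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8080/api/v1"
# Service path segments taken from the documentation URLs listed above.
SERVICES = ["classification", "sentiment", "ner", "summary"]


def docs_url(service: str, base_url: str = BASE_URL) -> str:
    """Build the docs URL for one NLP service (hypothetical helper)."""
    return f"{base_url}/{service}/docs"


def is_reachable(url: str, timeout: float = 3.0) -> bool:
    """Return True if the URL answers with HTTP 200, False on URL or OS-level errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `is_reachable(docs_url(s))` for each entry in `SERVICES` gives a quick view of which containers are running; a `False` for every service usually points at the proxy or Docker Compose rather than at a single model.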

Project Details

Demonstration

Directory Details

Front End: The front-end code is in the src_streamlit folder, along with its Dockerfile and requirements.txt.
Back End: The back-end code is in the src_fastapi folder.
- This folder contains a directory for each task: classification, ner, summary, etc.
- Each NLP task is implemented as a microservice with its own FastAPI server, requirements, and Dockerfile, so that each can be maintained and managed independently.
- Each NLP task has its own folder, and within it each trained model has its own folder. For example:
```
- sentiment
    > app
        > api
            > distilbert
                - model.bin
                - network.py
                - tokeniser files
            > roberta
                - model.bin
                - network.py
                - tokeniser files
```
- Each new model added to a service needs its own folder.
- Each model folder needs the following files:
  - Model bin file.
  - Tokenizer files.
  - network.py, defining the model class, if a customised model is used.
- config.json: This file contains the details of the models in the backend and the dataset they are trained on.
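The file requirements above can be checked before starting a service. The sketch below is not part of the repository: `missing_model_files` is a hypothetical helper, and the tokenizer file names it looks for are an assumption based on common Hugging Face tokenizer outputs, so adjust them to whatever your tokenizer actually saves.

```python
from pathlib import Path
from typing import List


def missing_model_files(model_dir: Path, custom_network: bool = False) -> List[str]:
    """Return a list of problems found in a model folder; an empty list means it looks complete."""
    if not model_dir.is_dir():
        return [f"{model_dir} is not a directory"]
    problems = []
    # At least one weights file, e.g. model.bin.
    if not any(model_dir.glob("*.bin")):
        problems.append("no *.bin model file")
    # Assumption: tokenizer files use one of these common Hugging Face names.
    tokenizer_names = ("vocab.txt", "vocab.json", "tokenizer.json", "tokenizer_config.json")
    if not any((model_dir / name).exists() for name in tokenizer_names):
        problems.append("no tokenizer files")
    # network.py is only required when a customised model class is used.
    if custom_network and not (model_dir / "network.py").exists():
        problems.append("no network.py for the customised model class")
    return problems
```

For example, `missing_model_files(Path("sentiment/app/api/distilbert"), custom_network=True)` would report which of the three kinds of file are absent from that folder.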

How to Add a new Model

Fine-tune a transformer model for the specific task. You can leverage the transformers-tutorials.
Save the model files and tokenizer files, and also create a network.py script if you used a customized training network.
Create a directory inside the NLP task folder, named after the model, and save all the files in it.
Update config.json with the model details and dataset details.

Update <service>pro.py with the correct imports and the conditions under which the model is loaded. For example, for a new BERT model in the classification task, do the following:

Create a new directory named bert in classification/app/api/.

Update config.json with the following:

"classification": {
"model-1": {
    "name": "DistilBERT",
    "info": "This model is trained on News Aggregator Dataset from UC Irvine Machine Learning Repository. The news headlines are classified into 4 categories: **Business**, **Science and Technology**, **Entertainment**, **Health**. [New Dataset](https://archive.ics.uci.edu/ml/datasets/News+Aggregator)"
},
"model-2": {
    "name": "BERT",
    "info": "Model Info"
}
}
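To see how entries like these could be read back, here is a small sketch that lists the model names registered for one service, roughly what a front end needs to fill a drop-down. It is not the repository's code: `model_names` is a hypothetical helper, the `info` text is shortened, and the sample assumes the `"classification"` block sits inside a top-level JSON object.

```python
import json
from typing import Dict, List

# Shortened sample mirroring the structure of the config.json fragment above.
SAMPLE_CONFIG = """
{
  "classification": {
    "model-1": {"name": "DistilBERT", "info": "News headline classifier"},
    "model-2": {"name": "BERT", "info": "Model Info"}
  }
}
"""


def model_names(config: Dict[str, dict], service: str) -> List[str]:
    """Return the display names of all models registered under one service."""
    return [entry["name"] for entry in config.get(service, {}).values()]


config = json.loads(SAMPLE_CONFIG)
print(model_names(config, "classification"))  # ['DistilBERT', 'BERT']
```

A service that has no entry in the file simply yields an empty list, which makes a missing config block easy to spot.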

Update classificationpro.py with the following snippets:

Only if a customized class is used:

from classification.bert import BertClass

Section where the model is selected (BertTokenizerFast is imported from transformers):

if model == "bert":
    self.model = BertClass()
    self.tokenizer = BertTokenizerFast.from_pretrained(self.path)
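As more models are added, the chain of `if model == ...` conditions grows. One alternative, shown only as a sketch and not how the repository is written, is a lookup table that maps model names to the callables that build them. The two classes below are stand-ins so the example runs without model files; in a real service they would be the classes from each model's network.py.

```python
from typing import Callable, Dict


class DistilBertStub:
    """Stand-in for a model class defined in distilbert/network.py."""


class BertStub:
    """Stand-in for a model class defined in bert/network.py."""


# Maps the model name sent by the client to a callable that builds the model.
MODEL_REGISTRY: Dict[str, Callable[[], object]] = {
    "distilbert": DistilBertStub,
    "bert": BertStub,
}


def load_model(name: str) -> object:
    """Look up and instantiate a model, failing loudly on unknown names."""
    try:
        factory = MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model {name!r}; expected one of: {known}") from None
    return factory()
```

With this layout, adding a model means adding one registry entry, and a misspelled model name produces an explicit error instead of silently skipping every branch.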

License

This project is licensed under the GPL-3.0 License; see the LICENSE.md file for details.

Comments

Failed to establish a new connection: [Errno 111] Connection refused

Hello! Nice work, it looks very good in the video. Unfortunately, I was not able to make it run. I get this error:

Insight JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Traceback:
File "c:\users\mbene\miniconda3\lib\site-packages\streamlit\script_runner.py", line 324, in run_script
    exec(code, module.__dict__)
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 132, in <module>
    main()
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 118, in main
    model_details = apicall.model_list(service=service)
File "C:\Users\mbene\Documents\GitHub\insight-master\insight-master\src_streamlit\NLPfiy.py", line 24, in model_list
    return json.loads(models.text)
File "c:\users\mbene\miniconda3\lib\json\__init__.py", line 348, in loads
    return _default_decoder.decode(s)
File "c:\users\mbene\miniconda3\lib\json\decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\users\mbene\miniconda3\lib\json\decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

I get the same message in every service (only the URL endpoint changes).

- I already added the Streamlit app to the Docker network.
- I already took the firewall down and brought it back up.
- I created an exception in the firewall.
- I updated docker and docker-compose.
- The error is the same on Linux Ubuntu and Windows 10.
- I installed the requirements.txt.
- I can't access localhost:8000/docs.
- I also tried running the Dockerfile inside streamlit with docker run and adding it to the network.

no success so far.

I would appreciate your help. Thank you!

Mauro
bug

opened by mbenetti 17
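The traceback above ends in `json.loads(models.text)`. That call raises exactly this `JSONDecodeError` whenever the response body is empty or is not JSON, for example an HTML error page from a proxy. A sketch of a more defensive parse follows; `parse_model_list` is a hypothetical helper rather than the repository's code, and it takes the status code and body as plain values so it can be tried without a running server.

```python
import json
from typing import Any


def parse_model_list(status_code: int, body: str) -> Any:
    """Parse a model-list response, raising a readable error instead of a bare JSONDecodeError."""
    if status_code != 200:
        raise RuntimeError(f"Backend returned HTTP {status_code}: {body[:200]!r}")
    if not body.strip():
        raise RuntimeError("Backend returned an empty body; is the service running?")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"Backend response is not valid JSON: {body[:200]!r}") from exc
```

Surfacing the status code and the first part of the body usually shows at a glance whether the request reached the FastAPI service at all.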
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Hello Abhishek, I tried today with the latest changes, but it is still not working for me. Now I get "Internal Server Error" using the FastAPI Swagger UI (see picture).

If I send a GET request for info, I get the name and the description of the model (see next picture).

The front end shows the same JSON decoder error as yesterday.

I tried on Windows and Linux again.
bug

opened by mbenetti 9
No model.bin in the repo

Hi Abhi Mishra

Amazing work. Really like the idea of building NLP services. I'm highly interested in augmenting the models with custom features for sentiment analysis. Is it possible for you to upload the model files (pytorch_model.bin) to try different NLP pipelines for sentiment analysis?
question

opened by AnilB87 2
summarypro.py missing trailing slash in path
There's a missing / in the path on line 14:

self.path = f"./app/api/{model}"

Should be:

self.path = f"./app/api/{model}/"
bug
opened by asehmi 1
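Whether the trailing slash matters depends on how `self.path` is used afterwards. Assuming file names are appended by plain string concatenation (an assumption, since the rest of summarypro.py is not shown here), the missing slash fuses the folder and file names. The sketch below uses a placeholder model name and shows that `pathlib` avoids the problem either way.

```python
from pathlib import Path

model = "distilbert"  # placeholder model folder name

# Plain concatenation without the trailing slash fuses the two names.
broken = f"./app/api/{model}" + "model.bin"
print(broken)  # ./app/api/distilbertmodel.bin

# pathlib inserts the separator itself, with or without a trailing slash.
with_slash = Path(f"./app/api/{model}/") / "model.bin"
without_slash = Path(f"./app/api/{model}") / "model.bin"
print(with_slash == without_slash)  # True
```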
Bump fastapi from 0.45.0 to 0.65.2 in /src_fastapi/classification
dependencies
opened by dependabot[bot] 0
Bump fastapi from 0.45.0 to 0.65.2 in /src_fastapi/ner
dependencies
opened by dependabot[bot] 0
Bump pydantic from 1.5.1 to 1.6.2 in /src_fastapi/ner
dependencies
opened by dependabot[bot] 0
Bump pydantic from 1.5.1 to 1.6.2 in /src_fastapi/classification
Bumps pydantic from 1.5.1 to 1.6.2.

Release notes

Sourced from pydantic's releases.

v1.6.2 (2021-05-11)

Security fix: Fix date and datetime parsing so passing either 'infinity' or float('inf') (or their negative values) does not cause an infinite loop, see security advisory CVE-2021-29510.

v1.6.1 (2020-07-15)

See Changelog.

Thank you to pydantic's sponsors: @matin, @tiangolo, @chdsbd, @jorgecarleitao, and 1 anonymous sponsor for their kind support.

changes:

fix validation and parsing of nested models with default_factory, #1710 by @PrettyWood

v1.6 (2020-07-11)

See Changelog.

Thank you to pydantic's sponsors: @matin, @tiangolo, @chdsbd, @jorgecarleitao, and 1 anonymous sponsor for their kind support.

changes:

Modify validators for conlist and conset to not have always=True, #1682 by @samuelcolvin

add port check to AnyUrl (can't exceed 65536) ports are 16 insigned bits: 0 <= port <= 2**16-1 src: rfc793 header format, #1654 by @flapili

Document default regex anchoring semantics, #1648 by @yurikhan

Use chain.from_iterable in class_validators.py. This is a faster and more idiomatic way of using itertools.chain. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space, #1642 by @cool-RR

Add conset(), analogous to conlist(), #1623 by @patrickkwang

make pydantic errors (un)pickable, #1616 by @PrettyWood

Allow custom encoding for dotenv files, #1615 by @PrettyWood

Ensure SchemaExtraCallable is always defined to get type hints on BaseConfig, #1614 by @PrettyWood

Update datetime parser to support negative timestamps, #1600 by @mlbiche

Update mypy, remove AnyType alias for Type[Any], #1598 by @samuelcolvin

Adjust handling of root validators so that errors are aggregated from all failing root validators, instead of reporting on only the first root validator to fail, #1586 by @beezee

Make __modify_schema__ on Enums apply to the enum schema rather than fields that use the enum, #1581 by @therefromhere

Fix behavior of __all__ key when used in conjunction with index keys in advanced include/exclude of fields that are sequences, #1579 by @xspirus

Subclass validators do not run when referencing a List field defined in a parent class when each_item=True. Added an example to the docs illustrating this, #1566 by @samueldeklund

change schema.field_class_to_schema to support frozenset in schema, #1557 by @wangpeibao

Call __modify_schema__ only for the field schema, #1552 by @PrettyWood

Move the assignment of field.validate_always in fields.py so the always parameter of validators work on inheritance, #1545 by @dcHHH

Added support for UUID instantiation through 16 byte strings such as b'\x12\x34\x56\x78' * 4. This was done to support BINARY(16) columns in sqlalchemy, #1541 by @shawnwall

Add a test assertion that default_factory can return a singleton, #1523 by @therefromhere

Add NameEmail.__eq__ so duplicate NameEmail instances are evaluated as equal, #1514 by @stephen-bunn

Add datamodel-code-generator link in pydantic document site, #1500 by @koxudaxi

Added a "Discussion of Pydantic" section to the documentation, with a link to "Pydantic Introduction" video by Alexander Hultnér, #1499 by @hultner

Avoid some side effects of default_factory by calling it only once if possible and by not setting a default value in the schema, #1491 by @PrettyWood

Added docs about dumping dataclasses to JSON, #1487 by @mikegrima

Make BaseModel.__signature__ class-only, so getting __signature__ from model instance will raise AttributeError, #1466 by @MrMrRobat

include 'format': 'password' in the schema for secret types, #1424 by @atheuz

Modify schema constraints on ConstrainedFloat so that exclusiveMinimum and minimum are not included in the schema if they are equal to -math.inf and exclusiveMaximum and maximum are not included if they are equal to math.inf, #1417 by @vdwees

Squash internal __root__ dicts in .dict() (and, by extension, in .json()), #1414 by @patrickkwang

... (truncated)

Changelog

Sourced from pydantic's changelog.

v1.6.2 (2021-05-11)

Security fix: Fix date and datetime parsing so passing either 'infinity' or float('inf') (or their negative values) does not cause an infinite loop; see security advisory CVE-2021-29510

v1.6.1 (2020-07-15)

fix validation and parsing of nested models with default_factory, #1710 by @PrettyWood

v1.6 (2020-07-11)

Thank you to pydantic's sponsors: @matin, @tiangolo, @chdsbd, @jorgecarleitao, and 1 anonymous sponsor for their kind support.

Modify validators for conlist and conset to not have always=True, #1682 by @samuelcolvin

add port check to AnyUrl: ports are 16-bit unsigned integers, so 0 <= port <= 2**16-1 (a port can't exceed 65535); source: RFC 793 header format, #1654 by @flapili

Document default regex anchoring semantics, #1648 by @yurikhan

Use chain.from_iterable in class_validators.py. This is a faster and more idiomatic way of using itertools.chain. Instead of computing all the items in the iterable and storing them in memory, they are computed one-by-one and never stored as a huge list. This can save on both runtime and memory space, #1642 by @cool-RR

Add conset(), analogous to conlist(), #1623 by @patrickkwang

make pydantic errors (un)picklable, #1616 by @PrettyWood

Allow custom encoding for dotenv files, #1615 by @PrettyWood

Ensure SchemaExtraCallable is always defined to get type hints on BaseConfig, #1614 by @PrettyWood

Update datetime parser to support negative timestamps, #1600 by @mlbiche

Update mypy, remove AnyType alias for Type[Any], #1598 by @samuelcolvin

Adjust handling of root validators so that errors are aggregated from all failing root validators, instead of reporting on only the first root validator to fail, #1586 by @beezee

Make __modify_schema__ on Enums apply to the enum schema rather than fields that use the enum, #1581 by @therefromhere

Fix behavior of __all__ key when used in conjunction with index keys in advanced include/exclude of fields that are sequences, #1579 by @xspirus

Subclass validators do not run when referencing a List field defined in a parent class when each_item=True. Added an example to the docs illustrating this, #1566 by @samueldeklund

change schema.field_class_to_schema to support frozenset in schema, #1557 by @wangpeibao

Call __modify_schema__ only for the field schema, #1552 by @PrettyWood

Move the assignment of field.validate_always in fields.py so the always parameter of validators work on inheritance, #1545 by @dcHHH

Added support for UUID instantiation through 16 byte strings such as b'\x12\x34\x56\x78' * 4. This was done to support BINARY(16) columns in sqlalchemy, #1541 by @shawnwall

Add a test assertion that default_factory can return a singleton, #1523 by @therefromhere

Add NameEmail.__eq__ so duplicate NameEmail instances are evaluated as equal, #1514 by @stephen-bunn

Add datamodel-code-generator link in pydantic document site, #1500 by @koxudaxi

Added a "Discussion of Pydantic" section to the documentation, with a link to "Pydantic Introduction" video by Alexander Hultnér, #1499 by @hultner

Avoid some side effects of default_factory by calling it only once if possible and by not setting a default value in the schema, #1491 by @PrettyWood

Added docs about dumping dataclasses to JSON, #1487 by @mikegrima

Make BaseModel.__signature__ class-only, so getting __signature__ from model instance will raise AttributeError, #1466 by @MrMrRobat

include 'format': 'password' in the schema for secret types, #1424 by @atheuz

Modify schema constraints on ConstrainedFloat so that exclusiveMinimum and minimum are not included in the schema if they are equal to -math.inf and exclusiveMaximum and maximum are not included if they are equal to math.inf, #1417 by @vdwees

Squash internal __root__ dicts in .dict() (and, by extension, in .json()), #1414 by @patrickkwang

Move const validator to post-validators so it validates the parsed value, #1410 by @selimb

Fix model validation to handle nested literals, e.g. Literal['foo', Literal['bar']], #1364 by @DBCerigo

Remove user_required = True from RedisDsn, neither user nor password are required, #1275 by @samuelcolvin

... (truncated)

Commits

acf7783 tweak history

829528c comment out broken tests

cf9a417 hack tests into passing

b37a922 fix formatting

ac360c5 prepare for release

bdde15b Merge pull request from GHSA-5jqp-qgf6-3pvh

d2b0501 uprev

e2fcab5 fix: validate and parse nested models properly with default_factory (#1712)

ba56a67 Bump pytest-mock from 3.1.1 to 3.2.0 (#1719)

f1f944f Update datamode_code_generator:typo in pip install (#1713)

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
JSONDecodeError

After I run "streamlit run NLPfiy.py" and open the web page, a JSONDecodeError is raised when I choose a service. I don't know how to solve it. Could you please help me? The error is below:

  File "/home/test/ENTER/lib/python3.8/site-packages/streamlit/script_runner.py", line 332, in _run_script
    exec(code, module.__dict__)
  File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 130, in <module>
    main()
  File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 116, in main
    model_details = apicall.model_list(service=service)
  File "/home/test/insight-master/src_streamlit/NLPfiy.py", line 24, in model_list
    return json.loads(models.text)
  File "/home/test/ENTER/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/home/test/ENTER/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/home/test/ENTER/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
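
A note on reading this error: `json.loads` raises "Expecting value" when the text it receives is empty or does not start with a JSON value. That suggests the request made in `model_list` got back something other than JSON, for example an HTML error page from the reverse proxy or an empty body. The sketch below is one way to surface that. `parse_model_list` and its arguments are hypothetical names, not part of the project's code, and the caller is assumed to pass the HTTP status code and the body text of the response.

```python
import json


def parse_model_list(status_code, body):
    """Decode a model-list response, failing with a readable message.

    Hypothetical helper: `status_code` and `body` are assumed to come from
    the HTTP response that the app currently hands to json.loads directly.
    """
    if status_code != 200:
        # Show the start of the body so an HTML error page is recognisable.
        raise RuntimeError(f"backend returned HTTP {status_code}: {body[:200]!r}")
    if not body.strip():
        raise RuntimeError("backend returned an empty body")
    try:
        return json.loads(body)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"backend did not return JSON: {body[:200]!r}") from exc
```

With a guard like this, the Streamlit page would report what the backend actually sent instead of failing inside the JSON decoder.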

opened by James0128 0

Backend server giving errors on running

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1264, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1310, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1259, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 449, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/usr/local/lib/python3.6/dist-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/usr/local/lib/python3.6/dist-packages/urllib3/connectionpool.py", line 354, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib/python3.6/http/client.py", line 1264, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1310, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1259, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.6/http/client.py", line 1038, in _send_output
    self.send(msg)
  File "/usr/lib/python3.6/http/client.py", line 976, in send
    self.connect()
  File "/usr/local/lib/python3.6/dist-packages/docker/transport/unixconn.py", line 43, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 205, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/dist-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/dist-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 228, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 543, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 530, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/sessions.py", line 643, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/requests/adapters.py", line 498, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/docker-compose", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/main.py", line 67, in main
    command()
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/main.py", line 123, in perform_command
    project = project_from_options('.', options)
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/command.py", line 69, in project_from_options
    environment_file=environment_file
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/command.py", line 132, in get_project
    verbose=verbose, version=api_version, context=context, environment=environment
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/docker_client.py", line 43, in get_client
    environment=environment, tls_version=get_tls_version(environment)
  File "/usr/local/lib/python3.6/dist-packages/compose/cli/docker_client.py", line 170, in docker_client
    client = APIClient(**kwargs)
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 188, in __init__
    self._version = self._retrieve_server_version()
  File "/usr/local/lib/python3.6/dist-packages/docker/api/client.py", line 213, in _retrieve_server_version
    'Error while fetching server API version: {0}'.format(e)
docker.errors.DockerException: Error while fetching server API version: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
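
The innermost error comes from `sock.connect(self.unix_socket)` in docker's `unixconn.py`: docker-compose could not find the Unix socket it uses to reach the Docker daemon. A common cause is that the daemon is not running, though that is an assumption about this particular machine. The sketch below checks the socket path. `/var/run/docker.sock` is Docker's usual default on Linux, and `docker_socket_status` is a hypothetical helper name.

```python
import os
import stat


def docker_socket_status(path="/var/run/docker.sock"):
    """Report whether `path` exists and is a Unix socket.

    Hypothetical diagnostic helper; the default path is an assumption
    and may differ if DOCKER_HOST points somewhere else.
    """
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "unreadable"
    return "socket" if stat.S_ISSOCK(mode) else "not a socket"


if __name__ == "__main__":
    print(docker_socket_status())
```

A result of "missing" matches the `FileNotFoundError` in the traceback and points at the daemon or its socket path rather than at the project's compose file.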

opened by garain 0

Internal server error invoking NER predict end point

Hi,

I have all other models working except for NER, which uses spaCy. lexemes.bin seems to be missing. I've used spaCy before, but not with an unpackaged model like this appears to be. Any pointers welcomed.

This is my trace:

nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:29:56 +0000] "GET /api/v1/ner/docs HTTP/1.1" 200 910 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
ner_1             | INFO:     172.20.0.6:56800 - "GET /api/v1/ner/openapi.json HTTP/1.0" 200 OK
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:29:57 +0000] "GET /api/v1/ner/openapi.json HTTP/1.1" 200 2724 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:30:15 +0000] "GET /api/v1/ner/info HTTP/1.1" 200 163 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"
ner_1             | INFO:     172.20.0.6:56802 - "GET /api/v1/ner/info HTTP/1.0" 200 OK
ner_1             | INFO:     172.20.0.6:56810 - "POST /api/v1/ner/predict HTTP/1.0" 500 Internal Server Error
ner_1             | ERROR:    Exception in ASGI application
ner_1             | Traceback (most recent call last):
ner_1             |   File "/usr/local/lib/python3.7/site-packages/uvicorn/protocols/http/httptools_impl.py", line 385, in run_asgi
ner_1             |     result = await app(self.scope, self.receive, self.send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
ner_1             |     return await self.app(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/fastapi/applications.py", line 140, in __call__
ner_1             |     await super().__call__(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/applications.py", line 134, in __call__
ner_1             |     await self.error_middleware(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 178, in __call__
ner_1             |     raise exc from None
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/middleware/errors.py", line 156, in __call__
ner_1             |     await self.app(scope, receive, _send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 73, in __call__
ner_1             |     raise exc from None
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/exceptions.py", line 62, in __call__
ner_1             |     await self.app(scope, receive, sender)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 590, in __call__
ner_1             |     await route(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 208, in __call__
ner_1             |     await self.app(scope, receive, send)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/starlette/routing.py", line 41, in app
ner_1             |     response = await func(request)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/fastapi/routing.py", line 127, in app
ner_1             |     raw_response = await dependant.call(**values)
ner_1             |   File "./app/api/ner.py", line 46, in named_entity_recognition
ner_1             |     ner_process = NerProcessor(model=item.model.lower())
ner_1             |   File "./app/api/nerpro.py", line 21, in __init__
ner_1             |     self.model = spacy.load("./app/api/spacy/", disable=["tagger", "parser"])
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/__init__.py", line 21, in load
ner_1             |     return util.load_model(name, **overrides)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 116, in load_model
ner_1             |     return load_model_from_path(Path(name), **overrides)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 156, in load_model_from_path
ner_1             |     return nlp.from_disk(model_path)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 647, in from_disk
ner_1             |     util.from_disk(path, deserializers, exclude)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/util.py", line 511, in from_disk
ner_1             |     reader(path / key)
ner_1             |   File "/usr/local/lib/python3.7/site-packages/spacy/language.py", line 635, in <lambda>
ner_1             |     self.vocab.from_disk(p) and _fix_pretrained_vectors_name(self))),
ner_1             |   File "vocab.pyx", line 377, in spacy.vocab.Vocab.from_disk
ner_1             |   File "/usr/local/lib/python3.7/pathlib.py", line 1203, in open
ner_1             |     opener=self._opener)
ner_1             |   File "/usr/local/lib/python3.7/pathlib.py", line 1058, in _opener
ner_1             |     return self._accessor.open(self, flags, mode)
ner_1             | FileNotFoundError: [Errno 2] No such file or directory: 'app/api/spacy/vocab/lexemes.bin'
nginx_1           | 172.20.0.1 - - [26/Aug/2020:12:32:00 +0000] "POST /api/v1/ner/predict HTTP/1.1" 500 21 "http://localhost:8080/api/v1/ner/docs" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36" "-"

This is the request body I used from Swagger:

{
  "model": "spaCy",
  "text": "Dense, real valued vectors representing distributional similarity information are now a cornerstone of practical NLP. The most common way to train these vectors is the Word2vec family of algorithms. If you need to train a word2vec model, we recommend the implementation in the Python library Gensim.",
  "query": "string"
}
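
The final frame shows spaCy opening `app/api/spacy/vocab/lexemes.bin`, so a first step is to establish what the model directory actually contains and which spaCy version it was saved for. One possible explanation, stated here as an assumption rather than a diagnosis, is a mismatch between the spaCy version installed in the container and the version that wrote the model. The sketch below only inspects files. `inspect_model_dir` is a hypothetical helper, and it assumes the directory holds a `meta.json` with a `spacy_version` field, as packaged spaCy models generally do.

```python
import json
from pathlib import Path


def inspect_model_dir(model_dir):
    """Summarise a spaCy model directory without importing spaCy.

    Hypothetical helper: it reads meta.json (if present) and lists the
    files under vocab/, which is where the traceback looks for lexemes.bin.
    """
    root = Path(model_dir)
    meta_path = root / "meta.json"
    meta = {}
    if meta_path.is_file():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    vocab_dir = root / "vocab"
    vocab_files = sorted(p.name for p in vocab_dir.iterdir()) if vocab_dir.is_dir() else []
    return {
        "spacy_version": meta.get("spacy_version"),
        "vocab_files": vocab_files,
        "has_lexemes_bin": "lexemes.bin" in vocab_files,
    }


if __name__ == "__main__":
    # Path taken from the traceback; adjust to where the model was saved.
    print(inspect_model_dir("./app/api/spacy/"))
```

Comparing the reported `spacy_version` with the version installed in the `ner` container would show whether the two agree.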

bug

opened by asehmi 2

Text extraction with tesseract

Nice project, similar to one I started but you got much further. One feature in mine you might like to steal is text extraction (via OCR) using tesseract. https://github.com/robmarkcole/text-insights-app
enhancement

opened by robmarkcole 1