A framework for detecting, highlighting and correcting grammatical errors on natural language text.

Prithivida

Last update: Jan 8, 2023

Overview

Gramformer

Human- and machine-generated text often suffers from grammatical and/or typographical errors: spelling, punctuation, grammar or word-choice errors. Gramformer is a library that exposes 3 separate interfaces to a family of algorithms to detect, highlight and correct grammar errors. To make sure the recommended corrections and highlights are of high quality, it comes with a quality estimator. You can use Gramformer in one or more of the areas mentioned under the "use-cases" section below, or for any other use case you see fit. Gramformer stands on the shoulders of giants: it combines some of the top-notch research in grammar correction. Note: It works at the sentence level and has been trained on sentences of length 128, so it is not (yet) suitable for long prose or paragraphs (stay tuned for upcoming releases).

Usecases for Gramformer
Installation
Quick Start
Models
Dataset
Benchmark
References
Citation

Usecases for Gramformer

Area 1: Post-processing machine generated text

Machine language generation is becoming mainstream, and so will post-processing of machine-generated text.

Conditioned text generation output (Text2Text generation).
- NMT: Machine-translated output.
- ASR or STT: Speech-to-text output.
- HTR: Handwritten text recognition output.
- Paraphrase generation output.
Controlled text generation output (text generation with PPLM) [TBD].
Free-form text generation output (text generation) [TBD].

Area 2: Human-In-The-Loop (HITL) text

Most supervised NLU (chatbot and conversational) systems need humans/experts to enter or edit text that needs to be grammatically correct; otherwise the quality of the HITL data can degrade the model over time.

Area 3: Assisted writing for humans

Integrating into the custom text editors of your apps. (A poor man's Grammarly, if you will.)

Area 4: Custom Platform integration

As of today, grammatical safety nets for authoring social content (posts or comments) or text in messaging platforms are minimal (word-level correction) or non-existent. The onus is on the author to install tools like Grammarly to proofread.

Messaging platforms and social platforms can highlight/correct grammatical errors automatically without altering the meaning or intent.

Installation

pip install git+https://github.com/PrithivirajDamodaran/Gramformer.git@v0.1

Quick Start

Correcter - [Available now]

from gramformer import Gramformer
import torch

def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)


gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 

influent_sentences = [
    "Matt like fish",
    "the collection of letters was original used by the ancient Romans",
    "We enjoys horror movies",
    "Anna and Mike is going skiing",
    "I walk to the store and I bought milk",
    "We all eat the fish and then made dessert",
    "I will eat fish for dinner and drank milk",
    "what be the reason for everyone leave the company",
]   

for influent_sentence in influent_sentences:
    corrected_sentence = gf.correct(influent_sentence)
    print("[Input] ", influent_sentence)
    print("[Correction] ",corrected_sentence[0])
    print("-" *100)

[Input]  Matt like fish
[Correction]  Matt likes fish
----------------------------------------------------------------------------------------------------
[Input]  the collection of letters was original used by the ancient Romans
[Correction]  The collection of letters was originally used by the ancient Romans.
----------------------------------------------------------------------------------------------------
[Input]  We enjoys horror movies
[Correction]  We enjoy horror movies
----------------------------------------------------------------------------------------------------
[Input]  Anna and Mike is going skiing
[Correction]  Anna and Mike are going skiing
----------------------------------------------------------------------------------------------------
[Input]  I walk to the store and I bought milk
[Correction]  I walked to the store and bought milk.
----------------------------------------------------------------------------------------------------
[Input]  We all eat the fish and then made dessert
[Correction]  We all ate the fish and then made dessert
----------------------------------------------------------------------------------------------------
[Input]  I will eat fish for dinner and drank milk
[Correction]  I'll eat fish for dinner and drink milk.
----------------------------------------------------------------------------------------------------
[Input]  what be the reason for everyone leave the company
[Correction]  what can be the reason for everyone to leave the company.
----------------------------------------------------------------------------------------------------
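
Since Gramformer works at the sentence level (see the note in the overview), longer text has to be split into sentences before it is passed to gf.correct. The following is a minimal sketch of that idea and is not part of Gramformer: split_sentences and correct_paragraph are hypothetical helpers, the regex splitter is deliberately naive (it will mis-split abbreviations such as "e.g."), and correct_one stands in for any function that maps one sentence to its correction.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def correct_paragraph(text: str, correct_one: Callable[[str], str]) -> str:
    """Apply a sentence-level corrector to each sentence and re-join the results."""
    return " ".join(correct_one(sentence) for sentence in split_sentences(text))


if __name__ == "__main__":
    paragraph = "Matt like fish. We enjoys horror movies."
    # Identity function used as a stand-in corrector so the sketch runs without a model.
    print(correct_paragraph(paragraph, lambda sentence: sentence))
```

With the quick-start setup above, correct_one could be something like `lambda s: gf.correct(s)[0]`, assuming gf.correct returns an indexable result as in the example output.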

Challenge with generative models

While Gramformer aims to post-process outputs from generative models, Gramformer itself is a generative model. So the question arises: who will post-process the Gramformer outputs? (I know, very meta :-)). In general, all generative models tend to generate spurious text sometimes, which we cannot control. So, to make sure the Gramformer grammar corrections (and highlights) are as accurate as possible, a quality estimator (QE) will be added. It can estimate an error-correction quality score and use that as a filter on the Top-N candidates to return only the best based on the score.
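
The filtering step described above can be sketched as follows. This only illustrates "score the Top-N candidates and keep the best"; the actual quality estimator is not described in this README, so score_fn is a placeholder for whatever scorer ends up being used, and rank_candidates is a hypothetical helper name.

```python
from typing import Callable, List, Tuple


def rank_candidates(
    candidates: List[str],
    score_fn: Callable[[str], float],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Score each unique candidate and return the top_k (candidate, score) pairs."""
    unique = list(dict.fromkeys(candidates))  # drop duplicates, keep first-seen order
    scored = [(candidate, score_fn(candidate)) for candidate in unique]
    scored.sort(key=lambda pair: pair[1], reverse=True)  # highest score first
    return scored[:top_k]


if __name__ == "__main__":
    # Toy scores standing in for a real quality estimator (assumption).
    toy_scores = {"Matt like fish": 0.2, "Matt likes fish": 0.9, "Matt liking fish": 0.4}
    print(rank_candidates(list(toy_scores), toy_scores.get, top_k=1))
```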

Correcter with QE estimator - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
input_sentence = "Matt like fish"  # replace with your input sentence
corrected_sentence = gf.correct(input_sentence, filter_by_quality=True, max_candidates=3)

Highlighter - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 1, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
input_sentence = "Matt like fish"  # replace with your input sentence
highlighted_sentence = gf.highlight(input_sentence)

[Input]  Matt like fish
[Highlight]  Matt <e> like </e> fish
----------------------------------------------------------------------------------------------------
[Input]  the collection of letters was original used by the ancient Romans
[Highlight]  the collection of letters was <e> original used </e> by the ancient Romans
----------------------------------------------------------------------------------------------------
[Input]  We enjoys horror movies
[Highlight]  We <e> enjoys horror </e> movies
----------------------------------------------------------------------------------------------------
[Input]  Anna and Mike is going skiing
[Highlight]  Anna and Mike <e> is going </e> skiing
----------------------------------------------------------------------------------------------------
[Input]  I walk to the store and I bought milk
[Highlight]  I <e> walk to </e> the store and I bought milk
----------------------------------------------------------------------------------------------------
[Input]  We all eat the fish and then made dessert
[Highlight]  We all <e> eat the </e> fish and then made dessert
----------------------------------------------------------------------------------------------------
[Input]  I will eat fish for dinner and drank milk
[Highlight]  I will eat fish for dinner and <e> drank milk </e> 
----------------------------------------------------------------------------------------------------
[Input]  what be the reason for everyone leave the company
[Highlight]  <e> what be </e> the reason <e> for everyone </e> <e> leave the </e> company
----------------------------------------------------------------------------------------------------
[Input]  One of the most important issue is the lack of parking spaces at the local mall.
[Highlight]  One of the most important <e> issue is </e> the lack of parking spaces at the local mall.
----------------------------------------------------------------------------------------------------
[Input]  The survey we performed recently showed that most of customers are satisfied.
[Highlight]  The survey we performed recently showed that most <e> of customers </e> are satisfied.
----------------------------------------------------------------------------------------------------
[Input]  I’ve loved classical music ever since I was child.
[Highlight]  I’ve loved classical music ever since I <e> was child </e>.
----------------------------------------------------------------------------------------------------
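
Until the highlighter is available, similar markup can be approximated from the corrector alone by diffing the input against its correction. The sketch below is an assumption-laden illustration, not how the highlighter model works: highlight_diff is a hypothetical helper that uses Python's difflib on whitespace tokens, wraps replaced or deleted source tokens in <e> ... </e>, and ignores pure insertions. Its spans will often be narrower than the model outputs shown above.

```python
import difflib


def highlight_diff(original: str, corrected: str) -> str:
    """Wrap source tokens that differ from the correction in <e> ... </e> tags."""
    src = original.split()
    tgt = corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    out = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(src[i1:i2])
        elif tag in ("replace", "delete"):
            out.append("<e> " + " ".join(src[i1:i2]) + " </e>")
        # "insert" opcodes have no source tokens to wrap, so they are skipped.
    return " ".join(out)


if __name__ == "__main__":
    print(highlight_diff("Matt like fish", "Matt likes fish"))
```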

Detector - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 0, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
input_sentence = "Matt like fish"  # replace with your input sentence
grammar_fluency_score = gf.detect(input_sentence)

Models

Model	Type	Return	status
prithivida/grammar_error_detector	Classifier	Label	TBD (prithivida/parrot_fluency_on_BERT can be repurposed here, but I would recommend you wait :-))
prithivida/grammar_error_highlighter	Seq2Seq	Grammar errors enclosed in `<e> and </e>`	Beta
prithivida/grammar_error_correcter	Seq2Seq	The corrected sentence	Beta

Dataset

The first idea is to generate the dataset using the techniques mentioned in the first paper highlighted in the references section. You can use the technique on any one of the publicly available Wikipedia edits datasets. Write some rules to filter only the grammatical edits, do some cleanup and that's it, Bob's your uncle :-).
The second, and possibly very complicated and $$$, way is to get some 200M synthetic sentences. This is based on the last paper under the references section. Not recommended, but by all means knock yourself out if you are interested :-)
The third source is to repurpose the GEC Task data.
I combined sources 1 and 3 to get my training data (still working on source 2, will keep you posted).
I ended up with ~1M records, which after some heuristics-based filtering amounted to ~1/2M records.
It took ~12 hours to train each of the above models.
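
The kind of rule-based filtering mentioned above can be sketched as follows. The actual rules and heuristics used for Gramformer are not given in this README, so everything here is an assumption for illustration: keep_pair and changed_tokens are hypothetical helpers, and the thresholds (at most 3 changed tokens, at most 128 whitespace tokens) are arbitrary. The idea is simply that small, local edits are more likely to be grammatical fixes than content rewrites.

```python
import difflib


def changed_tokens(before: str, after: str) -> int:
    """Count tokens touched by non-equal diff operations between two sentences."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return sum(
        max(i2 - i1, j2 - j1)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    )


def keep_pair(before: str, after: str, max_changed: int = 3, max_tokens: int = 128) -> bool:
    """Keep an edit pair only if it is a small, local change between short sentences."""
    if before == after:
        return False  # no edit at all
    if len(before.split()) > max_tokens or len(after.split()) > max_tokens:
        return False  # too long for a sentence-level model (threshold is an assumption)
    return changed_tokens(before, after) <= max_changed


if __name__ == "__main__":
    pairs = [
        ("Matt like fish", "Matt likes fish"),
        ("Matt like fish", "Completely different sentence about something else entirely"),
    ]
    print([pair for pair in pairs if keep_pair(*pair)])
```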

Benchmark

TBD (I will shortly benchmark Gramformer models against the following publicly available models: salesken/grammar_correction and flexudy/t5-small-wav2vec2-grammar-fixer.)

References

Citation

TBD

Comments

[Spacy error] Can't find model 'en'

Hello, I have successfully installed Gramformer on my Windows PC, but when I run it, it gives the following error.

Traceback (most recent call last):
  File "main.py", line 27, in <module>
    grammar_correction = Gramformer(models = 1, use_gpu=True)
  File "~~\.conda\envs\nlp-transformer\lib\site-packages\gramformer\gramformer.py", line 8, in __init__
    self.annotator = errant.load('en')
  File "~~\.conda\envs\nlp-transformer\lib\site-packages\errant\__init__.py", line 16, in load
    nlp = nlp or spacy.load(lang, disable=["ner"])
  File "~~\.conda\envs\nlp-transformer\lib\site-packages\spacy\__init__.py", line 30, in load
    return util.load_model(name, **overrides)
  File "~~\.conda\envs\nlp-transformer\lib\site-packages\spacy\util.py", line 175, in load_model
    raise IOError(Errors.E050.format(name=name))
OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

opened by muzamil47 3

Commercial use issue

Hey @PrithivirajDamodaran

The readme states that Gramformer versions above 1.0 are allowed for commercial use - however, this is not currently the case as the grammar_error_correcter_v1 model has been trained using the non-commercial WI&Locness data, even though the documentation states otherwise:

The grammar_error_correcter_v1 model is actually identical to the previous grammar_error_correcter model which is trained using the non-commercial WI&Locness data – they have identical weights, which you can verify with this script

As the models are the same, this means that both models have been trained using the non-commercial WI&Locness data, and the grammar_error_correcter_v1 model along with Gramformer v1.1 and v1.2 should not be allowed for commercial use.

Could you please update the readme to clarify this, or upload a new model that has not been trained using WI&Locness?

Thanks
question

opened by SimonHFL 2

Use corrector for highlighter

Hi @PrithivirajDamodaran

This is a great framework. Is it possible (for now) to use model corrector (model=2) for the highlighter(model=1)? After getting some correction, match it to the input and give prefix and suffix () for the mismatch?

Thanks
question

opened by ilhamsyahids 2

Error loading the tokenizer in transformers==4.4.2

I'm getting error when initializing the class object, specifically at tokenizer loading:

In [6]: correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-6-d34dd9c5fe99> in <module>
----> 1 correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    414             tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
    415             if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
--> 416                 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
    417             else:
    418                 if tokenizer_class_py is not None:

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
   1703
   1704         return cls._from_pretrained(
-> 1705             resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
   1706         )
   1707

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs)
   1774         # Instantiate tokenizer.
   1775         try:
-> 1776             tokenizer = cls(*init_inputs, **init_kwargs)
   1777         except OSError:
   1778             raise OSError(

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/t5/tokenization_t5_fast.py in __init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, **kwargs)
    134             extra_ids=extra_ids,
    135             additional_special_tokens=additional_special_tokens,
--> 136             **kwargs,
    137         )
    138

~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
     85         if fast_tokenizer_file is not None and not from_slow:
     86             # We have a serialization from tokenizers which let us directly build the backend
---> 87             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
     88         elif slow_tokenizer is not None:
     89             # We need to convert a slow tokenizer to build the backend

Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 1 column 329667

transformers==4.4.2.

The installation package didn't specify the transformers version that this library is using. What should be the correct version? Or is it version independent and it's something else?

opened by zhangyilun 2

Figma Gramformer Plugin

Figma is used in creating a lot of digital interfaces today, a Gramformer Figma plugin would go a long way. I'll be willing to design the interface for the plugin but I don't know how to make the plugin itself. I hope someone takes this up. This is a link to get started https://www.figma.com/plugin-docs/setup/
enhancement

opened by ayoolafelix 2

README.md get_edits and get_highlight example small fixes

Hi there, when I copy and pasted the examples in the README locally I noticed they were bugging out for the edits and highlights (were only pulling the first char of the sentence for errant). Providing the full sentence seemed to get the desired output.

opened by parisac 1

Training dataset

Hi Prithiviraj,

Is there any chance you'd be able to release the training dataset you used to train the Gramformer huggingface model? I see that there are some details on the slices of data that you brought together in the Readme, but it would be useful to be able to use the same data that you used.

The main reason I'm asking is I'd like to create a model that can take correct text and add grammatical errors to it. So I was thinking I could take the dataset you used to train Gramformer and use the inverse to train a model that does the inverse. I can go through the data prep process as you did, but it would definitely be easier if I were able to reuse yours, and it might be useful for reproducibility for others as well.
invalid question

opened by d4buss 1

OSError: Can't load config for 'prithivida/grammar_error_correcter'

Hi, I have been using your code for the last few days. Suddenly, it started to crash.

Have a look at the code and error given below:

Code (Link: https://huggingface.co/prithivida/grammar_error_correcter_v1):

from gramformer import Gramformer
import torch

def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)


gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 

influent_sentences = [
    "Matt like fish",
    "the collection of letters was original used by the ancient Romans",
    "We enjoys horror movies",
    "Anna and Mike is going skiing",
    "I walk to the store and I bought milk",
    "We all eat the fish and then made dessert",
    "I will eat fish for dinner and drank milk",
    "what be the reason for everyone leave the company",
]   

for influent_sentence in influent_sentences:
    corrected_sentence = gf.correct(influent_sentence)
    print("[Input] ", influent_sentence)
    print("[Correction] ",corrected_sentence[0])
    print("-" *100)

Error

404 Client Error: Not Found for url: https://huggingface.co/prithivida/grammar_error_correcter/resolve/main/config.json
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    491                 use_auth_token=use_auth_token,
--> 492                 user_agent=user_agent,
    493             )

7 frames
/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
   1278             use_auth_token=use_auth_token,
-> 1279             local_files_only=local_files_only,
   1280         )

/usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
   1441             r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
-> 1442             r.raise_for_status()
   1443             etag = r.headers.get("X-Linked-Etag") or r.headers.get("ETag")

/usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
    942         if http_error_msg:
--> 943             raise HTTPError(http_error_msg, response=self)
    944 

HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/prithivida/grammar_error_correcter/resolve/main/config.json

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)
<ipython-input-10-0f43e537fe87> in <module>
     10 
     11 
---> 12 gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all
     13 
     14 influent_sentences = [

/usr/local/lib/python3.7/dist-packages/gramformer/gramformer.py in __init__(self, models, use_gpu)
     14 
     15     if models == 2:
---> 16         self.correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
     17         self.correction_model     = AutoModelForSeq2SeqLM.from_pretrained(correction_model_tag)
     18         self.correction_model     = self.correction_model.to(device)

/usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    400         kwargs["_from_auto"] = True
    401         if not isinstance(config, PretrainedConfig):
--> 402             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    403 
    404         use_fast = kwargs.pop("use_fast", True)

/usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    428         """
    429         kwargs["_from_auto"] = True
--> 430         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    431         if "model_type" in config_dict:
    432             config_class = CONFIG_MAPPING[config_dict["model_type"]]

/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    502                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
    503             )
--> 504             raise EnvironmentError(msg)
    505 
    506         except json.JSONDecodeError:

OSError: Can't load config for 'prithivida/grammar_error_correcter'. Make sure that:

- 'prithivida/grammar_error_correcter' is a correct model identifier listed on 'https://huggingface.co/models'

- or 'prithivida/grammar_error_correcter' is the correct path to a directory containing a config.json file
![Screenshot from 2021-07-01 18-36-07](https://user-images.githubusercontent.com/4704211/124133526-5a9da900-da9b-11eb-9733-61df46ab01e1.png)

Possible Solution:

Rename this link from: https://huggingface.co/prithivida/grammar_error_correcter/ to: https://huggingface.co/prithivida/grammar_error_correcter_v1/

Please help me fix this. thank you

opened by Nomiluks 1

Inference Issue !!!

OSError                                   Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    241             if resolved_config_file is None:
--> 242                 raise EnvironmentError
    243             config_dict = cls._dict_from_json_file(resolved_config_file)

OSError:

During handling of the above exception, another exception occurred:

OSError                                   Traceback (most recent call last)

3 frames

in ()
----> 1 correction_tokenizer = AutoTokenizer.from_pretrained("prithivida/grammar_error_correcter")
      2 correction_model = AutoModelForSeq2SeqLM.from_pretrained("prithivida/grammar_error_correcter")
      3 print("[Gramformer] Grammar error correction model loaded..")
      4
      5

/usr/local/lib/python3.7/dist-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    204         config = kwargs.pop("config", None)
    205         if not isinstance(config, PretrainedConfig):
--> 206             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
    207
    208         if "bert-base-japanese" in str(pretrained_model_name_or_path):

/usr/local/lib/python3.7/dist-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    201
    202         """
--> 203         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
    204
    205         if "model_type" in config_dict:

/usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
    249                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
    250             )
--> 251             raise EnvironmentError(msg)
    252
    253         except json.JSONDecodeError:

OSError: Can't load config for 'prithivida/grammar_error_correcter'. Make sure that:

'prithivida/grammar_error_correcter' is a correct model identifier listed on 'https://huggingface.co/models'

or 'prithivida/grammar_error_correcter' is the correct path to a directory containing a config.json file

Solutions for this issue????
invalid

opened by sabhi27 1

How to train Gramformer on non-English languages.

Hey @PrithivirajDamodaran, great work on building Gramformer. I've played with it and the results are amazing.

I work on pushing NLP forward in under-represented languages, and hence I humbly request you to please tell me how I can train Gramformer on non-English sentences.

I checked out your HuggingFace page 'https://huggingface.co/prithivida/grammar_error_correcter' but couldn't find any resources on how to train Gramformer from scratch. If you could help me train Gramformer on non-English languages, it would really mean a lot to me. Do let me know.

Thanks
question

opened by StephennFernandes 1

pip install is erroring out

I am unable to pip install the package; here is the error:

Collecting git+https://github.com/PrithivirajDamodaran/Gramformer.git@v0.1
  Cloning https://github.com/PrithivirajDamodaran/Gramformer.git (to revision v0.1) to c:\users\sumit\appdata\local\temp\pip-req-build-sw54k_0h
  ERROR: Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://github.com/PrithivirajDamodaran/Gramformer.git 'C:\Users\Sumit\AppData\Local\Temp\pip-req-build-sw54k_0h'
ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

I also tried downloading the repo directly and running the package. The model is not present at the location (correction_model_tag = "prithivida/grammar_error_correcter"). Is there any way to download the pretrained model?

opened by ranjan-sumit 1

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

OSError                                   Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_9376\2706950954.py in
     25
     26
---> 27 gf = Gramformer(models = 1, use_gpu=False) # 1=corrector, 2=detector
     28
     29 influent_sentences = [

~\anaconda3_9\envs\python37\lib\site-packages\gramformer\gramformer.py in __init__(self, models, use_gpu)
      7     import errant
      8     #self.annotator = errant.load('en_core_web_sm')
----> 9     self.annotator = errant.load('en') # en is deprecated from spacy 3.0 onwards
     10
     11     if use_gpu:

~\anaconda3_9\envs\python37\lib\site-packages\errant\__init__.py in load(lang, nlp)
     17
     18     # Load spacy
---> 19     nlp = nlp or spacy.load(lang, disable=["ner"])
     20
     21     # Load language edit merger

~\anaconda3_9\envs\python37\lib\site-packages\spacy\__init__.py in load(name, **overrides)
     28     if depr_path not in (True, False, None):
     29         warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
---> 30     return util.load_model(name, **overrides)
     31
     32

~\anaconda3_9\envs\python37\lib\site-packages\spacy\util.py in load_model(name, **overrides)
    173     elif hasattr(name, "exists"):  # Path or Path-like to model data
    174         return load_model_from_path(name, **overrides)
--> 175     raise IOError(Errors.E050.format(name=name))
    176
    177

OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

opened by vky2998 2

Word limit

The model has trouble with long sentences, especially if the words in the sentences are in upper case. It outputs only part of the sentence, and the rest of the sentence, which is dropped, is shown as an error.

opened by Talib6509 0

Gramformer Highlight function not working

Hello... I'm trying to get the edits between two sentences, but the highlight function is not working. Has anybody faced the same issue? Many thanks in advance

opened by NourAlMerey 0

Suggestions to improve the grammar results for short sentences

Hello..!

I have used Gramformer model and I think this could be quite useful for checking and correcting some grammar points, especially for correcting singular/plural, verb forms and tenses, and spelling. However, some other grammar points (like correcting sentence structure, comparative/superlative forms, pronoun cases, etc.) seem to be still tricky.

Note: I need to use the model on short sentences.

The biggest challenge I faced in my case is the following (please suggest how to avoid or improve it, for example by changing some parameters):

1 - Since it corrects grammar by generating text, most of the time it completely changes the sentence and rephrases it. How can we avoid this?

whose bags you can bring? --> Which bags you can bring? (Just a sample, and sometime it generates totally changed verbose sentence)

2 - Every time I give the same sentence as input, it generates different outputs:

I go can there: three outputs in three different run ("I go, there"., "can I go there?", "I go back there.")

Thanks!

opened by muzamil47 0

Releases(v1.4)

v1.4(Aug 10, 2021)

⚡️ Features added/changed

✅ Correct API uses a ranker to sort good quality corrections.
✅ Highlight API returns sents w/errors marked up as readable tags.
✅ Edit API returns error types, positions, and respective corrections.
✅ The latest model checkpoint has been refreshed w/more data.

License update to MIT.

Owner

Prithivida

Applied NLP, XAI for NLP and Data Engineering

GitHub

:sparkles: Surface lint errors during code review

✨ Linty Fresh ✨ Keep your codebase sparkly clean with the power of LINT! Linty Fresh parses lint errors and report them back to GitHub as comments on

183 Dec 18, 2022

A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycodestyle.

flake8-bugbear A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycode

869 Dec 30, 2022

Optional static typing for Python 3 and 2 (PEP 484)

Mypy: Optional Static Typing for Python Got a question? Join us on Gitter! We don't have a mailing list; but we are always happy to answer questions o

14.4k Jan 8, 2023

The strictest and most opinionated python linter ever!

wemake-python-styleguide Welcome to the strictest and most opinionated python linter ever. wemake-python-styleguide is actually a flake8 plugin with s

2.1k Jan 1, 2023

coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." ― John F. Woods coala provides a

3.4k Dec 29, 2022

Mypy stubs, i.e., type information, for numpy, pandas and matplotlib

Mypy type stubs for NumPy, pandas, and Matplotlib This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy

194 Dec 19, 2022

Automated security testing using bandit and flake8.

flake8-bandit Automated security testing built right into your workflow! You already use flake8 to lint all your code for errors, ensure docstrings ar

96 Jan 1, 2023

Easy saving and switching between multiple KDE configurations.

Konfsave Konfsave is a config manager. That is, it allows you to save, back up, and easily switch between different (per-user) system configurations.

42 Sep 25, 2022

PEP-484 typing stubs for SQLAlchemy 1.4 and SQLAlchemy 2.0

SQLAlchemy 2 Stubs These are PEP-484 typing stubs for SQLAlchemy 1.4 and 2.0. They are released concurrently along with a Mypy extension which is desi

139 Dec 30, 2022

Utilities for pycharm code formatting (flake8 and black)

Pycharm External Tools Extentions to Pycharm code formatting tools. Currently supported are flake8 and black on a selected code block. Usage Flake8 [P

13 Nov 3, 2022

Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch

COCO LM Pretraining (wip) Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch. They were a

44 Jul 28, 2022

Alleviating Over-segmentation Errors by Detecting Action Boundaries

Alleviating Over-segmentation Errors by Detecting Action Boundaries Forked from ASRF offical code. This repo is the a implementation of replacing orig

13 Dec 12, 2022

A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/casual, active/passive, and many more. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.

Styleformer A Neural Language Style Transfer framework to transfer natural language text smoothly between fine-grained language styles like formal/cas

431 Dec 19, 2022

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

Keras reimplementation of the scene text detection network CPTN: "Detecting Text in Natural Image with Connectionist Text Proposal Network"; you are welcome to try it, follow it, and report issues...

Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)