A framework for detecting, highlighting and correcting grammatical errors in natural language text.

Overview

Gramformer

Human- and machine-generated text often suffers from grammatical and/or typographical errors: spelling, punctuation, grammar or word-choice errors. Gramformer is a library that exposes 3 separate interfaces to a family of algorithms to detect, highlight and correct grammar errors. To make sure the recommended corrections and highlights are of high quality, it comes with a quality estimator. You can use Gramformer in one or more of the areas listed under the "use cases" section below, or for any other use case you see fit. Gramformer stands on the shoulders of giants: it combines some of the top-notch research in grammar correction. Note: it works at the sentence level and has been trained on sentences of length 128, so it is not (yet) suitable for long prose or paragraphs (stay tuned for upcoming releases).
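
Because the models work on one sentence at a time, longer text has to be split before it is corrected. The following is a minimal sketch of that idea, not part of Gramformer: the regex splitter is deliberately naive, and `correct_fn` is a hypothetical callback standing in for whatever per-sentence corrector you use.

```python
import re
from typing import Callable, List

# Naive boundary: whitespace that follows '.', '!' or '?'.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split prose into sentences. Abbreviations such as 'e.g.' will be split wrongly."""
    return [s.strip() for s in _SENTENCE_END.split(text) if s.strip()]


def correct_paragraph(text: str, correct_fn: Callable[[str], str]) -> str:
    """Apply a per-sentence corrector (hypothetical callback) and rejoin the results."""
    return " ".join(correct_fn(sentence) for sentence in split_sentences(text))


if __name__ == "__main__":
    demo = "Matt like fish. We enjoys horror movies! Anna and Mike is going skiing"
    for sentence in split_sentences(demo):
        print(sentence)
```

A proper sentence tokenizer would be a better choice for real text; the regex is only there to keep the sketch dependency-free.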

Use cases for Gramformer

Area 1: Post-processing machine generated text

Machine language generation is becoming mainstream, and so will post-processing of machine-generated text.

  • Conditioned text generation output (Text2Text generation).
    • NMT: Machine-translated output.
    • ASR or STT: Speech-to-text output.
    • HTR: Handwritten text recognition output.
    • Paraphrase generation output.
  • Controlled text generation output (text generation with PPLM) [TBD].
  • Free-form text generation output (text generation) [TBD].

Area 2: Human-In-The-Loop (HITL) text

  • Most supervised NLU (chatbot and conversational) systems need humans/experts to enter or edit text that needs to be grammatically correct; otherwise the quality of the HITL data can degrade the model over time.

Area 3: Assisted writing for humans

  • Integrating into the custom text editors of your apps (a poor man's Grammarly, if you will).

Area 4: Custom platform integration

As of today, grammatical safety nets for authoring social content (posts or comments) or text on messaging platforms are minimal (word-level correction) or non-existent. The onus is on the author to install tools like Grammarly to proofread.

  • Messaging and social platforms can highlight/correct grammatical errors automatically without altering the meaning or intent.

Installation

pip install git+https://github.com/PrithivirajDamodaran/Gramformer.git@v0.1

Quick Start

Correcter - [Available now]

from gramformer import Gramformer
import torch

def set_seed(seed):
  torch.manual_seed(seed)
  if torch.cuda.is_available():
    torch.cuda.manual_seed_all(seed)

set_seed(1212)


gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 

influent_sentences = [
    "Matt like fish",
    "the collection of letters was original used by the ancient Romans",
    "We enjoys horror movies",
    "Anna and Mike is going skiing",
    "I walk to the store and I bought milk",
    "We all eat the fish and then made dessert",
    "I will eat fish for dinner and drank milk",
    "what be the reason for everyone leave the company",
]   

for influent_sentence in influent_sentences:
    corrected_sentence = gf.correct(influent_sentence)
    print("[Input] ", influent_sentence)
    print("[Correction] ",corrected_sentence[0])
    print("-" *100)
[Input]  Matt like fish
[Correction]  Matt likes fish
----------------------------------------------------------------------------------------------------
[Input]  the collection of letters was original used by the ancient Romans
[Correction]  The collection of letters was originally used by the ancient Romans.
----------------------------------------------------------------------------------------------------
[Input]  We enjoys horror movies
[Correction]  We enjoy horror movies
----------------------------------------------------------------------------------------------------
[Input]  Anna and Mike is going skiing
[Correction]  Anna and Mike are going skiing
----------------------------------------------------------------------------------------------------
[Input]  I walk to the store and I bought milk
[Correction]  I walked to the store and bought milk.
----------------------------------------------------------------------------------------------------
[Input]  We all eat the fish and then made dessert
[Correction]  We all ate the fish and then made dessert
----------------------------------------------------------------------------------------------------
[Input]  I will eat fish for dinner and drank milk
[Correction]  I'll eat fish for dinner and drink milk.
----------------------------------------------------------------------------------------------------
[Input]  what be the reason for everyone leave the company
[Correction]  what can be the reason for everyone to leave the company.
----------------------------------------------------------------------------------------------------

Challenge with generative models

While Gramformer aims to post-process outputs from generative models, Gramformer itself is a generative model. So the question arises: who will post-process the Gramformer outputs? (I know, very meta :-)). In general, all generative models tend to generate spurious text sometimes, which we cannot control. So, to make sure the Gramformer grammar corrections (and highlights) are as accurate as possible, a quality estimator (QE) will be added. It can estimate an error-correction quality score and use it as a filter on the top-N candidates to return only the best ones based on the score.
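
The filtering idea can be sketched independently of any particular model. The snippet below is an illustration only: `select_best_candidates`, `score_fn` and `min_score` are hypothetical names, and the scores are toy values standing in for a real quality estimator, which the source does not describe.

```python
from typing import Callable, List, Tuple


def select_best_candidates(
    candidates: List[str],
    score_fn: Callable[[str], float],
    max_candidates: int = 1,
    min_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Score every candidate, drop those below min_score, return the best first."""
    scored = [(candidate, score_fn(candidate)) for candidate in candidates]
    kept = [pair for pair in scored if pair[1] >= min_score]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:max_candidates]


if __name__ == "__main__":
    # Toy scores in place of a real quality estimator (hypothetical values).
    toy_scores = {"Matt likes fish": 0.9, "Matt like fishes": 0.2, "Matt liked fish": 0.6}
    best = select_best_candidates(
        list(toy_scores), toy_scores.__getitem__, max_candidates=2, min_score=0.5
    )
    print(best)
```

Keeping the scorer as a plain callable means the same filter works with any estimator that maps a candidate sentence to a number.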

Correcter with quality estimator (QE) - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
corrected_sentence = gf.correct(<your input sentence>, filter_by_quality=True, max_candidates=3)

Highlighter - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 1, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
highlighted_sentence = gf.highlight(<your input sentence>)
[Input]  Matt like fish
[Highlight]  Matt <e> like </e> fish
----------------------------------------------------------------------------------------------------
[Input]  the collection of letters was original used by the ancient Romans
[Highlight]  the collection of letters was <e> original used </e> by the ancient Romans
----------------------------------------------------------------------------------------------------
[Input]  We enjoys horror movies
[Highlight]  We <e> enjoys horror </e> movies
----------------------------------------------------------------------------------------------------
[Input]  Anna and Mike is going skiing
[Highlight]  Anna and Mike <e> is going </e> skiing
----------------------------------------------------------------------------------------------------
[Input]  I walk to the store and I bought milk
[Highlight]  I <e> walk to </e> the store and I bought milk
----------------------------------------------------------------------------------------------------
[Input]  We all eat the fish and then made dessert
[Highlight]  We all <e> eat the </e> fish and then made dessert
----------------------------------------------------------------------------------------------------
[Input]  I will eat fish for dinner and drank milk
[Highlight]  I will eat fish for dinner and <e> drank milk </e> 
----------------------------------------------------------------------------------------------------
[Input]  what be the reason for everyone leave the company
[Highlight]  <e> what be </e> the reason <e> for everyone </e> <e> leave the </e> company
----------------------------------------------------------------------------------------------------
[Input]  One of the most important issue is the lack of parking spaces at the local mall.
[Highlight]  One of the most important <e> issue is </e> the lack of parking spaces at the local mall.
----------------------------------------------------------------------------------------------------
[Input]  The survey we performed recently showed that most of customers are satisfied.
[Highlight]  The survey we performed recently showed that most <e> of customers </e> are satisfied.
----------------------------------------------------------------------------------------------------
[Input]  I’ve loved classical music ever since I was child.
[Highlight]  I’ve loved classical music ever since I <e> was child </e>.
----------------------------------------------------------------------------------------------------
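
Since the highlighter marks errors with `<e>` and `</e>`, downstream code will usually want the flagged spans as a list. A small sketch of that parsing step, assuming the output format shown above; `extract_error_spans` is a hypothetical helper, not a Gramformer function.

```python
import re
from typing import List

# Matches one highlighted span and captures its text without the padding spaces.
_ERROR_SPAN = re.compile(r"<e>\s*(.*?)\s*</e>")


def extract_error_spans(highlighted: str) -> List[str]:
    """Return the text of every <e> ... </e> span, in order of appearance."""
    return _ERROR_SPAN.findall(highlighted)


if __name__ == "__main__":
    sample = "<e> what be </e> the reason <e> for everyone </e> <e> leave the </e> company"
    print(extract_error_spans(sample))
```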

Detector - [Coming soon !]

from gramformer import Gramformer
gf = Gramformer(models = 0, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
grammar_fluency_score = gf.detect(<your input sentence>)

Models

| Model | Type | Return | Status |
|---|---|---|---|
| prithivida/grammar_error_detector | Classifier | Label | TBD (prithivida/parrot_fluency_on_BERT can be repurposed here, but I would recommend you wait :-)) |
| prithivida/grammar_error_highlighter | Seq2Seq | Grammar errors enclosed in <e> and </e> | Beta |
| prithivida/grammar_error_correcter | Seq2Seq | The corrected sentence | Beta |

Dataset

  • The first idea is to generate the dataset using the techniques mentioned in the first paper highlighted in the references section. You can apply the technique to any one of the publicly available Wikipedia edits datasets. Write some rules to filter only the grammatical edits, do some cleanup, and that's it, Bob's your uncle :-).
  • The second, and possibly very complicated and $$$, way is to get some 200M synthetic sentences. This is based on the last paper under the references section. Not recommended, but by all means knock yourself out if you are interested :-)
  • The third source is to repurpose the GEC task data.
  • I combined sources 1 and 3 to get my training data (still working on source 2, will keep you posted).
  • I ended up with ~1M records, which after some heuristics-based filtering came down to ~1/2M records.
  • It took ~12 hours to train each of the above models.
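
The source does not say which heuristics were used for filtering. Purely as an illustration, here is a sketch with assumed rules: drop unchanged pairs, pairs with an empty side, pairs longer than a word limit, and pairs whose lengths differ so much that the edit looks like a rewrite. The function name, the thresholds and the use of whitespace-separated words as the unit of length are all assumptions, not the author's method.

```python
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (erroneous sentence, corrected sentence)


def filter_pairs(
    pairs: Iterable[Pair], max_words: int = 128, max_len_ratio: float = 1.5
) -> List[Pair]:
    """Keep only pairs that look like small grammatical edits (assumed rules)."""
    kept = []
    for source, target in pairs:
        src_words, tgt_words = source.split(), target.split()
        if not src_words or not tgt_words:
            continue  # one side is empty
        if source.strip() == target.strip():
            continue  # nothing was corrected
        longest = max(len(src_words), len(tgt_words))
        shortest = min(len(src_words), len(tgt_words))
        if longest > max_words:
            continue  # too long for a sentence-level model
        if longest / shortest > max_len_ratio:
            continue  # lengths differ too much, likely a content rewrite
        kept.append((source, target))
    return kept


if __name__ == "__main__":
    raw = [
        ("Matt like fish", "Matt likes fish"),
        ("Same text", "Same text"),
        ("Short", "A much longer rewritten sentence here"),
    ]
    print(filter_pairs(raw))
```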

Benchmark

TBD (I will shortly benchmark the Gramformer models against the following publicly available models: salesken/grammar_correction and flexudy/t5-small-wav2vec2-grammar-fixer).

References

Citation

TBD

Comments
  • [Spacy error] Can't find model 'en'

    Hello, I have successfully installed Gramformer on my Windows PC, but when I run it, it gives the following error.

    Traceback (most recent call last):
      File "main.py", line 27, in <module>
        grammar_correction = Gramformer(models = 1, use_gpu=True)
      File "~~\.conda\envs\nlp-transformer\lib\site-packages\gramformer\gramformer.py", line 8, in __init__
        self.annotator = errant.load('en')
      File "~~\.conda\envs\nlp-transformer\lib\site-packages\errant\__init__.py", line 16, in load
        nlp = nlp or spacy.load(lang, disable=["ner"])
      File "~~\.conda\envs\nlp-transformer\lib\site-packages\spacy\__init__.py", line 30, in load
        return util.load_model(name, **overrides)
      File "~~\.conda\envs\nlp-transformer\lib\site-packages\spacy\util.py", line 175, in load_model
        raise IOError(Errors.E050.format(name=name))
    OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.
    
    opened by muzamil47 3
  • Commercial use issue

    Hey @PrithivirajDamodaran

    The readme states that Gramformer versions above 1.0 are allowed for commercial use - however, this is not currently the case as the grammar_error_correcter_v1 model has been trained using the non-commercial WI&Locness data, even though the documentation states otherwise:

    The grammar_error_correcter_v1 model is actually identical to the previous grammar_error_correcter model which is trained using the non-commercial WI&Locness data – they have identical weights, which you can verify with this script

    As the models are the same, this means that both models have been trained using the non-commercial WI&Locness data, and the grammar_error_correcter_v1 model along with Gramformer v1.1 and v1.2 should not be allowed for commercial use.

    Could you please update the readme to clarify this, or upload a new model that has not been trained using WI&Locness?

    Thanks

    question 
    opened by SimonHFL 2
  • Use corrector for highligher

    Hi @PrithivirajDamodaran

    This is a great framework. Is it possible (for now) to use the corrector model (models=2) for the highlighter (models=1)? After getting a correction, could we match it against the input and add a prefix and suffix () around each mismatch?

    Thanks

    question 
    opened by ilhamsyahids 2
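
The approach suggested in this issue can be sketched with the standard library alone. This is an illustration under assumptions, not how Gramformer's highlighter works: it diffs the two sentences word by word with `difflib` and wraps the words that differ in `<e>` and `</e>`, the tags the highlighter examples use. The function name is hypothetical.

```python
import difflib


def highlight_from_correction(original: str, corrected: str) -> str:
    """Wrap the words of `original` that differ from `corrected` in <e> ... </e>."""
    src, tgt = original.split(), corrected.split()
    matcher = difflib.SequenceMatcher(a=src, b=tgt, autojunk=False)
    pieces = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        chunk = " ".join(src[i1:i2])
        if tag == "equal":
            pieces.append(chunk)
        elif tag in ("replace", "delete"):
            pieces.append(f"<e> {chunk} </e>")
        else:
            # "insert": the correction adds words here, so mark an empty span.
            pieces.append("<e> </e>")
    return " ".join(pieces)


if __name__ == "__main__":
    print(highlight_from_correction("Matt like fish", "Matt likes fish"))
```

Note that a diff marks only the changed words, so the spans can be narrower than those in the highlighter examples in the README.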
  • Error loading the tokenizer in transformers==4.4.2

    I'm getting an error when initializing the class object, specifically at tokenizer loading:

    In [6]: correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-6-d34dd9c5fe99> in <module>
    ----> 1 correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
    
    ~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
        414             tokenizer_class_py, tokenizer_class_fast = TOKENIZER_MAPPING[type(config)]
        415             if tokenizer_class_fast and (use_fast or tokenizer_class_py is None):
    --> 416                 return tokenizer_class_fast.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
        417             else:
        418                 if tokenizer_class_py is not None:
    
    ~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in from_pretrained(cls, pretrained_model_name_or_path, *init_inputs, **kwargs)
       1703
       1704         return cls._from_pretrained(
    -> 1705             resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs
       1706         )
       1707
    
    ~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_base.py in _from_pretrained(cls, resolved_vocab_files, pretrained_model_name_or_path, init_configuration, *init_inputs, **kwargs)
       1774         # Instantiate tokenizer.
       1775         try:
    -> 1776             tokenizer = cls(*init_inputs, **init_kwargs)
       1777         except OSError:
       1778             raise OSError(
    
    ~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/models/t5/tokenization_t5_fast.py in __init__(self, vocab_file, tokenizer_file, eos_token, unk_token, pad_token, extra_ids, additional_special_tokens, **kwargs)
        134             extra_ids=extra_ids,
        135             additional_special_tokens=additional_special_tokens,
    --> 136             **kwargs,
        137         )
        138
    
    ~/anaconda3/envs/npe/lib/python3.6/site-packages/transformers/tokenization_utils_fast.py in __init__(self, *args, **kwargs)
         85         if fast_tokenizer_file is not None and not from_slow:
         86             # We have a serialization from tokenizers which let us directly build the backend
    ---> 87             fast_tokenizer = TokenizerFast.from_file(fast_tokenizer_file)
         88         elif slow_tokenizer is not None:
         89             # We need to convert a slow tokenizer to build the backend
    
    Exception: data did not match any variant of untagged enum PyPreTokenizerTypeWrapper at line 1 column 329667
    

    transformers==4.4.2.

    The installation package didn't specify the transformers version that this library is using. What should be the correct version? Or is it version independent and it's something else?

    opened by zhangyilun 2
  • Figma Gramformer Plugin

    Figma is used in creating a lot of digital interfaces today, a Gramformer Figma plugin would go a long way. I'll be willing to design the interface for the plugin but I don't know how to make the plugin itself. I hope someone takes this up. This is a link to get started https://www.figma.com/plugin-docs/setup/

    enhancement 
    opened by ayoolafelix 2
  • README.md get_edits and get_highlight example small fixes

    Hi there, when I copy and pasted the examples in the README locally I noticed they were bugging out for the edits and highlights (were only pulling the first char of the sentence for errant). Providing the full sentence seemed to get the desired output.

    opened by parisac 1
  • Training dataset

    Hi Prithiviraj,

    Is there any chance you'd be able to release the training dataset you used to train the Gramformer huggingface model? I see that there are some details on the slices of data that you brought together in the Readme, but it would be useful to be able to use the same data that you used.

    The main reason I'm asking is I'd like to create a model that can take correct text and add grammatical errors to it. So I was thinking I could take the dataset you used to train Gramformer and use the inverse to train a model that does the inverse. I can go through the data prep process as you did, but it would definitely be easier if I were able to reuse yours, and it might be useful for reproducibility for others as well.

    invalid question 
    opened by d4buss 1
  • OSError: Can't load config for 'prithivida/grammar_error_correcter'

    Hi, I have been using your code for the last few days. Suddenly, it started to crash.

    Have a look at the code and error given below:

    Code (Link: https://huggingface.co/prithivida/grammar_error_correcter_v1):

    from gramformer import Gramformer
    import torch
    
    def set_seed(seed):
      torch.manual_seed(seed)
      if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)
    
    set_seed(1212)
    
    
    gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all 
    
    influent_sentences = [
        "Matt like fish",
        "the collection of letters was original used by the ancient Romans",
        "We enjoys horror movies",
        "Anna and Mike is going skiing",
        "I walk to the store and I bought milk",
        "We all eat the fish and then made dessert",
        "I will eat fish for dinner and drank milk",
        "what be the reason for everyone leave the company",
    ]   
    
    for influent_sentence in influent_sentences:
        corrected_sentence = gf.correct(influent_sentence)
        print("[Input] ", influent_sentence)
        print("[Correction] ",corrected_sentence[0])
        print("-" *100)
    

    Error

    404 Client Error: Not Found for url: https://huggingface.co/prithivida/grammar_error_correcter/resolve/main/config.json
    ---------------------------------------------------------------------------
    HTTPError                                 Traceback (most recent call last)
    /usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        491                 use_auth_token=use_auth_token,
    --> 492                 user_agent=user_agent,
        493             )
    
    7 frames
    /usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in cached_path(url_or_filename, cache_dir, force_download, proxies, resume_download, user_agent, extract_compressed_file, force_extract, use_auth_token, local_files_only)
       1278             use_auth_token=use_auth_token,
    -> 1279             local_files_only=local_files_only,
       1280         )
    
    /usr/local/lib/python3.7/dist-packages/transformers/file_utils.py in get_from_cache(url, cache_dir, force_download, proxies, etag_timeout, resume_download, user_agent, use_auth_token, local_files_only)
       1441             r = requests.head(url, headers=headers, allow_redirects=False, proxies=proxies, timeout=etag_timeout)
    -> 1442             r.raise_for_status()
       1443             etag = r.headers.get("X-Linked-Etag") or r.headers.get("ETag")
    
    /usr/local/lib/python3.7/dist-packages/requests/models.py in raise_for_status(self)
        942         if http_error_msg:
    --> 943             raise HTTPError(http_error_msg, response=self)
        944 
    
    HTTPError: 404 Client Error: Not Found for url: https://huggingface.co/prithivida/grammar_error_correcter/resolve/main/config.json
    
    During handling of the above exception, another exception occurred:
    
    OSError                                   Traceback (most recent call last)
    <ipython-input-10-0f43e537fe87> in <module>
         10 
         11 
    ---> 12 gf = Gramformer(models = 2, use_gpu=False) # 0=detector, 1=highlighter, 2=corrector, 3=all
         13 
         14 influent_sentences = [
    
    /usr/local/lib/python3.7/dist-packages/gramformer/gramformer.py in __init__(self, models, use_gpu)
         14 
         15     if models == 2:
    ---> 16         self.correction_tokenizer = AutoTokenizer.from_pretrained(correction_model_tag)
         17         self.correction_model     = AutoModelForSeq2SeqLM.from_pretrained(correction_model_tag)
         18         self.correction_model     = self.correction_model.to(device)
    
    /usr/local/lib/python3.7/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
        400         kwargs["_from_auto"] = True
        401         if not isinstance(config, PretrainedConfig):
    --> 402             config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
        403 
        404         use_fast = kwargs.pop("use_fast", True)
    
    /usr/local/lib/python3.7/dist-packages/transformers/models/auto/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
        428         """
        429         kwargs["_from_auto"] = True
    --> 430         config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
        431         if "model_type" in config_dict:
        432             config_class = CONFIG_MAPPING[config_dict["model_type"]]
    
    /usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        502                 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
        503             )
    --> 504             raise EnvironmentError(msg)
        505 
        506         except json.JSONDecodeError:
    
    OSError: Can't load config for 'prithivida/grammar_error_correcter'. Make sure that:
    
    - 'prithivida/grammar_error_correcter' is a correct model identifier listed on 'https://huggingface.co/models'
    
    - or 'prithivida/grammar_error_correcter' is the correct path to a directory containing a config.json file
    ![Screenshot from 2021-07-01 18-36-07](https://user-images.githubusercontent.com/4704211/124133526-5a9da900-da9b-11eb-9733-61df46ab01e1.png)
    
    

    Possible Solution:

    Rename this link from: https://huggingface.co/prithivida/grammar_error_correcter/ to: https://huggingface.co/prithivida/grammar_error_correcter_v1/

    Please help me fix this. thank you

    opened by Nomiluks 1
  • Inference Issue !!!

    OSError Traceback (most recent call last)

    /usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        241 if resolved_config_file is None:
    --> 242 raise EnvironmentError
        243 config_dict = cls._dict_from_json_file(resolved_config_file)

    OSError:

    During handling of the above exception, another exception occurred:

    OSError Traceback (most recent call last)

    3 frames

    in ()
    ----> 1 correction_tokenizer = AutoTokenizer.from_pretrained("prithivida/grammar_error_correcter")
          2 correction_model = AutoModelForSeq2SeqLM.from_pretrained("prithivida/grammar_error_correcter")
          3 print("[Gramformer] Grammar error correction model loaded..")
          4
          5

    /usr/local/lib/python3.7/dist-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
        204 config = kwargs.pop("config", None)
        205 if not isinstance(config, PretrainedConfig):
    --> 206 config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
        207
        208 if "bert-base-japanese" in str(pretrained_model_name_or_path):

    /usr/local/lib/python3.7/dist-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
        201
        202 """
    --> 203 config_dict, _ = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
        204
        205 if "model_type" in config_dict:

    /usr/local/lib/python3.7/dist-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, **kwargs)
        249 f"- or '{pretrained_model_name_or_path}' is the correct path to a directory containing a {CONFIG_NAME} file\n\n"
        250 )
    --> 251 raise EnvironmentError(msg)
        252
        253 except json.JSONDecodeError:

    OSError: Can't load config for 'prithivida/grammar_error_correcter'. Make sure that:

    • 'prithivida/grammar_error_correcter' is a correct model identifier listed on 'https://huggingface.co/models'

    • or 'prithivida/grammar_error_correcter' is the correct path to a directory containing a config.json file

    Solutions for this issue????

    invalid 
    opened by sabhi27 1
  • How to train Gramformer on non-English languages.

    Hey @PrithivirajDamodaran, great work on building Gramformer. I've played with it and the results are amazing.

    I work on pushing NLP forward in under-represented languages, and hence I humbly request you to tell me how I can train Gramformer on non-English sentences.

    I checked out your HuggingFace page 'https://huggingface.co/prithivida/grammar_error_correcter' but couldn't find any resources on how to train Gramformer from scratch. If you could help me in training Gramformer on non-English languages, it would really mean a lot to me. Do let me know.

    Thanks

    question 
    opened by StephennFernandes 1
  • pip install is erroring out,

    I am unable to do pip install of the package, here is the error:

    Collecting git+https://github.com/PrithivirajDamodaran/Gramformer.git@v0.1
      Cloning https://github.com/PrithivirajDamodaran/Gramformer.git (to revision v0.1) to c:\users\sumit\appdata\local\temp\pip-req-build-sw54k_0h
    ERROR: Error [WinError 2] The system cannot find the file specified while executing command git clone -q https://github.com/PrithivirajDamodaran/Gramformer.git 'C:\Users\Sumit\AppData\Local\Temp\pip-req-build-sw54k_0h'
    ERROR: Cannot find command 'git' - do you have 'git' installed and in your PATH?

    I also tried downloading the repo directly and executing the package, but the model is not present at the location (correction_model_tag = "prithivida/grammar_error_correcter"). Is there any way to download the pretrained model?

    opened by ranjan-sumit 1
  • OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

    OSError Traceback (most recent call last)
    ~\AppData\Local\Temp\ipykernel_9376\2706950954.py in <module>
         25
         26
    ---> 27 gf = Gramformer(models = 1, use_gpu=False) # 1=corrector, 2=detector
         28
         29 influent_sentences = [

    ~\anaconda3_9\envs\python37\lib\site-packages\gramformer\gramformer.py in __init__(self, models, use_gpu)
          7 import errant
          8 #self.annotator = errant.load('en_core_web_sm')
    ----> 9 self.annotator = errant.load('en') # en is deprecated from spacy 3.0 onwards
         10
         11 if use_gpu:

    ~\anaconda3_9\envs\python37\lib\site-packages\errant\__init__.py in load(lang, nlp)
         17
         18 # Load spacy
    ---> 19 nlp = nlp or spacy.load(lang, disable=["ner"])
         20
         21 # Load language edit merger

    ~\anaconda3_9\envs\python37\lib\site-packages\spacy\__init__.py in load(name, **overrides)
         28 if depr_path not in (True, False, None):
         29 warnings.warn(Warnings.W001.format(path=depr_path), DeprecationWarning)
    ---> 30 return util.load_model(name, **overrides)
         31
         32

    ~\anaconda3_9\envs\python37\lib\site-packages\spacy\util.py in load_model(name, **overrides)
        173 elif hasattr(name, "exists"): # Path or Path-like to model data
        174 return load_model_from_path(name, **overrides)
    --> 175 raise IOError(Errors.E050.format(name=name))
        176
        177

    OSError: [E050] Can't find model 'en'. It doesn't seem to be a shortcut link, a Python package or a valid path to a data directory.

    opened by vky2998 2
  • Word limit

    The model has trouble with long sentences, especially if the words in the sentences are in upper case. It outputs only part of the sentence, and the rest of the sentence, which it neglects, is shown as an error.

    opened by Talib6509 0
  • Gramformer Highlight function not working

    Hello... I'm trying to get the edits between two sentences, but the highlight function is not working. Has anybody faced the same issue? Many thanks in advance

    opened by NourAlMerey 0
  • Suggestions to improve the grammar results for short sentences

    Hello..!

    I have used Gramformer model and I think this could be quite useful for checking and correcting some grammar points, especially for correcting singular/plural, verb forms and tenses, and spelling. However, some other grammar points (like correcting sentence structure, comparative/superlative forms, pronoun cases, etc.) seem to be still tricky.

    Note: I need to use the model on short sentences.

    The biggest challenge I faced in my case is the following (please suggest how to avoid or improve it, for example by changing some parameters):

    1 - Since it corrects grammar by generating text, most of the time it completely changes the sentence and rephrases it. How can we avoid this?

    whose bags you can bring? --> Which bags you can bring? (Just a sample, and sometime it generates totally changed verbose sentence)

    2 - Every time I give the same sentence as input, it generates different outputs:

    I go can there: three outputs in three different run ("I go, there"., "can I go there?", "I go back there.")

    Thanks!

    opened by muzamil47 0
Releases(v1.4)
  • v1.4(Aug 10, 2021)

    ⚡️ Features added/changed

    ✅ Correct API uses a ranker to sort good quality corrections.
    ✅ Highlight API returns sents w/errors marked up as readable tags.
    ✅ Edit API returns error types, positions, and respective corrections.
    ✅ The latest model checkpoint has been refreshed w/more data.

    License update to MIT.

Owner
Prithivida
Applied NLP, XAI for NLP and Data Engineering
Prithivida
:sparkles: Surface lint errors during code review

✨ Linty Fresh ✨ Keep your codebase sparkly clean with the power of LINT! Linty Fresh parses lint errors and report them back to GitHub as comments on

Lyft 183 Dec 18, 2022
A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycodestyle.

flake8-bugbear A plugin for Flake8 finding likely bugs and design problems in your program. Contains warnings that don't belong in pyflakes and pycode

Python Code Quality Authority 869 Dec 30, 2022
Optional static typing for Python 3 and 2 (PEP 484)

Mypy: Optional Static Typing for Python Got a question? Join us on Gitter! We don't have a mailing list; but we are always happy to answer questions o

Python 14.4k Jan 8, 2023
The strictest and most opinionated python linter ever!

wemake-python-styleguide Welcome to the strictest and most opinionated python linter ever. wemake-python-styleguide is actually a flake8 plugin with s

wemake.services 2.1k Jan 1, 2023
coala provides a unified command-line interface for linting and fixing all your code, regardless of the programming languages you use.

"Always code as if the guy who ends up maintaining your code will be a violent psychopath who knows where you live." ― John F. Woods coala provides a

coala development group 3.4k Dec 29, 2022
Mypy stubs, i.e., type information, for numpy, pandas and matplotlib

Mypy type stubs for NumPy, pandas, and Matplotlib This is a PEP-561-compliant stub-only package which provides type information for matplotlib, numpy

Predictive Analytics Lab 194 Dec 19, 2022
Automated security testing using bandit and flake8.

flake8-bandit Automated security testing built right into your workflow! You already use flake8 to lint all your code for errors, ensure docstrings ar

Tyler Wince 96 Jan 1, 2023
Easy saving and switching between multiple KDE configurations.

Konfsave Konfsave is a config manager. That is, it allows you to save, back up, and easily switch between different (per-user) system configurations.

null 42 Sep 25, 2022
PEP-484 typing stubs for SQLAlchemy 1.4 and SQLAlchemy 2.0

SQLAlchemy 2 Stubs These are PEP-484 typing stubs for SQLAlchemy 1.4 and 2.0. They are released concurrently along with a Mypy extension which is desi

SQLAlchemy 139 Dec 30, 2022
Utilities for pycharm code formatting (flake8 and black)

Pycharm External Tools Extensions to Pycharm code formatting tools. Currently supported are flake8 and black on a selected code block. Usage Flake8 [P

Haim Daniel 13 Nov 3, 2022
Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch

COCO LM Pretraining (wip) Implementation of COCO-LM, Correcting and Contrasting Text Sequences for Language Model Pretraining, in Pytorch. They were a

Phil Wang 44 Jul 28, 2022
Alleviating Over-segmentation Errors by Detecting Action Boundaries

Alleviating Over-segmentation Errors by Detecting Action Boundaries Forked from ASRF official code. This repo is an implementation of replacing orig

null 13 Dec 12, 2022
UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language

UA-GEC: Grammatical Error Correction and Fluency Corpus for the Ukrainian Language This repository contains UA-GEC data and an accompanying Python lib

Grammarly 227 Jan 2, 2023
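A Keras reimplementation of the scene text detection network CTPN: "Detecting Text in Natural Image with Connectionist Text Proposal Network"; you are welcome to try it, follow it, and report issues...

keras-ctpn [TOC] Description Prediction Training Examples 4.1 ICDAR2015 4.1.1 With side refinement 4.1.2 Without side refinement 4.1.3 Data augmentation: horizontal flip 4.2 ICDAR2017 4.3 Other datasets toDoList Summary Description This project is a Keras implementation of CTPN: Detecti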

mick.yi 107 Jan 9, 2023
Detecting Text in Natural Image with Connectionist Text Proposal Network (ECCV'16)

Detecting Text in Natural Image with Connectionist Text Proposal Network The codes are used for implementing CTPN for scene text detection, described

Tian Zhi 1.3k Dec 22, 2022
[EMNLP 2021] LM-Critic: Language Models for Unsupervised Grammatical Error Correction

LM-Critic: Language Models for Unsupervised Grammatical Error Correction This repo provides the source code & data of our paper: LM-Critic: Language M

Michihiro Yasunaga 98 Nov 24, 2022
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries

Moment-DETR QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries Jie Lei, Tamara L. Berg, Mohit Bansal For dataset de

Jie Lei 雷杰 133 Dec 22, 2022
Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

Samuel Cahyawijaya 11 Aug 26, 2022