Augmenty is an augmentation library based on spaCy for augmenting texts.

Overview

Augmenty: The cherry on top of your NLP pipeline


Augmenty is an augmentation library based on spaCy for augmenting texts. Besides a wide array of highly flexible augmenters, Augmenty provides a series of tools for working with augmenters, including combining and moderating augmenters. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the assigned labels under augmentation, which makes many of the augmenters suitable for training tasks beyond sentence classification.
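Combining and moderating augmenters can be pictured as ordinary function composition. The sketch below is not augmenty's implementation: it works on plain token lists instead of spaCy objects, and the helper names `combine` and `moderate` are hypothetical. `combine` chains augmenters so each receives the previous one's output, and `moderate` applies an augmenter to only a fraction of the inputs.

```python
import random
from typing import Callable, List

# A hypothetical augmenter type: takes a list of tokens, returns a new list.
Augmenter = Callable[[List[str]], List[str]]


def combine(augmenters: List[Augmenter]) -> Augmenter:
    """Chain augmenters so each one receives the previous one's output."""
    def augment(tokens: List[str]) -> List[str]:
        for aug in augmenters:
            tokens = aug(tokens)
        return tokens
    return augment


def moderate(augmenter: Augmenter, probability: float, seed: int = 0) -> Augmenter:
    """Apply `augmenter` to each input with the given probability."""
    rng = random.Random(seed)

    def augment(tokens: List[str]) -> List[str]:
        return augmenter(tokens) if rng.random() < probability else tokens
    return augment


def lower(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def reverse(tokens: List[str]) -> List[str]:
    return tokens[::-1]


pipeline = combine([lower, moderate(reverse, probability=1.0)])
print(pipeline(["Augmenty", "Is", "Great"]))  # ['great', 'is', 'augmenty']
```

Keeping both helpers as functions that return augmenters means the results can themselves be combined or moderated again.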

🔧 Installation

To get started, install augmenty with pip by running the following line in your terminal:

pip install augmenty

Note that this is a minimal installation. Since some augmenters require additional packages, run the following line to install all dependencies:

pip install augmenty[all]

For more detailed instructions on installing augmenty, including specific language support, see the installation instructions.

🍒 Simple Example

The following is a simple example of how you can quickly augment text using Augmenty. For more on using augmenty, see the usage guides.

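import spacy
import augmenty

nlp = spacy.load("en_core_web_sm")

docs = nlp.pipe(["Augmenty is a great tool for text augmentation"])

# level is the share of matching entities to replace; 1 replaces all of them
entity_augmenter = augmenty.load("ents_replace.v1",
                                 level=1,
                                 ent_dict={"ORG": [["spaCy"], ["spaCy", "Universe"]]})

for doc in augmenty.docs(docs, augmenter=entity_augmenter, nlp=nlp):
    print(doc)

One possible output (the replacement is sampled at random from ent_dict, and the text only changes if the model tags "Augmenty" as an ORG entity):

spaCy Universe is a great tool for text augmentation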
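Conceptually, correcting labels under augmentation means that when an entity is replaced by a span of a different length, every later entity span has to be shifted accordingly. Below is a minimal pure-Python sketch, assuming token-index spans with exclusive ends and non-overlapping entities. `Span` and `replace_entities` are hypothetical names, not augmenty's API, which operates on spaCy `Doc` objects.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Span:
    start: int  # index of the first token (inclusive)
    end: int    # index one past the last token (exclusive)
    label: str


def replace_entities(
    tokens: List[str],
    ents: List[Span],
    ent_dict: Dict[str, List[List[str]]],
    rng: random.Random,
) -> Tuple[List[str], List[Span]]:
    """Swap entities whose label is in ent_dict for a random replacement and
    recompute every entity span against the new token list."""
    new_tokens: List[str] = []
    new_ents: List[Span] = []
    cursor = 0  # next original token that has not been copied yet
    for ent in sorted(ents, key=lambda e: e.start):
        new_tokens.extend(tokens[cursor:ent.start])  # text before the entity
        if ent.label in ent_dict:
            replacement = rng.choice(ent_dict[ent.label])
        else:
            replacement = tokens[ent.start:ent.end]  # keep other labels as-is
        start = len(new_tokens)
        new_tokens.extend(replacement)
        new_ents.append(Span(start, len(new_tokens), ent.label))
        cursor = ent.end
    new_tokens.extend(tokens[cursor:])  # text after the last entity
    return new_tokens, new_ents


tokens = ["Augmenty", "is", "a", "great", "tool", "from", "Aarhus"]
ents = [Span(0, 1, "ORG"), Span(6, 7, "GPE")]
new_tokens, new_ents = replace_entities(
    tokens, ents, {"ORG": [["spaCy", "Universe"]]}, random.Random(0)
)
for ent in new_ents:
    print(ent.label, new_tokens[ent.start:ent.end])
# ORG ['spaCy', 'Universe']
# GPE ['Aarhus']
```

The GPE span moves from index 6 to 7 because the one-token ORG entity became two tokens; without that shift the label would point at the wrong word.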

📖 Documentation

• 📚 Usage Guides: guides and instructions on how to use augmenty and its features.
• 📰 News and changelog: new additions, changes and version history.
• 🎛 API References: the detailed reference for augmenty's API, including function documentation.
• 🍒 Augmenters: a full list of the augmenters currently in augmenty.
• 😎 Demo: a simple Streamlit demo for trying out the augmenters.

💬 Where to ask questions

• 🚨 Bug Reports: GitHub Issue Tracker
• 🎁 Feature Requests & Ideas: GitHub Issue Tracker
• 👩‍💻 Usage Questions: GitHub Discussions
• 🗯 General Discussion: GitHub Discussions
• 🍒 Adding an Augmenter: Adding an augmenter

🤔 FAQ

How do I test the code and run the test suite?

augmenty comes with an extensive test suite. To run the tests, you will usually want to clone the repository and build augmenty from source. Then install the development dependencies and test utilities listed in requirements.txt:

pip install -r requirements.txt
pip install pytest

python -m pytest

This runs all the tests in the augmenty/tests folder.

Specific tests can be run using:

python -m pytest augmenty/tests/test_docs.py

Code Coverage

To check code coverage, run the following:

pip install pytest-cov

python -m pytest --cov=.

Does augmenty run on X?

augmenty is intended to run on all major operating systems, including Windows (latest version), MacOS (Catalina) and the latest version of Linux (Ubuntu). Below you can see whether augmenty passes its test suite on each of these systems. Please note that these are only the systems augmenty is actively tested on; if you run a similar system (e.g. an earlier version of Linux), augmenty will likely run there as well. If it does not, please create an issue.

Operating System Status
Ubuntu/Linux (Latest) github actions pytest ubuntu
MacOS (Catalina) github actions pytest catalina
Windows (Latest) github actions pytest windows

How is the documentation generated?

augmenty uses Sphinx to generate its documentation, with the Furo theme and custom styling.

To build the documentation, run:

# install sphinx, themes and extensions
pip install sphinx furo sphinx-copybutton sphinxext-opengraph

# generate HTML from the documentation sources
make -C docs html

Aren't many of these augmenters useless for training?

That is true for some of them. Randomly adding or removing spaces, for instance, is rarely something you would augment with during training. However, augmentation can just as well be used to test whether a model is robust to certain variations.
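A robustness check of this kind can be sketched as follows: augment each input, rerun the model, and count how often its prediction stays the same. The spacing augmenter `remove_spaces`, the `consistency` metric and the keyword-matching `toy_predict` (standing in for a trained model) are hypothetical illustrations in plain Python, not augmenty functions.

```python
import random


def remove_spaces(text: str, level: float, rng: random.Random) -> str:
    """Drop each space character with probability `level`."""
    return "".join("" if ch == " " and rng.random() < level else ch for ch in text)


def toy_predict(text: str) -> str:
    # Stand-in for a trained model: a naive keyword matcher on whitespace tokens.
    return "positive" if "great" in text.lower().split() else "negative"


def consistency(predict, texts, augment) -> float:
    """Share of texts whose prediction is unchanged after augmentation."""
    unchanged = sum(predict(t) == predict(augment(t)) for t in texts)
    return unchanged / len(texts)


texts = ["Augmenty is a great tool", "This is not useful", "great results"]
rng = random.Random(42)
print(consistency(toy_predict, texts, lambda t: remove_spaces(t, 0.0, rng)))  # 1.0
print(consistency(toy_predict, texts, lambda t: remove_spaces(t, 1.0, rng)))  # 0.333...
```

With every space removed, two of the three predictions flip, which exposes the toy model's reliance on whitespace tokenization even though such an augmenter would rarely be used for training.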


Can I use augmenty without using spaCy?

Yes. augmenty contains convenience functions for applying augmentation directly to raw texts. Check out the getting started guide to learn how.
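The idea behind such a convenience function can be sketched in plain Python: tokenize each string, apply a token-level augmenter, and join the tokens back into a string. This is an illustration under stated assumptions, not augmenty's code; `augment_texts` and `make_upper_case` are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
import random
from typing import Callable, Iterable, Iterator, List

# A hypothetical token-level augmenter: list of tokens in, list of tokens out.
TokenAugmenter = Callable[[List[str]], List[str]]


def augment_texts(texts: Iterable[str], augmenter: TokenAugmenter) -> Iterator[str]:
    """Tokenize each raw text, augment its tokens and join them back together."""
    for text in texts:
        tokens = text.split()  # whitespace split stands in for a real tokenizer
        yield " ".join(augmenter(tokens))


def make_upper_case(level: float, seed: int = 0) -> TokenAugmenter:
    """Build an augmenter that upper-cases each token with probability `level`."""
    rng = random.Random(seed)

    def augment(tokens: List[str]) -> List[str]:
        return [t.upper() if rng.random() < level else t for t in tokens]

    return augment


print(list(augment_texts(["augmenty works on raw text"], make_upper_case(1.0))))
# ['AUGMENTY WORKS ON RAW TEXT']
```

Joining with single spaces discards the original whitespace; a spaCy `Doc` instead records each token's trailing whitespace, so its text can be reconstructed faithfully.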


🎓 Citing this work

If you use this library in your research, please cite:

@inproceedings{augmenty2021,
    title={Augmenty, the cherry on top of your NLP pipeline},
    author={Enevoldsen, Kenneth and Hansen, Lasse},
    year={2021}
}
Comments
  • Use of augmenty with spacy config files for training

    I didn't see any documentation on how to import these augmenters when using spacy 3.0's config and command line system when training. Is it possible to use it in this sense? If so, how?

    Upon further review: for the command line to register new augmenters, the --code <code.py> flag needs to be set when calling the training. I have tried to point it to the specific file that contains the keystroke augmenter I wanted, but it complains about not knowing a parent package for relative imports. I also tried the various __init__.py files, but it complained there as well. It seems to work when you take the code out, place it in a new file without relative imports, and point to that.

    Which page or section is this issue related to?

    https://spacy.io/usage/training#data-augmentation-custom

    https://kennethenevoldsen.github.io/augmenty/tutorials/introduction.html#Applying-the-augmentation

    documentation 
    opened by Giles-Billenness 3
  • Added sententence_subset.v1 augmenter following #48

    Following #48, this adds the sententence_subset.v1 augmenter, which subsamples sentences from a document:

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    text = """Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool
    for obtaining higher performance on limited data. You can also use it to see how
    robust your model is to changes. It will sample subset of the paragraf."""
    
    augmenter = augmenty.load("sententence_subset.v1", respect_sentences=True)
    
    # augmenty.texts expects an iterable of texts, so wrap the single string in a list
    list(augmenty.texts([text], augmenter, nlp))
    

    Missing:

    • [ ] Add tests
    • [ ] Add documentation
    opened by KennethEnevoldsen 3
  • Paragraph subset augmenter

    A paragraph subset augmenter that can work at the token or sentence level. It samples a random percentage of coherent (contiguous) tokens/sentences and a random token/sentence start position, chosen so that the percentage constraint is maintained. The augmenter needs to handle annotated entities and avoid breaking them.

    Input arguments:

    • level: how often to apply the augmenter.
    • min_paragraf: minimum percentage of tokens or sentences to include, e.g. for 4 sentences, min_paragraf=0.5 means that at least 2 sentences are included.
    • sentence_level: whether the augmenter operates at the sentence level (True) or the token level (False).

    Example - sentence level

    import augmenty
    import spacy
    nlp = spacy.load("en_core_web_sm")
    
    # four sentences
    texts = [
        "Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool "
        "for obtaining higher performance on limited data. You can also use it to see how "
        "robust your model is to changes. It will sample subset of the paragraf.",
    ]
    docs = list(nlp.pipe(texts))  # nlp.pipe processes a list of texts; nlp() takes a single string
    
    augmenter = augmenty.load("paragraf_subset.v1", level=1.0, min_paragraf=0.5, sentence_level=True)
    
    list(augmenty.texts(texts, augmenter, nlp))
    

    Example outputs:

    The first section:

    Augmenty is a wonderful tool for augmentation. Augmentation is a wonderful tool 
    for obtaining higher performance on limited data.
    

    The middle section:

    Augmentation is a wonderful tool for obtaining higher performance on limited data. 
    You can also use it to see how robust your model is to changes.
    

    The last section:

    You can also use it to see how robust your model is to changes. It will sample subset 
    of the paragraf.
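The sampling step proposed above can be sketched in plain Python on a list of sentences (or tokens). The function name and signature are hypothetical and the parameter names simply mirror this issue. The sketch does not handle the entity constraint: a token-level version would additionally need to move the window boundaries so they do not cut through an annotated entity.

```python
import math
import random
from typing import List, Sequence


def paragraph_subset(
    units: Sequence[str], min_paragraf: float, level: float, rng: random.Random
) -> List[str]:
    """With probability `level`, keep one random contiguous window that covers
    at least `min_paragraf` of the units (sentences or tokens)."""
    n = len(units)
    if n == 0 or rng.random() >= level:
        return list(units)  # augmenter not applied to this document
    min_len = min(n, max(1, math.ceil(min_paragraf * n)))
    length = rng.randint(min_len, n)    # window size; randint includes both ends
    start = rng.randint(0, n - length)  # any start that keeps the window in bounds
    return list(units[start:start + length])


sentences = [
    "Augmenty is a wonderful tool for augmentation.",
    "Augmentation is a wonderful tool for obtaining higher performance on limited data.",
    "You can also use it to see how robust your model is to changes.",
    "It will sample subset of the paragraf.",
]
print(paragraph_subset(sentences, min_paragraf=0.5, level=1.0, rng=random.Random(1)))
```

Sampling the length first and the start second guarantees the window always fits, so no rejection loop is needed.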
    

    Additional thoughts:

    Possibly add a reverse augmenter, e.g. one that removes a coherent section of tokens/sentences.

    additional augmenter 
    opened by martincjespersen 3
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26

    Bumps MishaKav/pytest-coverage-comment from 1.1.25 to 1.1.26.

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies github_actions 
    opened by dependabot[bot] 2
  • :arrow_up: Update pydantic requirement from <1.9.0,>=1.8.2 to >=1.8.2,<1.10.0

    Updates the requirements on pydantic to permit the latest version.

    Release notes

    Sourced from pydantic's releases.

    v1.9.0 (2021-12-31)

    Thank you to pydantic's sponsors: @​sthagen, @​timdrijvers, @​toinbis, @​koxudaxi, @​ginomempin, @​primer-io, @​and-semakin, @​westonsteimel, @​reillysiemens, @​es3n1n, @​jokull, @​JonasKs, @​Rehket, @​corleyma, @​daddycocoaman, @​hardbyte, @​datarootsio, @​jodal, @​aminalaee, @​rafsaf, @​jqueguiner, @​chdsbd, @​kevinalh, @​Mazyod, @​grillazz, @​JonasKs, @​simw, @​leynier, @​xfenix for their kind support.

    Highlights

    v1.9.0 (2021-12-31) Changes

    v1.9.0a2 (2021-12-24) Changes

    v1.9.0a1 (2021-12-18) Changes

    • Add support for Decimal-specific validation configurations in Field(), additionally to using condecimal(), to allow better support from editors and tooling, #3507 by @​tiangolo
    • Add arm64 binaries suitable for MacOS with an M1 CPU to PyPI, #3498 by @​samuelcolvin
    • Fix issue where None was considered invalid when using a Union type containing Any or object, #3444 by @​tharradine
    • When generating field schema, pass optional field argument (of type pydantic.fields.ModelField) to __modify_schema__() if present, #3434 by @​jasujm
    • Fix issue when pydantic fail to parse typing.ClassVar string type annotation, #3401 by @​uriyyo
    • Mention Python >= 3.9.2 as an alternative to typing_extensions.TypedDict, #3374 by @​BvB93
    • Changed the validator method name in the Custom Errors example to more accurately describe what the validator is doing; changed from name_must_contain_space to value_must_equal_bar, #3327 by @​michaelrios28
    • Add AmqpDsn class, #3254 by @​kludex
    • Always use Enum value as default in generated JSON schema, #3190 by @​joaommartins
    • Add support for Mypy 0.920, #3175 by @​christianbundy
    • validate_arguments now supports extra customization (used to always be Extra.forbid), #3161 by @​PrettyWood

    ... (truncated)

    Commits
    • fbf8002 prepare for v1.9.0 release, extra change
    • 5406423 prepare for v1.9.0 release
    • 87da9ac apply update_forward_refs to json_encoders (#3595)
    • 6f26a1c Support mypy 0.910 to 0.930 including CI tests (#3594)
    • 8ef492b build(deps): bump mypy from 0.920 to 0.930 (#3573)
    • 2d3d266 remove failing release step
    • ef46789 add step to upload pypi files to release
    • 5d6f48c prepare for v1.9.0a2
    • e882277 fix: support generic models with discriminated union (#3551)
    • edad0db fix: keep old behaviour of json() by default (#3542)
    • Additional commits viewable in compare view

    dependencies python 
    opened by dependabot[bot] 2
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40

    Bumps MishaKav/pytest-coverage-comment from 1.1.39 to 1.1.40.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Support GitHub enterprise urls

    What's Changed

    New Contributors

    Full Changelog: https://github.com/MishaKav/pytest-coverage-comment/compare/v1.1.39...v1.1.40

    Changelog

    Sourced from MishaKav/pytest-coverage-comment's changelog.

    Pytest Coverage Comment 1.1.40

    Release Date: 2022-12-03

    Changes

    • Support for url for github enterprise repositories, thanks to @​jbcumming for contribution
    • Minor readme improvements, thanks to @​AlexanderLanin for contribution
    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31

    Bumps MishaKav/pytest-coverage-comment from 1.1.30 to 1.1.31.

    Release notes

    Sourced from MishaKav/pytest-coverage-comment's releases.

    Remove link on badge

    add option to remove link on badge

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.11.0,>=1.5.0 to >=1.5.0,<1.12.0

    Updates the requirements on streamlit to permit the latest version.

    dependencies python 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4.1.0

    Bumps actions/setup-python from 3 to 4.1.0.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.1.0

    In scope of this pull request we updated actions/cache package as the new version contains fixes for caching error handling. Moreover, we added a new input update-environment. This option allows to specify if the action shall update environment variables (default) or not.

    Update-environment input

        - name: setup-python 3.9
          uses: actions/setup-python@v4
          with:
            python-version: 3.9
            update-environment: false
    

    Besides, we added such changes as:

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405

    ... (truncated)

    Commits
    • c4e89fa Improve readme for 3.x and 3.11-dev style python-version (#441)
    • 0ad0f6a Merge pull request #452 from mayeut/fix-env
    • f0bcf8b Merge pull request #456 from akx/patch-1
    • af97157 doc: Add multiple wildcards example to readme
    • 364e819 Merge pull request #394 from akv-platform/v-sedoli/set-env-by-default
    • 782f81b Merge pull request #450 from IvanZosimov/ResolveVersionFix
    • 2c9de4e Remove duplicate code introduced in #440
    • 412091c Fix tests for update-environment==false
    • 78a2330 Merge pull request #451 from dmitry-shibanov/fx-pipenv-python-version
    • 96f494e trigger checks
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Bump actions/setup-python from 3 to 4

    Bumps actions/setup-python from 3 to 4.

    Release notes

    Sourced from actions/setup-python's releases.

    v4.0.0

    What's Changed

    • Support for python-version-file input: #336

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version-file: '.python-version' # Read python version from a file
    - run: python my_script.py
    

    There is no default python version for this setup-python major version, the action requires to specify either python-version input or python-version-file input. If the python-version input is not specified the action will try to read required version from file from python-version-file input.

    • Use pypyX.Y for PyPy python-version input: #349

    Example of usage:

    - uses: actions/setup-python@v4
      with:
        python-version: 'pypy3.9' # pypy-X.Y kept for backward compatibility
    - run: python my_script.py
    
    • RUNNER_TOOL_CACHE environment variable is equal AGENT_TOOLSDIRECTORY: #338

    • Bugfix: create missing pypyX.Y symlinks: #347

    • PKG_CONFIG_PATH environment variable: #400

    • Added python-path output: #405 python-path output contains Python executable path.

    • Updated zeit/ncc to vercel/ncc package: #393

    • Bugfix: fixed output for prerelease version of poetry: #409

    • Made pythonLocation environment variable consistent for Python and PyPy: #418

    • Bugfix for 3.x-dev syntax: #417

    • Other improvements: #318 #396 #384 #387 #388

    Update actions/cache version to 2.0.2

    In scope of this release we updated actions/cache package as the new version contains fixes related to GHES 3.5 (actions/setup-python#382)

    Add "cache-hit" output and fix "python-version" output for PyPy

    This release introduces new output cache-hit (actions/setup-python#373) and fix python-version output for PyPy (actions/setup-python#365)

    The cache-hit output contains boolean value indicating that an exact match was found for the key. It shows that the action uses already existing cache or not. The output is available only if cache is enabled.

    ... (truncated)

    Commits
    • d09bd5e fix: 3.x-dev can install a 3.y version (#417)
    • f72db17 Made env.var pythonLocation consistent for Python and PyPy (#418)
    • 53e1529 add support for python-version-file (#336)
    • 3f82819 Fix output for prerelease version of poetry (#409)
    • 397252c Update zeit/ncc to vercel/ncc (#393)
    • de977ad Merge pull request #412 from vsafonkin/v-vsafonkin/fix-poetry-cache-test
    • 22c6af9 Change PyPy version to rebuild cache
    • 081a3cf Merge pull request #405 from mayeut/interpreter-path
    • ff70656 feature: add a python-path output
    • fff15a2 Use pypyX.Y for PyPy python-version input (#349)
    • Additional commits viewable in compare view

    dependencies github_actions 
    opened by dependabot[bot] 1
  • :arrow_up: Update streamlit requirement from <1.9.0,>=1.5.0 to >=1.5.0,<1.10.0

    Updates the requirements on streamlit to permit the latest version.

    dependencies python 
    opened by dependabot[bot] 1
  • Sample fake entities for entity augmenter using Faker package

    Add sampling of entities (such as names or addresses) using the Faker package (https://faker.readthedocs.io/en/master/locales/da_DK.html), which supports random sampling of entities for numerous languages.
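A minimal sketch of how sampled entities could feed the `ents_replace.v1` augmenter from the simple example, assuming its `ent_dict` maps an entity label to a list of token lists (as in `{"ORG": [["spaCy"], ["spaCy", "Universe"]]}`). The helper `build_ent_dict` is hypothetical; any zero-argument callable returning a string, such as a Faker provider method, can serve as a sampler.

```python
from typing import Callable, Dict, List


def build_ent_dict(
    samplers: Dict[str, Callable[[], str]], n: int = 10
) -> Dict[str, List[List[str]]]:
    """Hypothetical helper: draw n strings per entity label and split each
    into whitespace tokens, giving the label -> list-of-token-lists shape
    used by ent_dict in the README example."""
    ent_dict: Dict[str, List[List[str]]] = {}
    for label, sample in samplers.items():
        entities: List[List[str]] = []
        for _ in range(n):
            tokens = sample().split()
            if tokens:  # skip empty samples, which would give an empty entity
                entities.append(tokens)
        ent_dict[label] = entities
    return ent_dict
```

With Faker installed, one could create `fake = Faker("da_DK")` and call `build_ent_dict({"PER": fake.name, "LOC": fake.city}, n=100)`, then pass the result as `ent_dict` to `augmenty.load("ents_replace.v1", ent_dict=...)`. The labels `PER` and `LOC` are assumptions and must match the entity labels of your pipeline.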

    enhancement help wanted 
    opened by martincjespersen 1
  • implement an oversampling function

    Augmentation can be used to oversample a category.

    Imagined usage would look something like this:

    import augmenty

    aug = augmenty.load(...)

    def is_positive(example):
        """Return True if the example is labelled as positive."""
        return example.y.cats["positive"] == 1

    # augmenty.oversample is the proposed (not yet existing) function
    upsampled_corpus = augmenty.oversample(corpus, augmenter=aug, conditional=is_positive, n=1000)
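Since the proposed function does not exist yet, here is a minimal sketch of what it could do. It assumes augmenters follow spaCy's augmenter convention of a callable `augmenter(nlp, example)` that yields examples, and it interprets `n` as the number of extra augmented examples appended to the corpus. The `nlp`, `seed` and `max_attempts` parameters are hypothetical additions.

```python
import random
from typing import Any, Callable, Iterable, Iterator, List, Optional


def oversample(
    corpus: Iterable[Any],
    augmenter: Callable[[Any, Any], Iterator[Any]],
    conditional: Callable[[Any], bool],
    n: int,
    nlp: Any = None,
    seed: Optional[int] = None,
    max_attempts: Optional[int] = None,
) -> List[Any]:
    """Return the original corpus followed by up to n augmented copies of
    examples for which conditional(example) is True."""
    examples = list(corpus)  # materialize so a generator corpus can be reused
    candidates = [ex for ex in examples if conditional(ex)]
    if not candidates or n <= 0:
        return examples

    rng = random.Random(seed)
    # Guard against augmenters that yield nothing for some inputs.
    if max_attempts is None:
        max_attempts = 10 * n
    augmented: List[Any] = []
    attempts = 0
    while len(augmented) < n and attempts < max_attempts:
        attempts += 1
        source = rng.choice(candidates)  # sample with replacement
        for new_example in augmenter(nlp, source):
            augmented.append(new_example)
            if len(augmented) == n:
                break
    return examples + augmented
```

Sampling candidates uniformly with replacement is one simple choice; cycling through them in order would spread the augmentations more evenly across the minority examples.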
    
    enhancement 
    opened by KennethEnevoldsen 0
  • Back translation augmentation

    Augment a document by back-translating it through other languages, e.g., using Hugging Face models: https://huggingface.co/models?pipeline_tag=translation.

    Example blog: https://dzlab.github.io/dltips/en/pytorch/text-augmentation/

    Example sentence: Augmenty is an augmentation library based on spaCy for augmenting texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence and document labels under the augmentation.

    English -> Danish (Google): Augmenty er et udvidelsesbibliotek baseret på spaCy til forstørrelse af tekster. Augmenty adskiller sig fra andre augmentationsbiblioteker ved, at den korrigerer (så vidt muligt) token-, sætnings- og dokumentetiketterne under augmentationen.

    Danish -> English (Google): Augmenty is an extension library based on spaCy for enlarging texts. Augmenty differs from other augmentation libraries in that it corrects (as far as possible) the token, sentence, and document labels during augmentation.
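A minimal, library-agnostic sketch of back translation: translate into a pivot language and back, keeping only outputs that differ from the input. The translator callables are placeholders. With Hugging Face `transformers`, each could wrap a `pipeline("translation", model=...)` and return `result[0]["translation_text"]`; model names such as `Helsinki-NLP/opus-mt-en-da` and `Helsinki-NLP/opus-mt-da-en` are an assumption to check against the model hub.

```python
from typing import Callable, List, Sequence, Tuple

Translator = Callable[[str], str]


def back_translate(
    text: str, pivots: Sequence[Tuple[Translator, Translator]]
) -> List[str]:
    """For each (to_pivot, from_pivot) pair, translate text into the pivot
    language and back. Returns distinct paraphrases that differ from text."""
    paraphrases: List[str] = []
    for to_pivot, from_pivot in pivots:
        candidate = from_pivot(to_pivot(text)).strip()
        if candidate and candidate != text and candidate not in paraphrases:
            paraphrases.append(candidate)
    return paraphrases
```

Because the round trip rewrites the token sequence freely (as in "augmentation library" becoming "extension library" above), token-level labels cannot be carried over directly; document-level labels such as text categories are the ones that plausibly survive.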

    additional augmenter 
    opened by martincjespersen 1
  • List of potentially new augmenters

    The following is a list of potentially new augmenters. If you would like a specific augmenter to be added before others, please update its corresponding issue (if it doesn't have one, feel free to create one).

    A variation of existing augmenters:

    New augmenters

    Batch augmenters

    A combination of existing augmenters

    • [ ] EDA augmenter following the EDA paper
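EDA (Easy Data Augmentation) combines four token-level operations: synonym replacement, random insertion, random swap and random deletion. Below is a minimal sketch of the two that need no synonym resource, random swap and random deletion, operating on a plain list of tokens rather than on spaCy objects. The function names and the `n_swaps`/`p` parameters are illustrative, not augmenty's API.

```python
import random
from typing import List


def random_swap(tokens: List[str], n_swaps: int, rng: random.Random) -> List[str]:
    """Swap the positions of two randomly chosen tokens, n_swaps times."""
    out = list(tokens)  # never mutate the caller's list
    if len(out) < 2:
        return out
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens: List[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p; if every token
    would be deleted, keep one randomly chosen token instead."""
    if not tokens:
        return []
    kept = [tok for tok in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]
```

Unlike augmenty's augmenters, this sketch does not correct annotations; swapping or deleting tokens in a spaCy `Doc` would also require realigning entity spans and other token-level labels.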
    additional augmenter 
    opened by KennethEnevoldsen 0
Releases (v1.0.1)
  • v1.0.1 (Jun 21, 2022)

    Version

    What's Changed

    • Version 1.0.0 by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/50
    • Update replace.py by @koaning in https://github.com/KennethEnevoldsen/augmenty/pull/51

    Documentation updates

    • added faker based on PR by @martincjespersen by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/85
    • Added pre-config workflows by @KennethEnevoldsen in https://github.com/KennethEnevoldsen/augmenty/pull/86

    New Contributors

    • @dependabot made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/46
    • @koaning made their first contribution in https://github.com/KennethEnevoldsen/augmenty/pull/51
    • @martincjespersen

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/compare/v.0.0.12...v1.0.1

  • v.0.0.12 (Feb 7, 2022)

    0.0.12 (03/08/21)

    • Many bugfixes
    • Added a few more augmenters
    • Notable updates to the documentation of the package

    0.0.1 (03/08/21)

    • First version of augmenty launches 🎉
      • More than 15 highly customizable augmenters
      • A high-quality codebase (96% test coverage and a CodeFactor grade of A)
      • Utilities for easily applying augmenters to strings and spaCy Docs
      • A series of convenience functions for combining and moderating augmenters
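Combining and moderating augmenters can be illustrated in plain Python, assuming spaCy's augmenter convention of a callable `augmenter(nlp, example)` that yields examples. The helpers `combine` and `moderate` below are hypothetical stand-ins, not augmenty's own functions: `combine` pipes every output of one augmenter into the next, and `moderate` applies an augmenter to only a fraction `p` of the examples.

```python
import random
from typing import Any, Callable, Iterator, Optional, Sequence

Augmenter = Callable[[Any, Any], Iterator[Any]]


def combine(augmenters: Sequence[Augmenter]) -> Augmenter:
    """Apply augmenters in sequence; each output of one feeds the next."""
    def combined(nlp: Any, example: Any) -> Iterator[Any]:
        current = [example]
        for aug in augmenters:
            current = [out for ex in current for out in aug(nlp, ex)]
        yield from current
    return combined


def moderate(augmenter: Augmenter, p: float, seed: Optional[int] = None) -> Augmenter:
    """Apply augmenter with probability p; otherwise yield the example unchanged."""
    rng = random.Random(seed)

    def moderated(nlp: Any, example: Any) -> Iterator[Any]:
        if rng.random() < p:
            yield from augmenter(nlp, example)
        else:
            yield example
    return moderated
```

Moderation matters because applying a strong augmenter to every example can shift the training distribution too far from real text; keeping some examples untouched is a common safeguard.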

    Full Changelog: https://github.com/KennethEnevoldsen/augmenty/commits/v.0.0.12

Owner
Kenneth Enevoldsen
Interdisciplinary PhD Student on representation learning in Clinical NLP and Genetics at Aarhus University and Interacting Minds Centre
Spacy-ginza-ner-webapi - Named Entity Recognition API with spaCy and GiNZA

Named Entity Recognition API with spaCy and GiNZA. I wrote a blog post about this

Yuki Okuda 3 Feb 27, 2022
Lingtrain Aligner — ML powered library for the accurate texts alignment.

Lingtrain Aligner ML powered library for the accurate texts alignment in different languages. Purpose Main purpose of this alignment tool is to build

Sergei Averkiev 76 Dec 14, 2022
This library is testing the ethics of language models by using natural adversarial texts.

prompt2slip This library is testing the ethics of language models by using natural adversarial texts. This tool allows for short and simple code and v

null 9 Dec 28, 2021
C.J. Hutto 3.8k Dec 30, 2022
NLP, before and after spaCy

textacy: NLP, before and after spaCy textacy is a Python library for performing a variety of natural language processing (NLP) tasks, built on the hig

Chartbeat Labs Projects 2k Jan 4, 2023
✨Fast Coreference Resolution in spaCy with Neural Networks

✨ NeuralCoref 4.0: Coreference Resolution in spaCy with Neural Networks. NeuralCoref is a pipeline extension for spaCy 2.1+ which annotates and resolv

Hugging Face 2.6k Jan 4, 2023
🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

spacy-transformers: Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy This package provides spaCy components and architectures to use tr

Explosion 1.2k Jan 8, 2023
A full spaCy pipeline and models for scientific/biomedical documents.

This repository contains custom pipes and models related to using spaCy for scientific documents. In particular, there is a custom tokenizer that adds

AI2 1.3k Jan 3, 2023
spaCy plugin for Transformers, Udify, ELMo, etc.

Camphr - spaCy plugin for Transformers, Udify, ELMo, etc. Camphr is a Natural Language Processing library that helps in seamless integration for a wid

null 342 Nov 21, 2022
DaCy: The State of the Art Danish NLP pipeline using spaCy

DaCy: A spaCy NLP Pipeline for Danish. DaCy is a Danish preprocessing pipeline trained in spaCy. At the time of writing it has achieved State-of-the-Ar

Kenneth Enevoldsen 71 Jan 6, 2023
SpikeX - SpaCy Pipes for Knowledge Extraction

SpikeX is a collection of pipes ready to be plugged into a spaCy pipeline. It aims to help in building knowledge extraction tools with almost-zero effort.

Erre Quadro Srl 384 Dec 12, 2022
A spaCy wrapper of OpenTapioca for named entity linking on Wikidata

spaCyOpenTapioca A spaCy wrapper of OpenTapioca for named entity linking on Wikidata. Table of contents Installation How to use Local OpenTapioca Vizu

Universitätsbibliothek Mannheim 80 Jan 3, 2023