State-of-the-art NLP through transformer models in a modular design and consistent APIs.

Open Business Software Solutions

Last update: Sep 21, 2022


Overview

Trapper (Transformers wRAPPER)

Trapper is an NLP library that aims to make it easier to train transformer-based models on downstream tasks. It wraps huggingface/transformers to provide the transformer model implementations and training mechanisms, and it defines abstractions with base classes for common tasks encountered while using transformer models. Additionally, it provides a dependency-injection mechanism and allows defining training and/or evaluation experiments via configuration files. This way, you can replicate an experiment with different models, optimizers, etc. by changing only their values inside the configuration file, without writing new code or changing existing code. These features foster code reuse and less boilerplate code, as well as repeatable and better-documented training experiments, which are crucial in machine learning.

Why You Should Use Trapper

- You have been a Transformers user for quite some time now. However, you started to feel that some computation steps could be standardized through new abstractions. You wish to easily reuse the scripts you write for data processing, post-processing, etc. with different models/tokenizers. You would like to separate the code from the experiment details and mix and match components through configuration files while keeping your codebase clean and free of duplication.
- You are an AllenNLP user who is really happy with the dependency-injection system, well-defined abstractions and smooth workflow. However, you would like to use the latest transformer models without having to wait for the core developers to integrate them. Moreover, the Transformers community is scaling up rapidly, and you would like to join the party while still enjoying an AllenNLP touch.
- You are an NLP researcher / practitioner, and you would like to try a library that aims to support state-of-the-art models along with datasets, metrics and more in unified APIs.

To see more, check the official Trapper blog post.

Key Features

Compatibility with HuggingFace Transformers

Trapper extends Transformers!

While implementing the components of trapper, we try to reuse the classes from the Transformers library as much as we can. For example, trapper uses the models and the trainer as they are in Transformers. This makes it easy to use the models trained with trapper in other projects or libraries that depend on Transformers (or PyTorch in general).

We strive to keep trapper fully compatible with Transformers, so you can always use some of our components to write a script for your own needs while not using the full pipeline (e.g. for training).

Dependency Injection and Training Based on Configuration Files

We use the registry mechanism of AllenNLP to provide dependency injection and to read the experiment details from configuration files in json or jsonnet format. You can look at the AllenNLP guide on dependency injection to learn more about how the registry system and dependency injection work, as well as how to write configuration files. In addition, we strongly recommend reading the remaining parts of the AllenNLP guide to learn more about its design philosophy, the importance of abstractions, etc. (especially Part 2: Abstraction, Design and Testing). As a warning, please note that we do not use AllenNLP's abstractions and base classes in general, which means you cannot mix and match trapper's and AllenNLP's components. Instead, we use only the class registry and dependency injection mechanisms and adapt a very limited set of AllenNLP components by first wrapping and registering them as trapper components. For example, we use the optimizers from AllenNLP since we can conveniently do so without hindering our full compatibility with Transformers.
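To make the idea concrete, here is a minimal, self-contained sketch of a class registry combined with config-driven construction. It is a simplified stand-in written for illustration, not AllenNLP's or trapper's implementation; the names `ToyRegistrable`, `ToySGD`, `ToyAdam` and the `"type"` lookup shown here are assumptions of the sketch.

```python
from typing import Callable, Dict


class ToyRegistrable:
    """Illustrative registry base class (hypothetical, not the AllenNLP one)."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that stores the class under `name`."""
        def decorator(subclass: type) -> type:
            cls._registry[name] = subclass
            return subclass
        return decorator

    @classmethod
    def from_params(cls, params: dict) -> "ToyRegistrable":
        """Pick the class named by params["type"] and pass the rest as kwargs."""
        params = dict(params)  # copy so the caller's config is left untouched
        subclass = cls._registry[params.pop("type")]
        return subclass(**params)


@ToyRegistrable.register("sgd")
class ToySGD(ToyRegistrable):
    def __init__(self, lr: float = 0.01):
        self.lr = lr


@ToyRegistrable.register("adam")
class ToyAdam(ToyRegistrable):
    def __init__(self, lr: float = 0.001, eps: float = 1e-8):
        self.lr = lr
        self.eps = eps


# Swapping the optimizer only requires editing the config, not the code.
config = {"optimizer": {"type": "adam", "lr": 5e-5}}
optimizer = ToyRegistrable.from_params(config["optimizer"])
print(type(optimizer).__name__, optimizer.lr)
```

Changing `"type"` to `"sgd"` in the config would construct a different class through the same call, which is the property that lets experiments be varied from configuration files alone.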

Full Integration with HuggingFace Datasets

In trapper, we officially use the dataset format of the HuggingFace datasets library and provide full integration with it. You can directly use all datasets published in the datasets hub without doing any extra work. You can write the dataset name and extra loading arguments (if there are any) in your training config file, and trapper will automatically download the dataset and pass it to the trainer. If you have a local or private dataset, you can still use it after converting it to the HuggingFace datasets format by writing a dataset loading script, as explained in the datasets documentation.

Support for Metrics through Jury

Trapper supports the common NLP metrics through jury. Jury is an NLP library dedicated to providing metric implementations by adopting and extending the datasets library. To compute metrics during training, you can use jury-style metric instantiation/configuration in your trapper configuration file; the metrics are then computed on the fly on the eval dataset at the interval set by the eval_steps value. If your desired metric is not yet available in jury or datasets, you can still create your own by extending trapper.Metric and utilizing either jury.Metric or datasets.Metric to handle a larger set of cases on predictions.

Abstractions and Base Classes

Following AllenNLP, we implement our own registrable base classes to abstract away the common operations for data processing and model training.

Data reading and preprocessing base classes including
- The classes to be used directly: DatasetReader, DatasetLoader and DataCollator.
- The classes that you may need to extend: LabelMapper, DataProcessor, DataAdapter and TokenizerWrapper.
- TokenizerWrapper classes utilizing AutoTokenizer from Transformers are used as factories to instantiate wrapped tokenizers into which task-specific special tokens are registered automatically.
ModelWrapper classes utilizing the AutoModelFor... classes from Transformers are used as factories to instantiate the actual task-specific models from the configuration files dynamically.
Optimizers from AllenNLP: Implemented as children of the base Optimizer class.
Metric computation is supported through jury. To make the metrics flexible enough to work with the trainer through a common interface, we introduced metric handlers. You may need to extend these classes accordingly:
- For conversion of predictions and references to a suitable form for a particular metric or metric set: MetricInputHandler.
- For manipulating resulting score object containing the metric results: MetricOutputHandler.

Usage

To use trapper, you need to select the common NLP formulation of the problem you are tackling as well as decide on its input representation, including the special tokens.

Modeling the Problem

The first step in using trapper is to decide how to model the problem. You need to model your problem as one of the common modeling tasks in NLP, such as seq-to-seq, sequence classification, etc. We follow the way Transformers divides the tasks into common categories in its AutoModelFor... classes. To be compatible with Transformers and reuse its model factories, trapper formalizes the tasks by wrapping the AutoModelFor... classes and matching each of them to a name that represents a common task in NLP. For example, the natural choice for POS tagging is to model it as a token classification (i.e. sequence labeling) task. For the question answering task, on the other hand, you can directly use the question answering formulation since Transformers already supports that task.
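The matching between task names and model factories can be pictured as a simple lookup table. The sketch below is illustrative only: the factory class names are real Transformers classes, `token_classification` and `question_answering` are the task names used in the table of tested combinations in this document, and the remaining task names and the helper `auto_class_name_for` are hypothetical.

```python
# Illustrative lookup from a task name to the name of a Transformers factory.
# Only the class names are taken from Transformers; the dictionary itself and
# the last two task names are assumptions made for this sketch.
TASK_TO_AUTO_CLASS = {
    "token_classification": "AutoModelForTokenClassification",
    "question_answering": "AutoModelForQuestionAnswering",
    "sequence_classification": "AutoModelForSequenceClassification",
    "seq2seq": "AutoModelForSeq2SeqLM",
}


def auto_class_name_for(task: str) -> str:
    """Return the factory class name for a task, or fail with a clear message."""
    try:
        return TASK_TO_AUTO_CLASS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_TO_AUTO_CLASS))
        raise ValueError(f"Unknown task {task!r}; known tasks: {known}") from None


# POS tagging is modeled as token classification.
print(auto_class_name_for("token_classification"))
```

With Transformers installed, the returned name could be resolved with `getattr(transformers, name)` and the model loaded through that class's `from_pretrained` method.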

Modeling the Input

You need to decide how to represent the input, including the common special tokens such as BOS and EOS. This formulation is used directly while creating the input_ids value of the input instances. As a concrete example, you can represent a sequence classification input in the format BOS ... actual_input_tokens ... EOS. Moreover, some tasks require extra task-specific special tokens as well. For example, in conditional text generation, you may need to prompt the generation with a special signaling token. In tasks that utilize multiple sequences, you may need to use segment embeddings (via token_type_ids) to label the tokens according to their sequence.
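As a rough illustration of such an input formulation, the sketch below assembles a sequence-pair input in the format BOS context SEP question EOS, trims the context so that the special tokens still fit, and labels each token with its segment. The function name, the token ids and the choice to trim only the context are assumptions made for the example; they are not taken from trapper.

```python
from typing import Dict, List


def build_pair_input(
    context_ids: List[int],
    question_ids: List[int],
    bos_id: int,
    sep_id: int,
    eos_id: int,
    max_len: int,
) -> Dict[str, List[int]]:
    """Build input_ids and token_type_ids for a two-sequence input."""
    n_special = 3  # BOS, SEP and EOS: N + 1 special tokens for N = 2 sequences
    budget = max_len - n_special - len(question_ids)
    if budget < 0:
        raise ValueError("max_len is too small for the question and special tokens")
    context_ids = context_ids[:budget]  # chop excess context tokens
    input_ids = [bos_id] + context_ids + [sep_id] + question_ids + [eos_id]
    # Segment 0 covers BOS, the context and SEP; segment 1 covers the rest.
    token_type_ids = [0] * (len(context_ids) + 2) + [1] * (len(question_ids) + 1)
    return {"input_ids": input_ids, "token_type_ids": token_type_ids}


# Made-up ids: 101 = BOS, 102 = SEP, 103 = EOS.
instance = build_pair_input(
    context_ids=[11, 12, 13, 14, 15],
    question_ids=[21, 22],
    bos_id=101,
    sep_id=102,
    eos_id=103,
    max_len=8,
)
print(instance["input_ids"])       # [101, 11, 12, 13, 102, 21, 22, 103]
print(instance["token_type_ids"])  # [0, 0, 0, 0, 0, 1, 1, 1]
```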

Class Reference

The class diagram shows the basic components in trapper. To use trapper for training and evaluation on a task that is not readily supported in Transformers, you need to extend the provided base classes according to your own needs. These are as follows:

For Training & Evaluation: LabelMapper, DataProcessor, DataAdapter, TokenizerWrapper, MetricInputHandler, MetricOutputHandler.

For Inference: In addition to the ones listed above, you may need to implement a transformers.Pipeline, or directly use one from Transformers if it already provides one that matches your needs.

Typically Extended Classes

LabelMapper: Used in tasks that require mapping between categorical labels and integer ids, such as token classification.
DataProcessor: This class is responsible for taking a single instance in dict format, typically coming from a datasets.Dataset, extracting the information fields suitable for the task at hand, and converting their values to integers or collections of integers. This includes tokenizing the string fields and getting the token ids, converting the categorical labels to integer ids, and so on.
DataAdapter: This is responsible for converting the information fields inside an instance dict that was previously processed by a DataProcessor to a format suitable for feeding into a transformer model. This also includes handling the special tokens that signal the start or end of a sequence, separating the two sequences in a sequence-pair task, chopping excess tokens, etc.
TokenizerWrapper: This class wraps a pretrained tokenizer from Transformers while also registering the special tokens needed for the task in the wrapped tokenizer. It also stores the missing values of the BOS - CLS and EOS - SEP token pairs for tokenizers that support only one token of a pair. This means you can model your input sequence by using the bos_token for the start and the eos_token for the end without thinking about which model you are working with. If your task and input modeling need extra special tokens, e.g. <CONTEXT> for a context-dependent task, you can store these tokens by setting the _TASK_SPECIFIC_SPECIAL_TOKENS class variable in your TokenizerWrapper subclass. Otherwise, you can directly use TokenizerWrapper.
MetricInputHandler: This class is mainly responsible for preprocessing the predictions and labels (references) so that they are in a format suitable for metric computation. For example, while using BLEU in a language generation task, the predictions and labels need to be converted to a string or a list of strings. In an extractive question answering task, however, the predictions are returned as start and end indices pointing to the answer within the context, so returning the indices directly does not help: additional information (the context, in this case) is needed to convert the predictions to the actual answers extracted from the context. MetricInputHandler lets you perform this kind of operation by storing additional information and converting predictions and labels to a suitable format. Child classes can also hold helper objects (e.g. TokenizerWrapper, LabelMapper) when a task requires them. The main methods are:
- _extract_metadata(): This method allows the user to extract metadata from dataset instances, to be used later for preprocessing predictions and labels in the preprocess() method.
- __call__(): This method converts predictions and labels into a suitable form for metric computation. By default, it returns the labels without manipulation and only applies argmax() to the model predictions to turn them into prediction inputs for the metrics.
MetricOutputHandler: This class supports manipulating the score object returned by the metric computation phase. Jury returns a well-constructed dictionary output for all metrics; however, this class can be extended as desired to shorten dictionary items, manipulate the information within the output, or add information to the score dictionary.
transformers.Pipeline: The pipeline mechanism from Transformers has not been fully integrated yet. For now, you should check Transformers to find a pipeline that is suitable for your needs and does the same pre-processing. If you cannot find one, you may need to write your own pipeline by extending transformers.Pipeline or one of its subclasses and add it to the transformers.pipelines.SUPPORTED_TASKS map. To enable instantiating pipelines from checkpoint folders, we provide a factory function, create_pipeline_from_checkpoint. It accepts the checkpoint directory of a completed experiment, the path to the config file (already saved in that directory), and the task name that you used while adding the pipeline to SUPPORTED_TASKS. It re-creates the objects you used while training, such as the model wrapper, label mapper, etc., and provides them as keyword arguments to the constructor of the pipeline you implemented.
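Two of the ideas above, the label/id mapping and the default argmax step applied to predictions, can be sketched in a few lines of plain Python. `ToyLabelMapper`, its method names and `argmax_predictions` are hypothetical helpers written for this example; they only mirror the behavior described in the text and are not trapper's classes.

```python
from typing import Dict, List, Sequence


class ToyLabelMapper:
    """Hypothetical two-way mapping between categorical labels and integer ids."""

    def __init__(self, labels: Sequence[str]):
        self._label_to_id: Dict[str, int] = {
            label: i for i, label in enumerate(labels)
        }
        self._id_to_label: Dict[int, str] = {
            i: label for label, i in self._label_to_id.items()
        }

    def get_id(self, label: str) -> int:
        return self._label_to_id[label]

    def get_label(self, id_: int) -> str:
        return self._id_to_label[id_]


def argmax_predictions(logits: List[List[float]]) -> List[int]:
    """Pick the index of the highest score in each row of per-token scores."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


mapper = ToyLabelMapper(["NOUN", "VERB", "DET"])
logits = [[0.1, 0.2, 2.5], [3.0, 0.4, 0.1], [0.2, 1.7, 0.3]]  # made-up scores
predicted_ids = argmax_predictions(logits)
predicted_labels = [mapper.get_label(i) for i in predicted_ids]
print(predicted_ids)     # [2, 0, 1]
print(predicted_labels)  # ['DET', 'NOUN', 'VERB']
```

A metric that expects string labels would receive `predicted_labels`, while one that works on ids could consume `predicted_ids` directly.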

Registering classes from custom modules to the library

We support both file based and command line argument based approaches to register the external modules written by the users.

Option 1 - File based

You should list the packages or modules (for stand-alone modules not residing inside any package) containing the classes to be registered as plugins in a local file named .trapper_plugins. This file must reside in the directory where you run the trapper run command. Moreover, it is recommended to put the plugins file where the modules to be registered reside (e.g. the project root) for convenience, since that directory will be added to the PYTHONPATH. Otherwise, you need to add the plugin modules/packages to the PYTHONPATH manually. Also remember that each listed package must have an __init__.py file that imports the modules containing the custom classes to be registered.

E.g., let's say our project's root directory is project_root and the experiment config file is inside the root with a name test_experiment.jsonnet. To run the experiment, you should run the following commands:

cd project_root
trapper run test_experiment.jsonnet

The output below shows the content of the project_root directory.

ls project_root

  ner
  tests
  datasets
  .trapper_plugins
  test_experiment.jsonnet

Additionally, here is the content of the project_root/.trapper_plugins.

cat project_root/.trapper_plugins

  ner.core.models
  ner.data.dataset_readers

Option 2 - Using the command line argument

You can specify the packages and/or modules you want to be registered using the --include-package argument. However, note that you need to repeat the argument for each package/module to be registered.

E.g., running the following command is an alternative to Option 1 for starting the experiment specified in test_experiment.jsonnet.

trapper run test_experiment.jsonnet \
--include-package ner.core.models \
--include-package ner.data.dataset_readers
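The need to repeat the flag for each package is what you would expect from an option that appends to a list. The following sketch reproduces that behavior with the standard argparse module; it is an illustration under that assumption, not a copy of the trapper CLI, and the program name `toy-run` is made up.

```python
import argparse

# A toy parser that accepts a config path and a repeatable --include-package.
parser = argparse.ArgumentParser(prog="toy-run")
parser.add_argument("config_path")
parser.add_argument(
    "--include-package",
    action="append",  # each occurrence adds one entry to the list
    default=[],
    help="additional package or module to import (repeatable)",
)

args = parser.parse_args(
    [
        "test_experiment.jsonnet",
        "--include-package", "ner.core.models",
        "--include-package", "ner.data.dataset_readers",
    ]
)
print(args.config_path)
print(args.include_package)  # ['ner.core.models', 'ner.data.dataset_readers']
```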

Running a training and/or evaluation experiment

Config File Based Training Using the CLI

Go to your project root and execute the trapper run command with a config file specifying the details of the training experiment. E.g.

trapper run SOME_DIRECTORY/experiment.jsonnet

Don't forget to provide the args["output_dir"] and args["result_dir"] values in your experiment file. Please look at the examples/pos_tagging/README.md for a detailed example.
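A quick pre-flight check for these two values could look like the sketch below. It assumes the experiment file is plain JSON (jsonnet files would need to be evaluated first) and that both values live under a top-level "args" object, as the args["output_dir"] notation suggests; `missing_required_args` and the example paths are hypothetical.

```python
import json
from typing import List

REQUIRED_ARGS = ("output_dir", "result_dir")


def missing_required_args(config: dict) -> List[str]:
    """Return the required keys that are absent or empty under config["args"]."""
    args = config.get("args", {})
    return [key for key in REQUIRED_ARGS if not args.get(key)]


# A deliberately incomplete config: result_dir has been forgotten.
incomplete = json.loads('{"args": {"output_dir": "outputs/run1"}}')
complete = json.loads(
    '{"args": {"output_dir": "outputs/run1", "result_dir": "results/run1"}}'
)

print(missing_required_args(incomplete))  # ['result_dir']
print(missing_required_args(complete))    # []
```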

Script Based Training

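Experiments can also be started from a Python script rather than from the CLI. The traceback quoted in the "POS Tagging Example Not Working" report further below shows that the CLI itself calls run_experiment(args.config_path, args.overrides), defined in trapper/training/train.py, so calling that function from your own script with the path to your config file should be equivalent to trapper run. Check the function's signature in the version you have installed.

As with the CLI, don't forget to provide the args["output_dir"] and args["result_dir"] values in your experiment file. Please look at the examples/pos_tagging/README.md for a detailed example.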

Examples for Using Trapper as a Library

We created an examples folder that includes example projects to help you get started with trapper. Currently, it includes a POS tagging project using the CONLL2003 dataset and a question answering project using the SQuAD dataset. The POS tagging example shows how to use trapper on a task that is not directly supported by Transformers. It implements all custom components and provides a complete project structure including the tests. The question answering example, on the other hand, shows how to use trapper on a task that Transformers already supports. We implemented it to demonstrate how trapper may still be helpful thanks to configuration-file-based experiments.

Training a POS Tagging Model on CONLL2003

Since Transformers lacks direct support for POS tagging, we added an example project that trains a transformer model on the CONLL2003 POS tagging dataset and performs inference with it. It is a self-contained project including its own requirements file, so you can copy the folder into another directory and use it as a template for your own project. Please follow its README.md to get started.

Training a Question Answering Model on SQuAD Dataset

You can use the notebook in the Example QA Project examples/question_answering/question_answering.ipynb to follow the steps while training a transformer model on SQuAD v1.

Currently Supported Tasks and Models From Transformers

In principle, nearly any model should work on a task if it has an entry in the table of AutoModelFor... factories for that task. However, since some models require more (or fewer) parameters than most of the models in the library, you might get errors while using such models. We try to cover these edge cases by adding the extra parameters they require. Feel free to open an issue/PR if you encounter/solve such issues in a model-task combination. We have used trapper on a limited set of model-task combinations so far. We list these combinations below to indicate that they have been tested and validated to work without problems.

Table of Model-task Combinations Tested so far

model	question_answering	token_classification
BERT	✔	✔
ALBERT	✔	✔
DistilBERT	✔	✔
ELECTRA	✔	✔
RoBERTa	✔	✔

Installation

Environment Creation

We strongly recommend creating a virtual environment using conda, virtualenv, etc. before installing this package and its dependencies. For example, the following commands create a conda environment named trapper with python version 3.7.10 and activate it.

conda create --name trapper python=3.7.10
conda activate trapper

Regular Installation

You can install trapper and its dependencies with pip as follows.

pip install trapper

Contributing

If you would like to open a PR, please create a fresh environment as described before, clone the repo locally and install trapper in editable mode as follows.

git clone https://github.com/obss/trapper.git
cd trapper
pip install -e .[dev]

After your changes, please ensure that the tests are still passing, and do not forget to apply code style formatting.

Testing trapper

Caching the test fixtures from the HuggingFace datasets library

To speed up the data-related tests, we cache the test dataset fixtures from HuggingFace's datasets library using the following command.

python -m scripts.cache_hf_datasets_fixtures

Then, you can simply run the tests with the following command:

python -m scripts.run_tests

NOTE: To significantly speed up the tests depending on HuggingFace's Transformers and datasets libraries, you can set the following environment variables to make them work in offline mode. However, beware that you may need to run the tests once first without setting these environment variables so that the pretrained models, tokenizers etc. are downloaded and cached.

export TRANSFORMERS_OFFLINE=1 HF_DATASETS_OFFLINE=1
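If you launch the tests from Python rather than from a shell, the same two variables can be passed through the environment of the child process. The helper names below are hypothetical and the sketch only builds the environment and the command; the commented-out last line shows how they might be combined.

```python
import os
import sys
from typing import Dict, List, Optional


def offline_env(base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the environment with both offline switches enabled."""
    env = dict(os.environ if base is None else base)
    env["TRANSFORMERS_OFFLINE"] = "1"
    env["HF_DATASETS_OFFLINE"] = "1"
    return env


def test_command() -> List[str]:
    """The test runner command from this section, using the current interpreter."""
    return [sys.executable, "-m", "scripts.run_tests"]


env = offline_env({"PATH": "/usr/bin"})
print(env["TRANSFORMERS_OFFLINE"], env["HF_DATASETS_OFFLINE"])
print(test_command()[1:])  # ['-m', 'scripts.run_tests']

# From the repository root, the tests could then be started with:
# subprocess.run(test_command(), env=offline_env(), check=True)
```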

Code Style

To check the code style, run:

python -m scripts.run_code_style check

To format the codebase, run:

python -m scripts.run_code_style format

Comments

Adding support for distributed training

This PR adds parallelism to the training phase by utilizing the torch.distributed backend. torchrun is monkey-patched so that the trapper CLI can properly run distributed training.
enhancement

opened by devrimcavusoglu 3
Add jury or datasets/metrics integration
It would be really nice if we provide automatic evaluation metrics from datasets/metrics module, preferably through the jury library.

Can we wrap the metrics as Registrable classes?

Can we come up with some reusable default metrics for the common tasks, e.g. F1, precision and recall, so that a metric is chosen automatically unless the user specifies another metric (using the same mechanisms we will provide)? This part is not strictly required, but still might be nice to provide.

enhancement
opened by cemilcengiz 3

Provide better output from code style check

When I run the following code from the project root directory, python tests/run_code_style.py check

I get the error below.

ERROR: SOME_FOLDERS/trapper/trapper/data/dataset_readers/dataset_reader.py Imports are incorrectly sorted and/or formatted.
Skipped 1 files
Traceback (most recent call last):
  File "tests/run_code_style.py", line 10, in <module>
    assert_shell("isort . --check --settings setup.cfg")
  File "SOME_FOLDERS/trapper/tests/utils.py", line 19, in assert_shell
    ), f"Unexpected exit code {str(actual_exit_status)}"
AssertionError: Unexpected exit code 256

Can't we just print messages for each incorrectly formatted code instead of throwing an exception?

enhancement

opened by cemilcengiz 2

Correct in `_chop_excess_context_tokens` method in `SquadDataProcessor`

The _chop_excess_context_tokens method in SquadDataProcessor does not take into account the special tokens. Typically, we use N + 1 special tokens (such as BOS, SEP, EOS etc) if there are N sequences (e.g. context, answer etc in a question answering task).

opened by cemilcengiz 1
Add Integration Test

This adds an integration test to check different parts of the trapper are working together properly. Previously we were only testing individual components, but not whether we can execute a training experiment and confirm that it results in a model that is trained.

The new test runs a training experiment on a small model from scratch. It then saves and loads the model. Finally, it runs the loaded model on several inputs to see if the model behaves in an expected manner. The test is mostly to see if this process can be completed without any errors.

opened by Sophylax 1
Update Transformers and AllenNLP Requirements

With Transformers 4.18, we can use the new versions of transformers. There was a range of versions before that had a typing error which caused errors.

The earliest version of AllenNLP that supports 4.18 is 2.9.3, which had a bug: cached-path was updated and broke some things on their side. Therefore, 2.10.0 is the earliest version we can support with the new transformers requirement (and it is currently the latest version).

opened by Sophylax 1
Refactor tests
Currently, the tests are too complicated and lack structure and reuse. This makes it especially hard to write tests for custom classes in new tasks.

We can use a class and convert some functions to methods. Maybe, some methods can be abstract (if pytest does not complain) so that the child test classes can override them.

We can move some common fixtures to conftest.py files or place them in the test classes I described previously.

enhancement refactor
opened by cemilcengiz 1
Enable reading a dataset from datasets library using name, split and other optional arguments

An argument handler class can be created to determine whether the datasets library is used and to store the other arguments, e.g. path, dataset name, split name, etc. Then, the dataset reader could be given the handler instead of just the dataset path, as is currently done. The dataset reader should know how to read the dataset in both cases.
enhancement

opened by cemilcengiz 1
_run_experiment_from_params crashes during distributed training due to directory creation multiple times

In _run_experiment_from_params, the creation of the serialization directories and the saving of the experiment configs should only be done by the main process in the case of distributed training. Currently, distributed training crashes there because the other processes try to create a directory that is already non-empty once the quickest process has created it.

opened by cemilcengiz 0
Remove `overrides` decorator from the `Run` command

Currently, the commands do not work because the commands.py module imports the overrides package, which is not in the requirements and hence is not installed. The only reason it is used is to decorate the add_subparser method in the Run class.

opened by cemilcengiz 0
seq2seq trainer test refactor.

After jury version 2.2.2 (which migrated from the datasets package, to be deprecated, to evaluate), it turned out that there is a small discrepancy between these two HF packages. Formerly, with datasets, the arrow table schema was somehow bypassed for dtypes that did not conform to the table schema, and this went unnoticed in the trapper test cases. Currently (after switching to evaluate), the test fails because the inputs for the metrics do not conform to the table schema (int passed instead of string).

This PR addresses the issue above by adding an InputHandler for language generation tasks metrics that require string inputs.
bug refactor

opened by devrimcavusoglu 0

POS Tagging Example Not Working

When trying to run the POS Tagging Example experiment, I get the following exception:

Traceback (most recent call last):
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/params.py", line 211, in pop
    value = self.params.pop(key)
KeyError: 'metric_input_handler'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/sophylax/anaconda3/envs/trapper/bin/trapper", line 8, in <module>
    sys.exit(run())
  File "/home/sophylax/Documents/Git/Github/trapper/trapper/__main__.py", line 12, in run
    main(prog="trapper")
  File "/home/sophylax/Documents/Git/Github/trapper/trapper/commands.py", line 178, in main
    args.func(args)
  File "/home/sophylax/Documents/Git/Github/trapper/trapper/commands.py", line 101, in run_experiment_from_args
    run_experiment(args.config_path, args.overrides)
  File "/home/sophylax/Documents/Git/Github/trapper/trapper/training/train.py", line 41, in run_experiment
    return _run_experiment_from_params(params)
  File "/home/sophylax/Documents/Git/Github/trapper/trapper/training/train.py", line 64, in _run_experiment_from_params
    trainer = TransformerTrainer.from_params(params)
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/from_params.py", line 608, in from_params
    **extras,
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/from_params.py", line 636, in from_params
    kwargs = create_kwargs(constructor_to_inspect, cls, params, **extras)
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/from_params.py", line 207, in create_kwargs
    cls.__name__, param_name, annotation, param.default, params, **extras
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/from_params.py", line 310, in pop_and_construct_arg
    popped_params = params.pop(name, default) if default != _NO_DEFAULT else params.pop(name)
  File "/home/sophylax/anaconda3/envs/trapper/lib/python3.7/site-packages/allennlp/common/params.py", line 216, in pop
    raise ConfigurationError(msg)
allennlp.common.checks.ConfigurationError: key "metric_input_handler" is required

opened by Sophylax 0

Investigate relaxing seqeval and tensorboardX requirements.

Right now we are fixed on a certain version for these two packages. But we might be able to support a range of versions, increasing compatibility with other projects.

opened by Sophylax 0
Refactor pipelines

HuggingFace transformers' pipelines underwent a major refactor in v4.11.0. Thanks to the new design, we can refactor our pipeline factory and existing pipelines, and possibly wrap the pipeline as a Registrable class. However, allennlp's latest release does not support that version yet, although their master branch does. Therefore, we may need to wait until their new release (I expect it won't take long) or use that commit to prevent a dependency mismatch.

opened by cemilcengiz 1
Support global plugin files as well

Support single plugin file e.g. in ~/.trapper_plugins containing the modules that can be used in multiple projects. allennlp already supports that, so it should be easy to add.
enhancement

opened by cemilcengiz 0

Releases(v0.0.11)

v0.0.11(Nov 3, 2022)
What's Changed

seq2seq trainer test refactor. by @devrimcavusoglu in https://github.com/obss/trapper/pull/66

flake8 dependency fix for python3.7. by @devrimcavusoglu in https://github.com/obss/trapper/pull/65

black configuration refactor. by @devrimcavusoglu in https://github.com/obss/trapper/pull/63

Remove @overrides decorator call and its dependency by @cemilcengiz in https://github.com/obss/trapper/pull/69

Update version.py by @cemilcengiz in https://github.com/obss/trapper/pull/72

Full Changelog: https://github.com/obss/trapper/compare/v0.0.10...v0.0.11
v0.0.10(Sep 26, 2022)
What's Changed

bump datasets version. by @devrimcavusoglu in https://github.com/obss/trapper/pull/60

bump package version. by @devrimcavusoglu in https://github.com/obss/trapper/pull/61

Full Changelog: https://github.com/obss/trapper/compare/v0.0.9...v0.0.10
v0.0.9(Aug 11, 2022)
Updates the dependencies for AllenNLP and transformers as well as improves the tests.

What's Changed

Reduce Upper Limit on AllenNLP Dependency by @Sophylax in https://github.com/obss/trapper/pull/53

Add VSCode Folder to .gitignore by @Sophylax in https://github.com/obss/trapper/pull/51

Update SQuAD QA Pipeline to New Transformer Pipeline Design by @Sophylax in https://github.com/obss/trapper/pull/52

Seq2seq support by @cemilcengiz in https://github.com/obss/trapper/pull/55

Update Transformers and AllenNLP Requirements by @Sophylax in https://github.com/obss/trapper/pull/56

Add Integration Test by @Sophylax in https://github.com/obss/trapper/pull/58

Increment patch version by @cemilcengiz in https://github.com/obss/trapper/pull/59

New Contributors

@Sophylax made their first contribution in https://github.com/obss/trapper/pull/53

Full Changelog: https://github.com/obss/trapper/compare/v0.0.8...v0.0.9
v0.0.8(Dec 31, 2021)
What's Changed

Increment allennlp dependency and patch version by @cemilcengiz in https://github.com/obss/trapper/pull/50

Full Changelog: https://github.com/obss/trapper/compare/v0.0.7...v0.0.8
v0.0.7(Dec 30, 2021)
What's Changed

fix typo in colab url by @fcakyon in https://github.com/obss/trapper/pull/42

Add the blog post link to the README by @cemilcengiz in https://github.com/obss/trapper/pull/43

Update dependencies by @cemilcengiz in https://github.com/obss/trapper/pull/45

Increment patch version by @cemilcengiz in https://github.com/obss/trapper/pull/46

Have allennlp installed from a nightly release by @cemilcengiz in https://github.com/obss/trapper/pull/47

Make allennlp and transformers dependencies wider by @cemilcengiz in https://github.com/obss/trapper/pull/48

Increment patch version by @cemilcengiz in https://github.com/obss/trapper/pull/49

New Contributors

@fcakyon made their first contribution in https://github.com/obss/trapper/pull/42

Full Changelog: https://github.com/obss/trapper/compare/v0.0.5...v0.0.7
v0.0.5 (Nov 10, 2021)
What's Changed

Update readme by @cemilcengiz in https://github.com/obss/trapper/pull/38

Add ignored labels to labelmapper by @cemilcengiz in https://github.com/obss/trapper/pull/40

Fix patch version by @cemilcengiz in https://github.com/obss/trapper/pull/41

Full Changelog: https://github.com/obss/trapper/compare/v0.0.4...v0.0.5
v0.0.4 (Nov 9, 2021)
What's Changed

QA example refactoring. by @devrimcavusoglu in https://github.com/obss/trapper/pull/36

Metrics implementation by @devrimcavusoglu in https://github.com/obss/trapper/pull/27

Full Changelog: https://github.com/obss/trapper/compare/v0.0.3...v0.0.4
v0.0.3 (Nov 1, 2021)
Excludes all packages except trapper in the setup file.

What's Changed

Exclude scripts and test_fixtures in the setup file by @cemilcengiz in https://github.com/obss/trapper/pull/35

Full Changelog: https://github.com/obss/trapper/compare/v0.0.2...v0.0.3
v0.0.2 (Nov 1, 2021)
What's Changed

Pos tagging example by @cemilcengiz in https://github.com/obss/trapper/pull/29

Train pos tag by @cemilcengiz in https://github.com/obss/trapper/pull/33

Full Changelog: https://github.com/obss/trapper/compare/v0.0.1...v0.0.2

Owner

Open Business Software Solutions

Open Source for Open Business

GitHub

🔍 End-to-End Framework for building natural language search interfaces to data by utilizing Transformers and the State-of-the-Art of NLP. Supporting DPR, Elasticsearch, HuggingFace’s Modelhub and much more!

Haystack is an end-to-end framework that enables you to build powerful and production-ready pipelines for different search use cases. Whether you want

1.4k Feb 18, 2021

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

12.3k Dec 31, 2022

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. IMPORTANT: (30.08.2020) We moved our models

10k Feb 18, 2021

DaCy: The State of the Art Danish NLP pipeline using SpaCy

DaCy: A SpaCy NLP Pipeline for Danish DaCy is a Danish preprocessing pipeline trained in SpaCy. At the time of writing it has achieved State-of-the-Ar

71 Jan 6, 2023

A very simple framework for state-of-the-art Natural Language Processing (NLP)

A very simple framework for state-of-the-art NLP. Developed by Humboldt University of Berlin and friends. Flair is: A powerful NLP library. Flair allo

12.3k Jan 2, 2023

🚀 RocketQA, dense retrieval for information retrieval and question answering, including both Chinese and English state-of-the-art models.

In recent years, the dense retrievers based on pre-trained language models have achieved remarkable progress. To facilitate more developers using cutt

475 Jan 4, 2023

A design of MIDI language for music generation task, specifically for Natural Language Processing (NLP) models.

MIDI Language Introduction Reference Paper: Pop Music Transformer: Beat-based Modeling and Generation of Expressive Pop Piano Compositions: code This

3 May 25, 2022

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models

IMS-Toucan is a toolkit to train state-of-the-art Speech Synthesis models. Everything is pure Python and PyTorch based to keep it as simple and beginner-friendly, yet powerful as possible.

Digital Phonetics at the University of Stuttgart

247 Jan 5, 2023

🔍 Transformers at scale for question answering & neural search. Using NLP via a modular Retriever-Reader-Pipeline. Supporting DPR, Elasticsearch, HuggingFace's Modelhub...

Haystack is an end-to-end framework for Question Answering & Neural search that enables you to ... ... ask questions in natural language and find gran

6.4k Jan 9, 2023

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

1.6k Dec 27, 2022

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

77.3k Jan 3, 2023

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

6.2k Dec 31, 2022

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

2.9k Jan 2, 2023

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

1.5k Feb 11, 2021

🤗Transformers: State-of-the-art Natural Language Processing for Pytorch and TensorFlow 2.0.

State-of-the-art Natural Language Processing for PyTorch and TensorFlow 2.0 🤗 Transformers provides thousands of pretrained models to perform tasks o

40.9k Feb 18, 2021

💥 Fast State-of-the-Art Tokenizers optimized for Research and Production

Provides an implementation of today's most used tokenizers, with a focus on performance and versatility. Main features: Train new vocabularies and tok

4.3k Feb 18, 2021

A model library for exploring state-of-the-art deep learning topologies and techniques for optimizing Natural Language Processing neural networks

2.6k Feb 18, 2021

Named-entity recognition using neural networks. Easy-to-use and state-of-the-art results.

NeuroNER NeuroNER is a program that performs named-entity recognition (NER). Website: neuroner.com. This page gives step-by-step instructions to insta

1.5k Feb 17, 2021

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

The PyTorch-Kaldi Speech Recognition Toolkit PyTorch-Kaldi is an open-source repository for developing state-of-the-art DNN/HMM speech recognition sys

2.3k Dec 27, 2022
