🏎️ Accelerate training and inference of 🤗 Transformers with easy to use hardware optimization tools

Hugging Face Optimum

🤗 Optimum is an extension of 🤗 Transformers, providing a set of performance optimization tools to train and run models on targeted hardware with maximum efficiency.

The AI ecosystem evolves quickly, and more and more specialized hardware, along with its own optimizations, emerges every day. As such, Optimum enables users to efficiently use any of these platforms with the same ease inherent to Transformers.

Integration with Hardware Partners

🤗 Optimum aims to provide more diversity in the kinds of hardware users can target to train and fine-tune their models.

To achieve this, we are collaborating with hardware manufacturers such as Graphcore and Habana to provide the best Transformers integration.

Optimizing models towards inference

Along with supporting dedicated AI hardware for training, Optimum also provides inference optimizations for various frameworks and platforms.

We currently support ONNX Runtime along with Intel Neural Compressor (INC).

| Features                           | ONNX Runtime | Intel Neural Compressor |
|:-----------------------------------|:------------:|:-----------------------:|
| Post-training Dynamic Quantization |      ✔️      |           ✔️            |
| Post-training Static Quantization  |      ✔️      |           ✔️            |
| Quantization Aware Training (QAT)  | Stay tuned!  |           ✔️            |
| Pruning                            |     N/A      |           ✔️            |

Installation

🤗 Optimum can be installed using pip as follows:

python -m pip install optimum

If you'd like to use the accelerator-specific features of 🤗 Optimum, you can install the required dependencies according to the table below:

| Accelerator                   | Installation                               |
|:------------------------------|:-------------------------------------------|
| ONNX Runtime                  | python -m pip install optimum[onnxruntime] |
| Intel Neural Compressor (INC) | python -m pip install optimum[intel]       |
| Graphcore IPU                 | python -m pip install optimum[graphcore]   |
| Habana Gaudi Processor (HPU)  | python -m pip install optimum[habana]      |

If you'd like to play with the examples or need the bleeding edge of the code and can't wait for a new release, you can install the base library from source as follows:

python -m pip install git+https://github.com/huggingface/optimum.git

The accelerator-specific features can be installed by appending #egg=optimum[accelerator_type] to the pip command, e.g.

python -m pip install git+https://github.com/huggingface/optimum.git#egg=optimum[onnxruntime]

Quickstart

At its core, 🤗 Optimum uses configuration objects to define parameters for optimization on different accelerators. These objects are then used to instantiate dedicated optimizers, quantizers, and pruners.

Quantization

For example, here's how you can apply dynamic quantization with ONNX Runtime:

from optimum.onnxruntime.configuration import AutoQuantizationConfig
from optimum.onnxruntime import ORTQuantizer

# The model we wish to quantize
model_checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
# The type of quantization to apply
qconfig = AutoQuantizationConfig.arm64(is_static=False, per_channel=False)
quantizer = ORTQuantizer.from_pretrained(model_checkpoint, feature="sequence-classification")

# Quantize the model!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    quantization_config=qconfig,
)

In this example, we've quantized a model from the Hugging Face Hub, but it could also be a path to a local model directory. The feature argument in the from_pretrained() method corresponds to the type of task that we wish to quantize the model for. The result from applying the export() method is a model-quantized.onnx file that can be used to run inference. Here's an example of how to load an ONNX Runtime model and generate predictions with it:

from functools import partial
from datasets import Dataset
from optimum.onnxruntime.model import ORTModel

# Load quantized model
ort_model = ORTModel("model-quantized.onnx", quantizer._onnx_config)
# Create a dataset or load one from the Hub
ds = Dataset.from_dict({"sentence": ["I love burritos!"]})
# Tokenize the inputs
def preprocess_fn(ex, tokenizer):
    return tokenizer(ex["sentence"])

tokenized_ds = ds.map(partial(preprocess_fn, tokenizer=quantizer.tokenizer))
ort_outputs = ort_model.evaluation_loop(tokenized_ds)
# Extract logits!
ort_outputs.predictions

Similarly, you can apply static quantization by simply setting is_static to True when instantiating the QuantizationConfig object:

qconfig = AutoQuantizationConfig.arm64(is_static=True, per_channel=False)

Static quantization relies on feeding batches of data through the model to estimate the activation quantization parameters ahead of inference time. To support this, 🤗 Optimum allows you to provide a calibration dataset. The calibration dataset can be a simple Dataset object from the 🤗 Datasets library, or any dataset that's hosted on the Hugging Face Hub. For this example, we'll pick the sst2 dataset that the model was originally trained on:

from optimum.onnxruntime.configuration import AutoCalibrationConfig

# Create the calibration dataset
calibration_dataset = quantizer.get_calibration_dataset(
    "glue",
    dataset_config_name="sst2",
    preprocess_function=partial(preprocess_fn, tokenizer=quantizer.tokenizer),
    num_samples=50,
    dataset_split="train",
)
# Create the calibration configuration containing the parameters related to calibration.
calibration_config = AutoCalibrationConfig.minmax(calibration_dataset)
# Perform the calibration step: computes the activations quantization ranges
ranges = quantizer.fit(
    dataset=calibration_dataset,
    calibration_config=calibration_config,
    onnx_model_path="model.onnx",
    operators_to_quantize=qconfig.operators_to_quantize,
)
# Quantize the same way we did for dynamic quantization!
quantizer.export(
    onnx_model_path="model.onnx",
    onnx_quantized_model_output_path="model-quantized.onnx",
    calibration_tensors_range=ranges,
    quantization_config=qconfig,
)

Graph optimization

Now let's take a look at applying graph optimization techniques such as operator fusion and constant folding. As before, we load a configuration object, but this time by setting the optimization level instead of the quantization approach:

from optimum.onnxruntime.configuration import OptimizationConfig

# optimization_level=99 enables all available graph optimizations
optimization_config = OptimizationConfig(optimization_level=99)

Next, we load an optimizer to apply these optimizations to our model:

from optimum.onnxruntime import ORTOptimizer

optimizer = ORTOptimizer.from_pretrained(
    model_checkpoint,
    feature="sequence-classification",
)

# Export the optimized model
optimizer.export(
    onnx_model_path="model.onnx",
    onnx_optimized_model_output_path="model-optimized.onnx",
    optimization_config=optimization_config,
)

And that's it - the model is now optimized and ready for inference!

As you can see, the process is similar in each case:

  1. Define the optimization / quantization strategies via an OptimizationConfig / QuantizationConfig object
  2. Instantiate an ORTQuantizer or ORTOptimizer class
  3. Apply the export() method
  4. Run inference

Training

Besides supporting ONNX Runtime inference, 🤗 Optimum also supports ONNX Runtime training, reducing the memory and compute needed during training. This can be achieved through the ORTTrainer class, which behaves similarly to the Trainer of 🤗 Transformers:

-from transformers import Trainer
+from optimum.onnxruntime import ORTTrainer

# Step 1: Create your ONNX Runtime Trainer
-trainer = Trainer(
+trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    feature="sequence-classification",
)

# Step 2: Use ONNX Runtime for training and evaluation!🤗
train_result = trainer.train()
eval_metrics = trainer.evaluate()

By replacing Trainer with ORTTrainer, you will be able to leverage ONNX Runtime for your fine-tuning tasks.
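
For context, here is a minimal sketch of the setup surrounding the snippet above, assuming a text classification fine-tuning run; the checkpoint, dataset, and training arguments are illustrative and not prescribed by 🤗 Optimum:

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
    default_data_collator,
)
from optimum.onnxruntime import ORTTrainer

model_checkpoint = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint, num_labels=2)

# Tokenize a small text classification dataset
dataset = load_dataset("glue", "sst2")
encoded_dataset = dataset.map(
    lambda examples: tokenizer(examples["sentence"], truncation=True, padding="max_length"),
    batched=True,
)

training_args = TrainingArguments(output_dir="ort_sst2_output", num_train_epochs=1)

trainer = ORTTrainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],
    eval_dataset=encoded_dataset["validation"],
    tokenizer=tokenizer,
    data_collator=default_data_collator,
    feature="sequence-classification",
)

train_result = trainer.train()
eval_metrics = trainer.evaluate()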

Check out the examples directory for more sophisticated usage.

Happy optimizing 🤗 !

Comments
  • Handling ONNX models with external data

    This PR aims to handle loading and exporting ONNX models with external data, locally and from the Hub. We can also now use FORCE_ONNX_EXTERNAL_DATA=1 to force using the external data format even for small models.
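
    For illustration, and assuming the environment variable is read at export time, forcing the external data format could look like this (the checkpoint and output directory are placeholders):

    FORCE_ONNX_EXTERNAL_DATA=1 optimum-cli export onnx --model bert-base-uncased bert_onnx/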

    • [X] Saving/loading a model with external data locally
    • [X] Saving external data in a single file (ends with .onnx_data for easy loading from hub)
    • [X] Saving/loading a model with external data from the hub
    • [X] Writing tests
    • [X] Apply the same changes for other models besides seq2seq

    cc @fxmarty @mht-sharma @michaelbenayoun

    Fixes https://github.com/huggingface/optimum/issues/254 and https://github.com/huggingface/optimum/issues/377

    opened by NouamaneTazi 32
  • add mt5 to ORTConfigManager conf list

    What does this PR do?

    Add MT5 to ORTConfigManager.

    Fixes #321

    I re-arranged all available models in alphabetical order. I can put it back the way it was if needed. 🤗

    @JingyaHuang

    Aside from this PR, I was wondering if opening an issue like https://github.com/huggingface/transformers/issues/16308 for implementing all available onnx models in the ORTConfigManager could be nice?

    opened by ChainYo 24
  • [BT] Add `Bettertransformer` support for FSMT

    What does this PR do?

    Fixes # (issue)

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [ ] Did you make sure to update the documentation with your changes?
    • [ ] Did you write any new necessary tests?
    opened by Sumanth077 18
  • Add the ORTModelForSemanticSegmentation class

    What does this PR do?

    This PR aims to implement the ORTModelForImageSegmentation class to provide support for image segmentation .onnx models, and full integration of such models through transformers pipelines for CPU or GPU onnxruntime inference (see Issue #382)

    Implementation details

    The ORTModelForImageSegmentation was based on the already implemented ORTModelForImageClassification in optimum/onnxruntime/modeling_ort.py with several modifications:

    1. For CPU and GPU inference:
    • the class was added to optimum/onnxruntime/__init__.py
    • the self.forward method returns a SemanticSegmenterOutput instead of an ImageClassifierOutput
    • the correct auto_model_class and export_feature are referenced
    • all tests were copied from ORTModelForImageClassificationIntegrationTest in tests/onnxruntime/test_modeling.py
    2. For GPU inference:
    • logits_shape was changed in ORTModelForImageSegmentation.prepare_logits_buffer to return a 4-dimensional tensor of shape (input_batch_size, self.config.num_labels, output_height, output_width) instead of a 2D shape. The issue is that I did not find a way to get the model output size, which is different from the input size, from config.json or from any other attribute of ORTModelForImageSegmentation or ORTModelForImageSegmentation.model.

    CPU inference works as follows:

    from optimum.onnxruntime.modeling_ort import ORTModelForImageSegmentation
    session = ORTModelForImageSegmentation.load_model(onnx_path)
    onnx_model = ORTModelForImageSegmentation(session)
    inputs = feature_extractor(pil_image, return_tensors="pt")
    outputs = onnx_model(**inputs)
    

    I could not test GPU inference because I could not manage to make onnxruntime-gpu work:

    onnx_model.to('cuda:0')
    >>>  File "C:\Users\theol\Documents\GitHub\Repositories\optimum\optimum\onnxruntime\modeling_ort.py", line 202, in to
        validate_provider_availability(provider)  # raise error if the provider is not available
    >>>  File "C:\Users\theol\Documents\GitHub\Repositories\optimum\optimum\onnxruntime\utils.py", line 227, in validate_provider_availability
        raise ImportError(
    >>>ImportError: Asked to use CUDAExecutionProvider, but `onnxruntime-gpu` package was not found. Make sure to install `onnxruntime-gpu` package instead of `onnxruntime`.
    

    This might be because of local venv setup issues on my side. My CUDA installation works for transformers with torch models. Still, it probably would not work properly yet because of the wrong output size in prepare_logits_buffer.

    Remaining tasks

    • Fixing proper output size for io binding
    • Uploading a .onnx segmentation model to https://huggingface.co/hf-internal-testing and modifying the IMAGE_SEGMENTATION_EXAMPLE checkpoint name and image URL to an appropriate example (see the two comments at optimum/onnxruntime/modeling_ort.py lines 1463 and 1533)
    • Modify test class model to a SemanticSegmentation model in order to get working tests

    @michaelbenayoun @JingyaHuang your help would be appreciated 👍

    opened by TheoMrc 17
  • Saving external data for large ONNX models

    What does this PR do?

    Fixes #254 and https://github.com/huggingface/optimum/issues/377

    We can now load and save ORT models that have external data 🚀

    opened by NouamaneTazi 16
  • onnx speed is even slower

    System Info

    Windows 10
    Python 3.8.4
    PyTorch 1.12.1 (CPU)
    transformers 4.22.2
    optimum 1.4.0
    onnxruntime 1.12.1

    Who can help?

    @Narsil @patil-suraj

    Information

    • [X] The official example scripts
    • [ ] My own modified scripts

    Tasks

    • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [ ] My own task or dataset (give details below)

    Reproduction

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    import warnings
    import time

    warnings.filterwarnings("ignore")

    text = "Vehicle detection technology is of great significance for realizing automatic monitoring and AI-assisted driving systems. The state-of-the-art object detection method, namely, a class of YOLOv5, has often been used to detect vehicles."
    textlists = [text, text, text, text, text]

    model_checkpoint = "Helsinki-NLP/opus-mt-en-zh"
    model = ORTModelForSeq2SeqLM.from_pretrained(model_checkpoint, from_transformers=True)
    tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)

    model.save_pretrained("onnx")
    tokenizer.save_pretrained("onnx")

    onnx_translation = pipeline("translation_en_to_zh", model=model, tokenizer=tokenizer)
    t1 = time.time()
    result = onnx_translation(textlists)
    print(result, time.time() - t1)

    from transformers import MarianTokenizer, MarianMTModel

    modchoice = "Helsinki-NLP/opus-mt-en-zh"
    tokenizer = MarianTokenizer.from_pretrained(modchoice)
    model = MarianMTModel.from_pretrained(modchoice)

    t1 = time.time()
    encoded = tokenizer.prepare_seq2seq_batch(
        textlists, truncation=True, padding="longest", return_tensors="pt"
    )
    encoded.to(device)
    translated = model.generate(**encoded)

    tgt_text = [tokenizer.decode(t, skip_special_tokens=True) for t in translated]
    print(tgt_text, time.time() - t1)

    Batch processing is much slower, and single processing is only a little faster

    Expected behavior

    Faster batch processing

    inference onnxruntime 
    opened by chaodreaming 14
  • Issue to use GPT2 ONNX export with past key values

    System Info

    python: 3.10.6
    platform: Ubuntu 22.10
    optimum version: 1.5.1
    onnxruntime: 1.13.1
    

    Who can help?

    @JingyaHuang @ec

    Information

    • [ ] The official example scripts
    • [X] My own modified scripts

    Tasks

    • [ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [X] My own task or dataset (give details below)

    Reproduction

    Command line to export a GPT2 model:

    python -m optimum.exporters.onnx --model gpt2 --task causal-lm-with-past output/
    

    Gives the following output logs:

    Framework not specified. Using pt to export to ONNX.
    Using framework PyTorch: 1.13.0+cu117
    Overriding 2 configuration item(s)
    	- use_cache -> True
    	- pad_token_id -> 0
    /home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py:796: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if batch_size <= 0:
    /home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py:185: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      attn_weights = attn_weights / torch.tensor(
    /home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py:185: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
      attn_weights = attn_weights / torch.tensor(
    /home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/models/gpt2/modeling_gpt2.py:200: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
      mask_value = torch.tensor(mask_value, dtype=attn_weights.dtype).to(attn_weights.device)
    Validating ONNX model...
    	-[✓] ONNX model output names match reference model (present.1.value, present.0.key, present.6.key, present.6.value, present.5.value, present.8.key, present.0.value, present.2.key, present.5.key, present.10.key, present.9.value, present.10.value, logits, present.4.value, present.7.key, present.11.value, present.3.value, present.3.key, present.4.key, present.2.value, present.1.key, present.9.key, present.11.key, present.8.value, present.7.value)
    	- Validating ONNX Model output "logits":
    		-[✓] (2, 16, 50257) matches (2, 16, 50257)
    		-[x] values not close enough, max diff: 0.0013427734375 (atol: 1e-05)
    	- Validating ONNX Model output "present.0.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.0.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.1.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.1.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.2.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.2.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.3.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.3.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.4.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.4.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.5.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.5.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.6.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.6.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.7.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.7.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.8.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.8.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.9.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.9.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.10.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.10.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.11.key":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    	- Validating ONNX Model output "present.11.value":
    		-[✓] (2, 12, 32, 64) matches (2, 12, 32, 64)
    		-[✓] all values close (atol: 1e-05)
    An error occured, but the model was saved at: model_repository/gpt2/1/model.onnx
    

    Even though there is an error in the values-closeness validation, that's OK. Now I would like to run the model with the following Python code:

    from optimum.onnxruntime import ORTModelForCausalLM
    from transformers import GPT2Tokenizer
    
    model = ORTModelForCausalLM.from_pretrained("output/", from_transformers=False, use_cache=True)
    tokenizer = GPT2Tokenizer.from_pretrained("output/")
    tokens = tokenizer("My name is Julien and I like", return_tensors="pt")
    outputs_model = model.generate(**tokens)
    

    And I get the following error:

    /home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/generation_utils.py:1359: UserWarning: Neither `max_length` nor `max_new_tokens` has been set, `max_length` will default to 20 (`self.config.max_length`). Controlling `max_length` via the config is deprecated and `max_length` will be removed from the config in v5 of Transformers -- we recommend using `max_new_tokens` to control the maximum length of the generation.
      warnings.warn(
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/generation_utils.py", line 1490, in generate
        return self.greedy_search(
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/transformers/generation_utils.py", line 2233, in greedy_search
        outputs = self(
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/optimum/modeling_base.py", line 60, in __call__
        return self.forward(*args, **kwargs)
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/optimum/onnxruntime/modeling_ort.py", line 1454, in forward
        outputs = self.model.run(None, onnx_inputs)
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 196, in run
        raise ValueError("Model requires {} inputs. Input Feed contains {}".format(num_required_inputs, num_inputs))
    ValueError: Model requires 26 inputs. Input Feed contains 2
    

    Do I have to feed the past_key_values.X.key and past_key_values.X.value inputs myself with random values?

    When I try to do this directly with onnxruntime, I also get an error. Here is what I do:

    import onnxruntime as ort
    from transformers import GPT2Tokenizer
    import numpy as np
    
    sess = ort.InferenceSession('output/model.onnx', providers=["CPUExecutionProvider"])
    tokenizer = GPT2Tokenizer.from_pretrained("output/")
    tokens = dict(tokenizer("My name is Julien and I like", return_tensors="np"))
    shape = (1, 12, len(tokens["input_ids"][0]), 64)
    
    for i in range(12):
        tokens[f"past_key_values.{i}.key"] = np.random.uniform(0, 1, shape).astype(np.float32)
        tokens[f"past_key_values.{i}.value"] = np.random.uniform(0, 1, shape).astype(np.float32)
    
    sess.run(None, tokens)
    

    And I get the following error:

    2022-12-06 16:42:17.603173515 [E:onnxruntime:, sequential_executor.cc:369 Execute] Non-zero status code returned while running Add node. Name:'/transformer/h.0/attn/Add' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:503 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 8 by 16
    
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/jplu/anaconda3/envs/transformers/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 200, in run
        return self._sess.run(output_names, input_feed, run_options)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Add node. Name:'/transformer/h.0/attn/Add' Status Message: /onnxruntime_src/onnxruntime/core/providers/cpu/math/element_wise_ops.h:503 void onnxruntime::BroadcastIterator::Init(ptrdiff_t, ptrdiff_t) axis == 1 || axis == largest was false. Attempting to broadcast an axis by a dimension other than 1. 8 by 16
    

    Expected behavior

    I expect to have a proper generation and usage with onnxruntime. The final goal is to use it through a Triton server.

    I am certainly missing something, but the documentation is not clear on how to properly use seq2seq and causal-lm models with past key values, either directly with onnxruntime or with optimum.

    Thanks a lot in advance for all the advices you could provide :)

    bug 
    opened by jplu 13
  • Added support for Tapas Model

    What does this PR do?

    Fixes #20372

    Before submitting

    • This PR adds new support for BetterTransformer integration for the Tapas model.
    • This PR adds documentation indicating that BetterTransformer integration for Tapas has been added.

    Questions

    • Can I ask how I can test the BetterTransformer feature added for the Tapas model?

    To: @younesbelkada, @sgugger

    opened by JuheonChu 13
  • Inference worse with onnxruntime-gpu than native pytorch for seq2seq model

    System Info

    Optimum: 1.4.1.dev0
    torch: 1.12.1+cu116
    onnx: 1.12.0
    onnxruntime-gpu: 1.12.1
    python: 3.8.13
    CUDA: 11.6
    cudnn: 8.4.1
    RTX 3090
    

    Who can help?

    @JingyaHuang @echarlaix

    Information

    • [X] The official example scripts
    • [ ] My own modified scripts

    Tasks

    • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [ ] My own task or dataset (give details below)

    Reproduction

    I compared GPU inference of a native torch Helsinki-NLP/opus-mt-fr-en model with that of the ONNX model optimized thanks to the Optimum library. I defined a FastAPI microservice based on the two classes below, for GPU torch and optimized ONNX inference respectively:

    class Seq2SeqModel:
        tokenizer: Optional[MarianTokenizer]
        model: Optional[MarianMTModel]
    
        def load_model(self):
            """Loads the model"""
            # model_id="Helsinki-NLP/opus-mt-fr-en"
            model_path = Path("./app/artifacts/HF")
            tokenizer = AutoTokenizer.from_pretrained(model_path)
            model = AutoModelForSeq2SeqLM.from_pretrained(model_path).to("cuda")
            self.tokenizer = tokenizer
            self.model = model
    
        async def predict(self, input: PredictionInput) -> PredictionOutput:
            """Runs a prediction"""
            if not self.tokenizer or not self.model:
                raise RuntimeError("Model is not loaded")
            tokens = self.tokenizer(input.text, return_tensors="pt").to("cuda")
            translated = self.model.generate(**tokens, num_beams=beam_size)
            return PredictionOutput(translated_text=self.tokenizer.decode(translated[0], skip_special_tokens=True))
    
    class OnnxOptimizedSeq2SeqModel:
        tokenizer: Optional[MarianTokenizer]
        model: Optional[ORTModelForSeq2SeqLM]
    
        def load_model(self):
            """Loads the model"""
            # model_id="Helsinki-NLP/opus-mt-fr-en"
            onnx_path = Path("./app/artifacts/OL_1")
            tokenizer = AutoTokenizer.from_pretrained(onnx_path)
            optimized_model = ORTModelForSeq2SeqLM.from_pretrained(
                onnx_path,
                encoder_file_name="encoder_model_optimized.onnx",
                decoder_file_name="decoder_model_optimized.onnx",
                decoder_file_with_past_name="decoder_with_past_model_optimized.onnx",
                provider="CUDAExecutionProvider"
            )
            self.tokenizer = tokenizer
            self.model = optimized_model
    
    app = FastAPI()
    seq2seq_model = Seq2SeqModel()
    onnx_optimized_seq2seq_model = OnnxOptimizedSeq2SeqModel()
    beam_size = 3
    
    @app.on_event("startup")
    async def startup():
        seq2seq_model.load_model()
        onnx_optimized_seq2seq_model.load_model()
    
    @app.post("/prediction")
    async def prediction(
        output: PredictionOutput = Depends(seq2seq_model.predict),
    ) -> PredictionOutput:
        return output
    
    @app.post("/prediction_onnx_optimized")
    async def prediction(
        output: PredictionOutput = Depends(onnx_optimized_seq2seq_model.predict),
    ) -> PredictionOutput:
        return output
    

    Expected behavior

    When load testing the model on my local computer, I was surprised by two things:

    1. The performance on GPU of the optimized ONNX model is worse than that of native torch (maybe linked to #365 and #396?):

    [benchmark screenshots: GPU_optimized_onnxruntime vs. GPU_torch]

    2. When running this FastAPI service in a Docker image, I got the following warning:

    2022-09-28 08:20:21.214094612 [W:onnxruntime:Default, onnxruntime_pybind_state.cc:566 CreateExecutionProviderInstance] Failed to create CUDAExecutionProvider. Please reference https://onnxruntime.ai/docs/reference/execution-providers/CUDA-ExecutionProvider.html#requirements to ensure all dependencies are met.

    Does this mean the CUDAExecutionProvider is not working, even though I set it here?

            optimized_model = ORTModelForSeq2SeqLM.from_pretrained(
                onnx_path,
                encoder_file_name="encoder_model_optimized.onnx",
                decoder_file_name="decoder_model_optimized.onnx",
                decoder_file_with_past_name="decoder_with_past_model_optimized.onnx",
                provider="CUDAExecutionProvider"
            )
    

    What could have caused that? I saw at https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html that CUDA 11.6 is not mentioned; could it be this?

    bug inference onnxruntime 
    opened by Matthieu-Tinycoaching 12
  • [ORT] Filter out invalid inputs in ORTModelForXXX forward pass

    Context

    TL;DR

    Transformers #17617

    It's a long story... For the DeBERTa model, the tokenizer outputs token_type_ids by default. However, the exported IR might not contain token_type_ids (e.g. when config.type_vocab_size=0 and the model is exported by transformers.onnx.export). In this situation:

    1. The forward pass will fail if the user directly feeds the tokenizer output to the model (as our snippet does).
    2. Otherwise, they need to add another line to filter out the invalid inputs themselves, which requires a deeper understanding of the model and its tokenizer.

    Considering the user experience, I think that we should add this filter directly in ORTModelForXXX.
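
    A rough sketch of the idea (illustrative, not the exact code of this PR), assuming self.model is the wrapped onnxruntime.InferenceSession:

    # Keep only the inputs that the exported ONNX graph actually declares, so that
    # e.g. token_type_ids returned by the tokenizer are dropped when the graph was
    # exported without them.
    session_input_names = {model_input.name for model_input in self.model.get_inputs()}
    onnx_inputs = {name: value for name, value in inputs.items() if name in session_input_names}
    outputs = self.model.run(None, onnx_inputs)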

    What does this PR do?

    • Filter out invalid inputs in ORTModelForXXX.

    Fixes #207

    opened by JingyaHuang 12
  • Unable to use GPU accelerated Optimum Onnx transformer model for inference

    System Info

    Optimum Version: 1.5.0
    Ubuntu 20.04 Linux 
    Python version 3.8
    

    Who can help?

    @JingyaHuang @echarlaix When following the documentation at https://huggingface.co/docs/optimum/main/en/onnxruntime/usage_guides/gpu for Optimum version 1.5.0, we get the following error:


    RuntimeError                              Traceback (most recent call last)
    in
         19  "education",
         20  "music"]
    ---> 21  pred = onnx_z0(sequence_to_classify, candidate_labels, multi_class=False)

    8 frames
    /usr/local/lib/python3.8/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py in bind_input(self, name, device_type, device_id, element_type, shape, buffer_ptr)
        454     :param buffer_ptr: memory pointer to input data
        455     """
    --> 456     self._iobinding.bind_input(
        457         name,
        458         C.OrtDevice(

    RuntimeError: Error when binding input: There's no data transfer registered for copying tensors from Device:[DeviceType:1 MemoryType:0 DeviceId:0] to Device:[DeviceType:0 MemoryType:0 DeviceId:0]

    This is reproducible on a Google Colab GPU instance as well. It is only observed from version 1.5.0 onward; 1.4.1 works as expected.

    Information

    • [X] The official example scripts
    • [ ] My own modified scripts

    Tasks

    • [X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
    • [ ] My own task or dataset (give details below)

    Reproduction

    !pip install optimum[onnxruntime-gpu]==1.5.1
    !pip install transformers onnx

    from optimum.onnxruntime import ORTModelForSequenceClassification

    ort_model = ORTModelForSequenceClassification.from_pretrained(
        "philschmid/tiny-bert-sst2-distilled",
        from_transformers=True,
        provider="CUDAExecutionProvider",
    )

    from optimum.pipelines import pipeline
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("philschmid/tiny-bert-sst2-distilled")

    pipe = pipeline(task="text-classification", model=ort_model, tokenizer=tokenizer)
    result = pipe("Both the music and visual were astounding, not to mention the actors performance.")
    print(result)

    Expected behavior

    Inference fails due to device error, which is not expected.

    bug 
    opened by smiraldr 11
  • ONNX transformation to cast int64 constants to int32 when possible

    As per title.

    Partially fixes #627; we still need to integrate this into the CLI, and to document and test it.

    Try with:

    import onnx
    from pathlib import Path
    from optimum.onnx import model_to_int32
    
    path = "/path/to/decoder_model.onnx"
    model = onnx.load(path)
    
    model = model_to_int32(model)
    
    onnx.save(
        model,
        path,
        save_as_external_data=True,
        all_tensors_to_one_file=True,
        location=Path(path).name + "_data",
    )
    
    onnx.checker.check_model(path)
    

    Inspect the original and transformed models "Slice" nodes.

    opened by fxmarty 2
  • Fix provider options when several providers are passed

    When several providers are passed to the InferenceSession, which is the case when TensorrtExecutionProvider is chosen, the provider_options argument needs to be of the same length as providers, otherwise raising:

    EP Error using ['TensorrtExecutionProvider', 'CUDAExecutionProvider']
    Falling back to ['CUDAExecutionProvider', 'CPUExecutionProvider'] and retrying.
    

    Reference: https://onnxruntime.ai/docs/api/python/api_summary.html#inferencesession
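
    For reference, a minimal sketch of the expected call shape, with exactly one provider_options entry per provider (the option values are illustrative):

    import onnxruntime as ort

    # One provider_options dict per entry in `providers`, in the same order.
    session = ort.InferenceSession(
        "model.onnx",
        providers=["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"],
        provider_options=[{"trt_fp16_enable": True}, {"device_id": 0}, {}],
    )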

    This was untested up to now. Still need to add a test for this PR.

    In a next PR: remove all the code duplication for load_model() in modeling_ort.py, modeling_decoder.py, modeling_seq2seq.py. But I won't do it in this PR.

    This should fix https://github.com/huggingface/optimum/issues/606 https://github.com/huggingface/optimum/issues/605

    opened by fxmarty 1
  • Support generation config in ORTModel

    This PR adds support for generation config in ORTModel, following https://github.com/huggingface/transformers/pull/20388

    Note: we should really add nightly tests tracking on transformers/diffusers main.

    opened by fxmarty 3
  • Fix uninformative message when passing `use_cache=True` to ORTModel and no ONNX with cache is available

    As per title,

    from optimum.onnxruntime import ORTModelForCausalLM
    
    ort_model = ORTModelForCausalLM.from_pretrained("/path/to/gpt2_onnx", use_cache=True)
    

    raises

      File "/home/fxmarty/hf_internship/optimum/optimum/onnxruntime/modeling_decoder.py", line 536, in _from_pretrained
        decoder_with_past_path = ORTModelDecoder.infer_onnx_filename(
      File "/home/fxmarty/hf_internship/optimum/optimum/onnxruntime/modeling_ort.py", line 351, in infer_onnx_filename
        raise FileNotFoundError(f"Could not find any ONNX model file in {path}")
    FileNotFoundError: Could not find any ONNX model file in /home/fxmarty/hf_internship/optimum/gpt2_onnx
    

    which is not informative. With this PR, we get:

    FileNotFoundError: The parameter `use_cache=True` was passed to ORTModelDecoder.from_pretrained() but no ONNX file using past key values could be found in /home/fxmarty/hf_internship/optimum/gpt2_onnx, with the error:
        Could not find any ONNX model file for the regex (.*)?decoder(.*)?with_past(.*)?\.onnx in /home/fxmarty/hf_internship/optimum/gpt2_onnx.
    
    opened by fxmarty 1
  • Added mapping for prophetnet

    What does this PR do?

    Opening up draft PR to start discussion on how to add Better Transformer support for ProphetNet

    Fixes #488

    Before submitting

    • [ ] This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
    • [ ] Did you make sure to update the documentation with your changes?
    • [ ] Did you write any new necessary tests?
    opened by adit299 2
  • Enable merged decoder in ORTModel

    What does this PR do?

    Enable the use of merged decoders in ORT modeling.

    • [x] Check if it works for large proto, and add a saving option.
    • [ ] Adapt ORTModels to be able to use merged model (New input use_cache, dummy inputs for past_key_values...)
    • [ ] Check if merged ONNX model works for IOBinding (introduces new input use_cache, but dlpack doesn't support dtype=bool )

    To discuss:

    • Where should the merging be applied?
    • Shall it be automatically applied?
    opened by JingyaHuang 1
Releases
  • v1.6.1(Dec 23, 2022)

    Hotfixes

    • Revert breaking removal of EncoderOnnxConfig, DecoderOnnxConfig, _DecoderWithLMhead by @fxmarty in https://github.com/huggingface/optimum/pull/643
    • Fix item access of some _TASKS_TO_AUTOMODELS by @fxmarty in https://github.com/huggingface/optimum/pull/642

    Full Changelog: https://github.com/huggingface/optimum/compare/v1.6.0...v1.6.1

  • v1.6.0(Dec 23, 2022)

    Optimum CLI

    The Optimum command line interface is introduced, and is now the official entrypoint for the ONNX export. Example commands:

    optimum-cli --help
    optimum-cli export onnx --help
    optimum-cli export onnx --model bert-base-uncased --task sequence-classification bert_onnx/
    
    • Add Optimum CLI backbone by @fxmarty in https://github.com/huggingface/optimum/pull/593

    Stable Diffusion ONNX export

    Optimum now supports the ONNX export of stable diffusion models from the diffusers library:

    optimum-cli export onnx --model runwayml/stable-diffusion-v1-5 sd_v15_onnx/
    
    • Add Stable Diffusion ONNX export by @echarlaix in https://github.com/huggingface/optimum/pull/570

    BetterTransformer support for more architectures

    BetterTransformer integration includes new models in this release: CLIP, RemBERT, mBART, ViLT, FSMT

    The complete list of supported models is available in the documentation.

    • [BT] Add Bettertransformer support for FSMT by @Sumanth077 in https://github.com/huggingface/optimum/pull/494
    • [BT] add BetterTransformer support for ViLT architecture by @ka00ri in https://github.com/huggingface/optimum/pull/508
    • Add MBart support for BetterTransformer by @ravenouse in https://github.com/huggingface/optimum/pull/516
    • Add CLIP BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/534
    • Add BetterTransformer support for RemBERT by @hchings in https://github.com/huggingface/optimum/pull/545

    ONNX export for more architectures

    The ONNX export now supports Swin, MobileNet-v1, MobileNet-v2.

    • Add Swin support in exporters.onnx by @fxmarty in https://github.com/huggingface/optimum/pull/528
    • [ONNX] add mobilenet support by @younesbelkada in https://github.com/huggingface/optimum/pull/633
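
    For instance, exporting a Swin checkpoint with the CLI could look like this (the checkpoint name and task are illustrative):

    optimum-cli export onnx --model microsoft/swin-tiny-patch4-window7-224 --task image-classification swin_onnx/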

    Extended ONNX export for encoder-decoder and decoder models

    Encoder-decoder or decoder-only models that normally make use of the generate() method in transformers can now be exported as several files using the --for-ort argument:

    optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_small_onnx
    

    yielding:

    .
    └── t5_small_onnx
        ├── config.json
        ├── decoder_model.onnx
        ├── decoder_with_past_model.onnx
        ├── encoder_model.onnx
        ├── special_tokens_map.json
        ├── spiece.model
        ├── tokenizer_config.json
        └── tokenizer.json
    

    When passing --for-ort, the exported models are expected to be directly loadable into an ORTModel.

    • Add ort export in exporters for encoder-decoder models by @mht-sharma in https://github.com/huggingface/optimum/pull/497
    • Support decoder generated with --for-ort from optimum.exporters.onnx in ORTDecoder by @fxmarty in https://github.com/huggingface/optimum/pull/554

    Support for ONNX models with external data at export, optimization, quantization

    The ONNX export from PyTorch normally creates external data when the exported model is larger than 2 GB. This release introduces better support for the export and use of large models, writing all external data into a single .onnx_data file if necessary.

    • Handling ONNX models with external data by @NouamaneTazi in https://github.com/huggingface/optimum/pull/586
    • Improve the compatibility dealing with large ONNX proto in ORTOptimizer and ORTQuantizer by @JingyaHuang in https://github.com/huggingface/optimum/pull/332

    ONNX Runtime API improvement

    Various improvements to allow for a better user experience in the ONNX Runtime integration:

    • ORTModel, ORTModelDecoder and ORTModelForConditionalGeneration can now load any ONNX model file regardless of its name, allowing optimized and quantized models to be loaded without having to specify a file name argument.

    • ORTModel.from_pretrained() with from_transformers=True now downloads and loads the model in a temporary directory instead of the cache, which was not the right place to store it.

    • ORTQuantizer.save_pretrained() now saves the model configuration and the preprocessor, making the exported directory usable end-to-end.

    • ORTOptimizer.save_pretrained() now saves the preprocessor, making the exported directory usable end-to-end.

    • ONNX Runtime integration API improvement by @michaelbenayoun in https://github.com/huggingface/optimum/pull/515
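
    For example, a directory containing a single optimized or quantized ONNX file can now be loaded without passing a file name (the path below is illustrative):

    from optimum.onnxruntime import ORTModelForSequenceClassification

    # The ONNX file inside the directory is discovered automatically,
    # whatever it is named (e.g. model_optimized.onnx or model_quantized.onnx).
    model = ORTModelForSequenceClassification.from_pretrained("./my_quantized_model/")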

    Custom shapes support at ONNX export

    The shape of the example input to provide for the export to ONNX can be overridden in case the validity of the ONNX model is sensitive to the shape used during the export.

    Read more: optimum-cli export onnx --help
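
    As an illustration, and assuming the shape flags listed by the help command above, the export batch size and sequence length could be overridden as follows:

    optimum-cli export onnx --model bert-base-uncased --batch_size 4 --sequence_length 128 bert_onnx/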

    • Support custom shapes for dummy inputs by @fxmarty in https://github.com/huggingface/optimum/pull/522
    • Support for custom input shapes in exporters onnx by @fxmarty in https://github.com/huggingface/optimum/pull/575

    Enable use_cache=True for ORTModelForCausalLM

    Reusing past key values for models using ORTModelForCausalLM (e.g. gpt2) is now possible with use_cache=True, avoiding recomputing them at each decoding iteration:

    from transformers import AutoTokenizer
    from optimum.onnxruntime import ORTModelForCausalLM
    import torch
    
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = ORTModelForCausalLM.from_pretrained("gpt2", from_transformers=True, use_cache=True)
    
    inputs = tokenizer("My name is Arthur and I live in", return_tensors="pt")
    
    gen_tokens = model.generate(**inputs)
    tokenizer.batch_decode(gen_tokens)
    
    • Enable past_key_values for ORTModelForCausalLM by @echarlaix in https://github.com/huggingface/optimum/pull/326

    IO binding support for ORTModelForCustomTasks

    ORTModelForCustomTasks now supports IO Binding when using CUDAExecutionProvider.

    • Add IO binding support for custom ORTModel by @JingyaHuang in https://github.com/huggingface/optimum/pull/447

    Experimental support to merge ONNX decoder with/without past key values

    Along with --for-ort, passing --task causal-lm-with-past, --task seq2seq-lm-with-past or --task speech2seq-lm-with-past during the ONNX export produces two models: one that does not use the previously computed keys/values, and one that does.

    Experimental support is introduced to merge the two models into one. Example:

    optimum-cli export onnx --model t5-small --task seq2seq-lm-with-past --for-ort t5_onnx/
    
    import onnx
    from optimum.onnx import merge_decoders
    
    decoder = onnx.load("t5_onnx/decoder_model.onnx")
    decoder_with_past = onnx.load("t5_onnx/decoder_with_past_model.onnx")
    
    merged_model = merge_decoders(decoder, decoder_with_past)
    onnx.save(merged_model, "t5_onnx/decoder_merged_model.onnx")
    
    • Merge ONNX decoder models by @JingyaHuang in https://github.com/huggingface/optimum/pull/587

    Major bugs fixed

    • Fix BetterTransformer with padding="max_length" by @fxmarty in https://github.com/huggingface/optimum/pull/543
    • Fix non-nesting bug in BetterTransformer integration by @younesbelkada in https://github.com/huggingface/optimum/pull/637

    Other changes, bugfixes and improvements

    • Fix doc-builder premission error by @mishig25 in https://github.com/huggingface/optimum/pull/482
    • Fix doc build pr premissions by @mishig25 in https://github.com/huggingface/optimum/pull/484
    • Re-order the task manager doc by @michaelbenayoun in https://github.com/huggingface/optimum/pull/483
    • Fix whisper device for gpu test by @fxmarty in https://github.com/huggingface/optimum/pull/486
    • Fix tensorflow CI by @fxmarty in https://github.com/huggingface/optimum/pull/489
    • Fix PR doc generation by @regisss in https://github.com/huggingface/optimum/pull/495
    • Fix broken links in the doc by @fxmarty in https://github.com/huggingface/optimum/pull/499
    • Update iobinding ORT encoder whisper by @mht-sharma in https://github.com/huggingface/optimum/pull/498
    • fix NormalizedConfig init error message by @PaulQbFeng in https://github.com/huggingface/optimum/pull/500
    • Change import structure for ORTModel by @fxmarty in https://github.com/huggingface/optimum/pull/456
    • [BT] Fix failing CI tests by @younesbelkada in https://github.com/huggingface/optimum/pull/501
    • Remove redundant condition statement in ORTDecoder(Seq2seq) by @JingyaHuang in https://github.com/huggingface/optimum/pull/504
    • [BT] put decorator on the correct place by @younesbelkada in https://github.com/huggingface/optimum/pull/509
    • [BT] clearer error message for norm_first by @younesbelkada in https://github.com/huggingface/optimum/pull/510
    • Deprecate PyTorch 1.12. for BetterTransformer by @fxmarty in https://github.com/huggingface/optimum/pull/513
    • Fix ORTModelForSeq2SeqLM test by @fxmarty in https://github.com/huggingface/optimum/pull/455
    • Clearer error messages when initilizing the requested ONNX Runtime execution provider fails by @fxmarty in https://github.com/huggingface/optimum/pull/514
    • [BT] Fix doc bugs by @younesbelkada in https://github.com/huggingface/optimum/pull/517
    • Replace sklearn by scikit-learn by @lesteve in https://github.com/huggingface/optimum/pull/502
    • ORTModel uses optimum.exporters.onnx by @michaelbenayoun in https://github.com/huggingface/optimum/pull/490
    • Cleanup deprecated ONNX Runtime training docker files by @JingyaHuang in https://github.com/huggingface/optimum/pull/523
    • Added support for Tapas Model by @JuheonChu in https://github.com/huggingface/optimum/pull/520
    • Add benchmark results to gpu doc by @JingyaHuang in https://github.com/huggingface/optimum/pull/525
    • ORTModelForConditionalGeneration uses optimum.exporters.onnx by @mht-sharma in https://github.com/huggingface/optimum/pull/529
    • Better error message when wrong task is given to exporters by @fxmarty in https://github.com/huggingface/optimum/pull/531
    • Add OrtModelForSpeechSeq2Seq to doc by @fxmarty in https://github.com/huggingface/optimum/pull/533
    • Fold sections by default in the documentation's side-bar by @regisss in https://github.com/huggingface/optimum/pull/535
    • Import GenerationMixin from transformers.generation if transformers >= 4.25.0 by @regisss in https://github.com/huggingface/optimum/pull/536
    • Add check_if_transformers_greater to manage different versions of transformers by @regisss in https://github.com/huggingface/optimum/pull/537
    • Enable to push some sections to the end of the TOC in the doc by @regisss in https://github.com/huggingface/optimum/pull/532
    • Fix import in ONNX export CLI by @fxmarty in https://github.com/huggingface/optimum/pull/553
    • Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/550
    • Refactor of 2 functions used in ORTModel by @michaelbenayoun in https://github.com/huggingface/optimum/pull/551
    • Update readme by @echarlaix in https://github.com/huggingface/optimum/pull/556
    • Fix ORTTrainer wrapper duplication / PyTorch evaluate / update with transformers 4.25.1 by @JingyaHuang in https://github.com/huggingface/optimum/pull/561
    • Fix flaky BetterTransformer test by @fxmarty in https://github.com/huggingface/optimum/pull/564
    • enable FP16Optimizer for fp16 deepspeed training. by @AdamLouly in https://github.com/huggingface/optimum/pull/547
    • Update documentation quick tour section by @echarlaix in https://github.com/huggingface/optimum/pull/574
    • Move custom IOBinding to IOBindingHelper by @JingyaHuang in https://github.com/huggingface/optimum/pull/571
    • Add test for exporters.onnx CLI by @fxmarty in https://github.com/huggingface/optimum/pull/573
    • Documentation on quantization by @michaelbenayoun in https://github.com/huggingface/optimum/pull/565
    • More robust tests for ORTModel using decoders and use_cache=True by @fxmarty in https://github.com/huggingface/optimum/pull/576
    • Fix errors in onnxruntime modeling tests by @fxmarty in https://github.com/huggingface/optimum/pull/585
    • [BT] fix flaky test by @younesbelkada in https://github.com/huggingface/optimum/pull/591
    • Fix exporters onnx shapes by @fxmarty in https://github.com/huggingface/optimum/pull/581
    • Fix exporters.onnx tests by @fxmarty in https://github.com/huggingface/optimum/pull/584
    • Update on the ONNX Runtime documentation by @michaelbenayoun in https://github.com/huggingface/optimum/pull/567
    • Add the ORTModelForSemanticSegmentation class by @TheoMrc in https://github.com/huggingface/optimum/pull/539
    • Refactor BetterTransformer to be able to raise more informative error messages by @fxmarty in https://github.com/huggingface/optimum/pull/594
    • Constraint temprarily NumPy version to save CIs by @JingyaHuang in https://github.com/huggingface/optimum/pull/614
    • Add encoder_last_hidden_state as an output for encoder-decoder models by @fxmarty in https://github.com/huggingface/optimum/pull/601
    • Update dev version by @fxmarty in https://github.com/huggingface/optimum/pull/617
    • Fix documentation example by @echarlaix in https://github.com/huggingface/optimum/pull/603
    • Documentation improvements by @fxmarty in https://github.com/huggingface/optimum/pull/598
    • More informative message at ONNX export by @fxmarty in https://github.com/huggingface/optimum/pull/609
    • Use optimum exporter for current weight sharing test by @JingyaHuang in https://github.com/huggingface/optimum/pull/616
    • OnnxConfig now handle the export to encoder / decoder / decoder_with_past themselves by @michaelbenayoun in https://github.com/huggingface/optimum/pull/590
    • Set explictly the device index by @JingyaHuang in https://github.com/huggingface/optimum/pull/613
    • Fix ORT GPU test by @JingyaHuang in https://github.com/huggingface/optimum/pull/624
    • Add GPT-J normalized config by @fxmarty in https://github.com/huggingface/optimum/pull/623
    • Remove diffusers dependency in onnxruntime code by @fxmarty in https://github.com/huggingface/optimum/pull/619
    • Use exporters in ORTTrainer by @mht-sharma in https://github.com/huggingface/optimum/pull/546
    • Improve use_io_binding default value for different execution providers by @JingyaHuang in https://github.com/huggingface/optimum/pull/604
    • fixed FuseBiasInLinear by specifying device by @IlyasMoutawwakil in https://github.com/huggingface/optimum/pull/630
    • Fixed GPU documentation for HF pipelines by @smiraldr in https://github.com/huggingface/optimum/pull/602
    • Add argument in the CLI to specify device to do the ONNX export on by @fxmarty in https://github.com/huggingface/optimum/pull/634
    • Allow kwargs in all generate_dummy_inputs() methods by @fxmarty in https://github.com/huggingface/optimum/pull/638

    Full Changelog: https://github.com/huggingface/optimum/compare/v1.5.2...v1.6.0

    Significant community contributions

    The following contributors have made significant changes to the library over the last release:

    • @TheoMrc
      • Add ORTModelForSemanticSegmentation https://github.com/huggingface/optimum/pull/539
    • @ravenouse
      • Add MBart support for BetterTransformer https://github.com/huggingface/optimum/pull/516
    • @ka00ri
      • Add BetterTransformer support for ViLT architecture https://github.com/huggingface/optimum/pull/508
    • @Sumanth077
      • Add Bettertransformer support for FSMT https://github.com/huggingface/optimum/pull/494
  • v1.5.2(Dec 19, 2022)

  • v1.5.1(Nov 24, 2022)

  • v1.5.0(Nov 17, 2022)

    BetterTransformer

    Convert your model to its PyTorch BetterTransformer format with a one-liner, using the new BetterTransformer integration for faster inference on CPU and GPU!

    from optimum.bettertransformer import BetterTransformer
    
    model = BetterTransformer.transform(model)
    

    Check the full list of supported models in the documentation, and check out the Google Colab demo.

    Contributions

    • BetterTransformer integration (#423)
    • ViT and Wav2Vec2 support (#470)

    ONNX Runtime IOBinding support

    ORT models (except for ORTModelForCustomTasks) now support IOBinding to avoid data copying overheads between the host and device, bringing a significant inference speedup during the decoding process on GPU.

    By default, use_io_binding is set to True when using CUDA. You can turn off the IOBinding in case of any memory issue:

    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    
    model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small", use_io_binding=False)
    

    Contributions

    • Add IOBinding support to ONNX Runtime module (#421)

    Optimum Exporters

    optimum.exporters is a new module that handles the export of PyTorch and TensorFlow models to several backends. Only ONNX is supported for now, and more than 50 architectures can already be exported, including BERT, GPT-Neo, Bloom, T5, ViT, Whisper and CLIP.

    The export can be done via the CLI:

    python -m optimum.exporters.onnx --model openai/whisper-tiny.en whisper_onnx/
    

    For more information, check the documentation.

    Contributions

    • optimum.exporters creation (#403)
    • Automatic task detection (#445)

    Whisper

    • Whisper can be exported to ONNX using optimum.exporters.
    • Whisper can also be exported and run using optimum.onnxruntime; IOBinding is supported as well.

    Note: For now, the export produced by optimum.exporters cannot be used by ORTModelForSpeechSeq2Seq. To run inference, export Whisper directly with ORTModelForSpeechSeq2Seq, as sketched below. This will be solved in the next release.
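
    As a minimal sketch of that workaround (the openai/whisper-tiny.en checkpoint and the output directory name are only examples):

    from optimum.onnxruntime import ORTModelForSpeechSeq2Seq

    # Export the PyTorch checkpoint to ONNX and load it for ONNX Runtime inference
    model = ORTModelForSpeechSeq2Seq.from_pretrained("openai/whisper-tiny.en", from_transformers=True)

    # Save the exported model locally
    model.save_pretrained("whisper_onnx")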

    Contributions

    • Whisper support with optimum.onnxruntime and optimum.exporters (#420)

    Other contributions

    • ONNX Runtime training now supports ORT 1.13.1 and transformers 4.23.1 (#434)
    • ORTModel can load models from subfolders in a similar fashion as in transformers (#443)
    • ORTOptimizer has been refactored, and a factory class has been added to create common OptimizationConfigs (#457)
    • Fixes and updates in the documentation (#411, #432, #437, #441)
    • Fixes IOBinding (#454, #461)
  • v1.4.1 (Oct 26, 2022)

    • Add inference with ORTModel to ORTTrainer and ORTSeq2SeqTrainer #189
    • Add InferenceSession options and provider to ORTModel #271
    • Add mT5 (#341) and Marian (#393) support to ORTOptimizer
    • Add batchnorm folding torch.fx transformations #348
    • The torch.fx transformations now use the marking methods mark_as_transformed, mark_as_restored, get_transformed_nodes #385
    • Update BaseConfig for transformers 4.22.0 release #386
    • Update ORTTrainer for transformers 4.22.1 release #388
    • Add extra ONNX Runtime quantization options #398
    • Add possibility to pass provider_options to ORTModel #401 (see the sketch after this list)
    • Add support to pass a specific device for ORTModel, as transformers does for pipelines #427
    • Fixes to support onnxruntime 1.13.1 #430
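
    Below is a minimal sketch of how the provider-related options fit together; the checkpoint, the provider_options values and the use of a CUDA-enabled onnxruntime build are assumptions, and the keyword names follow the current Optimum API.

    from optimum.onnxruntime import ORTModelForSequenceClassification

    # Export the model to ONNX and run it on the first GPU through the CUDA execution provider
    model = ORTModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased-finetuned-sst-2-english",
        from_transformers=True,
        provider="CUDAExecutionProvider",
        provider_options={"device_id": 0},
    )
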
  • v1.4.0 (Sep 8, 2022)

    ONNX Runtime

    • Refactoring of ORTQuantizer (#270) and ORTOptimizer (#294)
    • Add ONNX Runtime fused Adam Optimizer (#295)
    • Add ORTModelForCustomTasks allowing ONNX Runtime inference support for custom tasks (#303)
    • Add ORTModelForMultipleChoice allowing ONNX Runtime inference for models with multiple choice classification head (#358)

    Torch FX

    • Add FuseBiasInLinear a transformation that fuses the weight and the bias of linear modules (#253)

    Improvements and bugfixes

    • Enable the possibility to disregard the precomputed past_key_values during ONNX Runtime inference of Seq2Seq models (#241)
    • Enable node exclusion from quantization for benchmark suite (#284)
    • Enable possibility to use a token authentication when loading a calibration dataset (#289)
    • Fix optimum pipeline when no model is given (#301)
  • v1.3.0 (Jul 12, 2022)

    Torch FX

    The optimum.fx.optimization module (#232) provides a set of torch.fx graph transformations, along with classes and functions to write your own transformations and compose them; a short usage sketch follows the list below.

    • The Transformation and ReversibleTransformation classes represent non-reversible and reversible transformations; you can write your own transformations by inheriting from them
    • The compose utility function enables transformation composition
    • Two reversible transformations were added:
      • MergeLinears: merges linear layers that have the same input
      • ChangeTrueDivToMulByInverse: changes a division by a static value into a multiplication by its inverse
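
    Below is a minimal usage sketch composing the two built-in transformations on a traced BERT model; the bert-base-uncased checkpoint and the traced input names are only examples.

    from transformers import BertModel
    from transformers.utils.fx import symbolic_trace
    from optimum.fx.optimization import ChangeTrueDivToMulByInverse, MergeLinears, compose

    model = BertModel.from_pretrained("bert-base-uncased")
    # Trace the model into a torch.fx GraphModule
    traced = symbolic_trace(model, input_names=["input_ids", "attention_mask", "token_type_ids"])

    # Compose the two reversible transformations and apply them to the traced graph
    composition = compose(ChangeTrueDivToMulByInverse(), MergeLinears())
    transformed_model = composition(traced)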

    ORTModelForSeq2SeqLM

    ORTModelForSeq2SeqLM (#199) allows ONNX export and ONNX Runtime inference for Seq2Seq models.

    • When exported, Seq2Seq models are decomposed into three parts: the encoder, the decoder (consisting of the decoder with the language modeling head), and the decoder with pre-computed key/values as additional inputs.
    • This specific export comes from the fact that, during the first pass, the decoder has no pre-computed key/value hidden states, while for the rest of the generation the past key/values are reused to speed up sequential decoding.

    Below is an example that downloads a T5 model from the Hugging Face Hub, exports it through the ONNX format and saves it:

    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    
    # Load the model from the Hub and export it to the ONNX format
    model = ORTModelForSeq2SeqLM.from_pretrained("t5-small", from_transformers=True)
    
    # Save the exported model in the given directory (the path is just an example)
    output_dir = "t5_small_onnx"
    model.save_pretrained(output_dir)
    

    ORTModelForImageClassification

    ORTModelForImageClassification (#226) allows ONNX Runtime inference for models with an image classification head.

    Below is an example that downloads a ViT model from the Hugging Face Hub, exports it through the ONNX format and saves it:

    from optimum.onnxruntime import ORTModelForImageClassification
    
    # Load the model from the Hub and export it to the ONNX format
    model = ORTModelForImageClassification.from_pretrained("google/vit-base-patch16-224", from_transformers=True)
    
    # Save the exported model in the given directory (the path is just an example)
    output_dir = "vit_base_onnx"
    model.save_pretrained(output_dir)
    

    ORTOptimizer

    Adds support for converting model weights from fp32 to fp16 by adding a new optimization parameter (fp16) to OptimizationConfig (#273).
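
    A minimal sketch of what such a configuration looks like (the optimization_level value is just an example):

    from optimum.onnxruntime.configuration import OptimizationConfig

    # Enable basic graph optimizations and fp32 -> fp16 weight conversion
    optimization_config = OptimizationConfig(optimization_level=1, fp16=True)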

    Pipelines

    Additional pipeline tasks are now supported, each with a default model.

    Below is an example that downloads a T5 small model from the Hub and loads it with the transformers pipeline for translation:

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForSeq2SeqLM
    
    tokenizer = AutoTokenizer.from_pretrained("optimum/t5-small")
    model = ORTModelForSeq2SeqLM.from_pretrained("optimum/t5-small")
    onnx_translation = pipeline("translation_en_to_fr", model=model, tokenizer=tokenizer)
    
    text = "What a beautiful day !"
    pred = onnx_translation(text)
    # [{'translation_text': "C'est une belle journée !"}]
    

    Breaking change

    The ORTModelForXXX execution provider default value is now set to CPUExecutionProvider (#203). Previously, if no execution provider was specified, it was set to CUDAExecutionProvider if a GPU was detected, or to CPUExecutionProvider otherwise.

  • v1.2.3 (Jun 15, 2022)

  • v1.2.2 (Jun 2, 2022)

    • Extend QuantizationPreprocessor to dynamic quantization (https://github.com/huggingface/optimum/pull/196)
    • Introduce unified approach to create transformers vs optimized models benchmark (https://github.com/huggingface/optimum/pull/194)
    • Bump huggingface_hub version and protobuf fix (https://github.com/huggingface/optimum/pull/205)
  • v1.2.1 (May 13, 2022)

  • v1.2.0 (May 10, 2022)

    ORTModel

    ORTModelForXXX classes such as ORTModelForSequenceClassification were integrated with the Hugging Face Hub, making it easy to export models to the ONNX format, load ONNX models, and save the resulting model or push it to the 🤗 Hub using the save_pretrained and push_to_hub methods respectively. An already optimized and/or quantized ONNX model can also be loaded with the from_pretrained method of the ORTModelForXXX classes.

    Below is an example that downloads a DistilBERT model from the Hub, exports it through the ONNX format and saves it:

    from optimum.onnxruntime import ORTModelForSequenceClassification
    
    # Load model from hub and export it through the ONNX format 
    model = ORTModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased-finetuned-sst-2-english", 
        from_transformers=True
    )
    
    # Save the exported model
    model.save_pretrained("a_local_path_for_convert_onnx_model")
    

    Pipelines

    Built-in support for transformers pipelines was added. This allows leveraging the same API as in Transformers, with the power of accelerated runtimes such as ONNX Runtime.

    The currently supported tasks, along with the default model for each, are the following:

    • Text Classification (DistilBERT model fine-tuned on SST-2)
    • Question Answering (DistilBERT model fine-tuned on SQuAD v1.1)
    • Token Classification (BERT large fine-tuned on CoNLL2003)
    • Feature Extraction (DistilBERT)
    • Zero Shot Classification (BART model fine-tuned on MNLI)
    • Text Generation (DistilGPT2)

    Below is an example that downloads a RoBERTa model from the Hub, exports it through the ONNX format and loads it with transformers pipeline for question-answering.

    from transformers import AutoTokenizer, pipeline
    from optimum.onnxruntime import ORTModelForQuestionAnswering
    
    # Load the vanilla transformers model and convert it to ONNX
    model = ORTModelForQuestionAnswering.from_pretrained("deepset/roberta-base-squad2", from_transformers=True)
    tokenizer = AutoTokenizer.from_pretrained("deepset/roberta-base-squad2")
    
    # Test the model with the transformers pipeline, with handle_impossible_answer for squad_v2
    optimum_qa = pipeline("question-answering", model=model, tokenizer=tokenizer, handle_impossible_answer=True)
    prediction = optimum_qa(
      question="What's my name?", context="My name is Philipp and I live in Nuremberg."
    )
    
    print(prediction)
    # {'score': 0.9041663408279419, 'start': 11, 'end': 18, 'answer': 'Philipp'}
    

    Improvements

    • Add the loss when performing the evaluation step with an instance of ORTTrainer, previously not enabled when inference was performed with ONNX Runtime, in #152
  • v1.1.1 (Apr 26, 2022)

    Habana

    ONNX Runtime

    • Add the possibility to specify the execution provider in ORTModel.
    • Add the IncludeFullyConnectedNodes class to find the nodes composing the fully connected layers, so that quantization can target only those nodes and limit the accuracy drop.
    • Update QuantizationPreprocessor so that the intersection of the set of nodes to quantize and the set of nodes to exclude from quantization is empty.
    • Rename Seq2SeqORTTrainer to ORTSeq2SeqTrainer for clarity and consistency.
    • Add ORTOptimizer support for ELECTRA models.
    • Fix the loading of pretrained ORTConfig which contains optimization and quantization config.
  • v1.1.0 (Apr 1, 2022)

    ORTTrainer and Seq2SeqORTTrainer

    ORTTrainer and Seq2SeqORTTrainer are two new experimental classes.

    • Both ORTTrainer and Seq2SeqORTTrainer were created to have a similar user-facing API as the Trainer and Seq2SeqTrainer of the Transformers library.
    • ORTTrainer allows the usage of the ONNX Runtime backend to train a given PyTorch model in order to accelerate training. ONNX Runtime will run the forward and backward passes using an optimized automatically-exported ONNX computation graph, while the rest of the training loop is executed by native PyTorch.
    • ORTTrainer allows the usage of ONNX Runtime inferencing during both the evaluation and the prediction step.
    • For Seq2SeqORTTrainer, ONNX Runtime inferencing is incompatible with --predict_with_generate, as the generate method is not supported yet.

    ONNX Runtime optimization and quantization APIs improvements

    The ORTQuantizer and ORTOptimizer classes underwent a massive refactoring that should allow a simpler and more flexible user-facing API.

    • Addition of the possibility to iteratively compute the quantization activation ranges when applying static quantization by using the ORTQuantizer method partial_fit. This is especially useful when using memory-hungry calibration methods such as Entropy and Percentile.
    • When using the MinMax calibration method, it is now possible to compute the moving average of the minimum and maximum values representing the activations quantization ranges instead of the global minimum and maximum (feature available with onnxruntime v1.11.0 or higher).
    • The classes OptimizationConfig, QuantizationConfig and CalibrationConfig were added in order to better segment the different ONNX Runtime related parameters instead of having one unique configuration ORTConfig.
    • The QuantizationPreprocessor class was added in order to find the nodes to include and / or exclude from quantization, by finding the nodes following a given pattern (such as the nodes forming LayerNorm for example). This is particularly useful in the context of static quantization, where the quantization of modules such as LayerNorm or GELU is responsible for a significant drop in accuracy.
  • v1.0.0 (Feb 24, 2022)

    ONNX Runtime support

    • An ORTConfig class was introduced, allowing the user to define the desired export, optimization and quantization strategies.
    • The ORTOptimizer class takes care of the model's ONNX export as well as the graph optimization provided by ONNX Runtime. In order to create an instance of ORTOptimizer, the user needs to provide an ORTConfig object defining the export and graph-level transformation information. Then optimization can be performed by calling the ORTOptimizer.fit method.
    • ONNX Runtime static and dynamic quantization can also be applied on a model by using the newly added ORTQuantizer class. In order to create an instance of ORTQuantizer, the user needs to provide an ORTConfig object defining the export and quantization information, such as the quantization approach to use or the activations and weights data types. Then quantization can be applied by calling the ORTQuantizer.fit method.

    Additional features for Intel Neural Compressor

    We have also added a new class called IncOptimizer which will take care of combining the pruning and the quantization processes.

  • v0.1.2 (Feb 2, 2022)

    With this release, we enable Intel Neural Compressor v1.8 magnitude pruning for a variety of NLP tasks with the introduction of IncTrainer which handles the pruning process.

  • v0.1.1 (Nov 10, 2021)

    With this release, we enable Intel Neural Compressor v1.7 PyTorch dynamic, post-training and aware-training quantization for a variety of NLP tasks. This support covers the overall process, from applying the quantization to loading the resulting quantized model, the latter being enabled by the introduction of the IncQuantizedModel class.

  • v0.0.1 (Sep 14, 2021)
