ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.

Last update: Dec 24, 2022


Overview

Accelerate PyTorch models with ONNX Runtime

ONNX Runtime for PyTorch is available via the torch-ort Python package.

This repository contains the source code for the package, as well as instructions for running the package.

Pre-requisites

You need a machine with at least one NVIDIA or AMD GPU to run ONNX Runtime for PyTorch.

You can install and run torch-ort in your local environment, or with Docker.

Install in a local Python environment

Default dependencies

By default, torch-ort depends on PyTorch 1.9.0, ONNX Runtime 1.8.1 and CUDA 10.2.

Install CUDA 10.2
Install cuDNN 7.6
Install torch-ort
- pip install torch-ort
Run post-installation script for ORTModule
- python -m torch_ort.configure

Get install instructions for other combinations in the Get Started Easily section at https://www.onnxruntime.ai/ under the Optimize Training tab.
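Because torch-ort is tied to specific PyTorch and ONNX Runtime builds (see the default dependencies above), it can help to confirm which versions actually ended up in your environment before running anything. The following is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+). It assumes the PyPI distribution names `torch`, `onnxruntime-training` and `torch-ort`; the helper name `installed_versions` is illustrative and not part of torch-ort.

```python
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names (an assumption of this sketch); the training build of
# ONNX Runtime is published as "onnxruntime-training", separate from "onnxruntime".
PACKAGES = ("torch", "onnxruntime-training", "torch-ort")


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:22} {ver or 'not installed'}")
```

Comparing this output with the combination you intended to install is a quick first check when `python -m torch_ort.configure` or the test script misbehaves.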

Test your installation

Clone this repo
- git clone git@github.com:pytorch/ort.git
Install extra dependencies
- pip install wget pandas scikit-learn transformers
Run the training script
- python ./ort/tests/bert_for_sequence_classification.py

Add ONNX Runtime for PyTorch to your PyTorch training script

from torch_ort import ORTModule
model = ORTModule(model)

# PyTorch training script follows

Samples

To see torch-ort in action, see https://github.com/microsoft/onnxruntime-training-examples, which shows how to train popular HuggingFace models.

License

This project has an MIT license, as found in the LICENSE file.

Comments

Running ORTModule with other EPs from ORT

I am building a new onnxruntime-training wheel with the OneDNN EP. After installing it, I install torch_ort and run the configure step, but it does not seem to work (I get the same error asking me to run configure again). The instructions don't include a recipe for this combination. Is this possible, or is there another way for me to build a custom wheel and use it to train a BERT model with OneDNN and ORT?

opened by chethanpk 11

ONNXRuntimeError after enabled fp16 mixed precision training

Hi folks,

I tested fp16 mixed precision training with an ORTModule-wrapped GPT-2 model on a fine-tuning task. However, after enabling fp16, I encountered the following error:

Error Message

Traceback (most recent call last):
  File "test_onnxruntime_train.py", line 115, in test_ort_trainer
    train_result = trainer.train()
  File "/workspace/optimum/onnxruntime/trainer.py", line 498, in train
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1984, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 2016, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 81, in _forward
    return self._torch_module.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 32, in _forward
    return self._execution_manager(self.is_training()).forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 265, in forward
    override_policy=_FallbackPolicy.FALLBACK_FORCE_TORCH_FORWARD)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_fallback.py", line 194, in handle_exception
    raise exception
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 85, in forward
    self._initialize_graph_builder(training=True)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 420, in _initialize_graph_builder
    self._onnx_models.exported_model.SerializeToString(), grad_builder_config)
RuntimeError: /onnxruntime_src/orttraining/orttraining/python/orttraining_pybind_state.cc:707 onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, const pybind11::bytes&, const onnxruntime::training::OrtModuleGraphBuilderConfiguration&)> [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(float) and tensor(float16) in node (Where_183).

It seems that the exported ONNX graph is broken due to incompatible input types. Where might the problem come from? Does anyone have any insight?

System information

Docker image built with the Dockerfile-cu11 in onnxruntime-training-examples.

OS: Ubuntu 18.04
CUDA/cuDNN version: 11/8
onnxruntime-training: 1.9.0+cu111
torch: 1.9.0+cu111
torch-ort: 1.9.0
Python version: 3.6
GPU: A100

Additional Information

I actually have a working setup with torch 1.8.1 + torch-ort 1.9.0 + onnxruntime-training 1.11.0.dev20220113001+cu102, so I wonder whether the error comes from outdated versions in the Dockerfile. However, I can no longer find how to install onnxruntime-training 1.11.0.dev20220113001+cu102.
Here is the ONNX graph exported with DebugOptions, in case it helps.

opened by JingyaHuang 8

RecursionError: maximum recursion depth exceeded in comparison

I use ort like this:

...
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
model = ORTModule(model)
model = nn.parallel.DistributedDataParallel(model, find_unused_parameters=True, device_ids=[device])
...

But I get this error:

Traceback (most recent call last):
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
    fn(i, *args)
  File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/ddp_trainer.py", line 156, in _main_func
    main_func(local_rank, *args)
  File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/tools/train.py", line 163, in train_entrance
    trainer.fit()
  File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/tools/trainer_wrapper.py", line 225, in fit
    self._trainer.fit()
  File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/trainer.py", line 298, in fit
    profiler=self.profiler,
  File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/processors/processor.py", line 265, in __call__
    model_outs = model(*_as_list(batch_i))
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/ortmodule.py", line 41, in _forward
    return self._execution_manager(self._is_training()).forward(*inputs, **kwargs)
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 67, in forward
    build_gradient_graph = self._export_model(*inputs, **kwargs)
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 206, in _export_model
    schema = _io._extract_schema({'args': copy.copy(inputs), 'kwargs': copy.copy(kwargs)})
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 300, in _extract_schema
    data[key] = _extract_schema(data[key])
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
    data[idx] = _extract_schema(data[idx])
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
    data[idx] = _extract_schema(data[idx])
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
    data[idx] = _extract_schema(data[idx])
  [Previous line repeated 949 more times]
  File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 287, in _extract_schema
    if isinstance(data, abc.Sequence):
  File "/home/users/min.du/venvs/pytorch1.8/lib64/python3.6/abc.py", line 184, in __instancecheck__
    if subclass in cls._abc_cache:
  File "/home/users/min.du/venvs/pytorch1.8/lib64/python3.6/_weakrefset.py", line 75, in __contains__
    return wr in self.data
RecursionError: maximum recursion depth exceeded in comparison

Any suggestion?

opened by DuinoDu 7
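One plausible cause, offered as a hypothesis rather than a confirmed diagnosis: the trace shows `_extract_schema` recursing through its `abc.Sequence` branch, and a Python `str` is an `abc.Sequence` whose items are themselves one-character strings, so a naive recursive walk never bottoms out if the batch contains strings (for example, file names). The sketch below is a simplified stand-in written for illustration, not ONNX Runtime's actual `_extract_schema`; it reproduces the failure and shows a variant that treats `str`/`bytes` as leaves. If this is the cause, keeping non-tensor strings out of the arguments passed to the ORTModule-wrapped forward should avoid the error.

```python
from collections import abc


def extract_schema_naive(data):
    """Simplified stand-in for a recursive schema walk (NOT ONNX Runtime's code)."""
    if isinstance(data, abc.Mapping):
        return {key: extract_schema_naive(value) for key, value in data.items()}
    if isinstance(data, abc.Sequence):
        # A str is a Sequence and "a"[0] == "a", so this never terminates on strings.
        return [extract_schema_naive(item) for item in data]
    return type(data).__name__


def extract_schema(data):
    """Same walk, but str/bytes are treated as leaves so the recursion terminates."""
    if isinstance(data, (str, bytes)):
        return type(data).__name__
    if isinstance(data, abc.Mapping):
        return {key: extract_schema(value) for key, value in data.items()}
    if isinstance(data, abc.Sequence):
        return [extract_schema(item) for item in data]
    return type(data).__name__


if __name__ == "__main__":
    batch = {"args": (1.0, ["img_0001.png"]), "kwargs": {}}
    try:
        extract_schema_naive(batch)
    except RecursionError:
        print("naive walk: RecursionError")
    print(extract_schema(batch))  # {'args': ['float', ['str']], 'kwargs': {}}
```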

Refactor torch_ort and ort_moe into their respective directories

Moves the torch_ort and ort_moe code, including setup.*, tests, docker and docs, into their respective directories. This PR only moves files and updates documentation and directory paths; there is no functional code change.
Labels: CLA Signed, module: rocm

opened by jingyanwangms 6

MaxPool op resolved as Aten OP

I created a small model training script to test the MaxPool gradient op for the OneDNN EP, using the model definition below. For some reason, however, the MaxPool was resolved to an ATen op in the ONNX graph. Is there a way to force torch-ort to use MaxPool instead of the ATen op (perhaps by disabling the use of ATen ops)?

import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.maxpool1 = nn.MaxPool2d(2)
        self.maxpool2 = nn.MaxPool2d(2)
        #self.conv2_drop = nn.Dropout2d()
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        x = F.relu(self.maxpool1(self.conv1(x)))
        x = F.relu(self.maxpool2(self.conv2(x)))
        x = x.view(-1, 320)  # 20 channels x 4 x 4, assuming 1x28x28 inputs
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)  # explicit dim avoids the implicit-dim deprecation warning

opened by chethanpk 6
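The hard-coded `320` in `x.view(-1, 320)` ties this model to a single input size. Assuming MNIST-sized 1×28×28 inputs (the issue does not state the input shape), the usual output-size formula ⌊(n + 2p − k)/s⌋ + 1 gives 28 → 24 → 12 → 8 → 4 through conv1, maxpool1, conv2 and maxpool2, so the flattened size is 20 × 4 × 4 = 320. A small pure-Python check of that arithmetic follows; the helper names are illustrative.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a conv/pool layer (dilation 1, floor rounding)."""
    return (n + 2 * padding - kernel) // stride + 1


def flat_features(side, channels_out=20):
    """Flattened feature count after conv1 -> maxpool1 -> conv2 -> maxpool2 in Net."""
    n = out_size(side, kernel=5)         # conv1: kernel_size=5, stride 1, no padding
    n = out_size(n, kernel=2, stride=2)  # maxpool1: MaxPool2d(2), stride defaults to kernel size
    n = out_size(n, kernel=5)            # conv2: kernel_size=5
    n = out_size(n, kernel=2, stride=2)  # maxpool2
    return channels_out * n * n


print(flat_features(28))  # 320
```

Any other input size (for example 32×32) produces a different flattened size, so `fc1` and the `view` call would need to change with it.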

Lack of speed improvement when using custom GPT model with ORT

Hey guys! While investigating the speed regression reported in #56, I wrote a minimal script to benchmark ORT vs. no ORT.

With this script, ORT and no ORT take essentially the same time. Any ideas what is causing the performance issue? I'm also seeing a few warnings, which I've included below.

No ORT: time taken 85.2842013835907 seconds
ORT: time taken 85.33545899391174 seconds

Warnings:

/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_logger.py:52: UserWarning: There were one or more warnings or errors raised while exporting the PyTorch model. Please enable INFO level logging to view all warnings and errors.
  "model. Please enable INFO level logging to view all warnings and errors.", UserWarning)
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.

script:

import math
import os
import time

import numpy as np
import torch
import torch.nn as nn
from torch.cuda.amp import autocast
from torch.nn import functional as F
from torch.utils.data import Dataset, DataLoader
from tqdm import tqdm
from torch_ort import ORTModule


class GPTConfig:
    """ base GPT config, params common to all GPT versions """
    embd_pdrop = 0.1
    resid_pdrop = 0.1
    attn_pdrop = 0.1

    def __init__(self, vocab_size, block_size, **kwargs):
        self.vocab_size = vocab_size
        self.block_size = block_size
        for k, v in kwargs.items():
            setattr(self, k, v)


class CausalSelfAttention(nn.Module):
    """
    A vanilla multi-head masked self-attention layer with a projection at the end.
    I believe I could have just used torch.nn.MultiheadAttention but their documentation
    is all but absent and code ugly so I don't trust it, rolling my own here.
    """

    def __init__(self, config):
        super().__init__()
        assert config.n_embd % config.n_head == 0
        # key, query, value projections for all heads
        self.key = nn.Linear(config.n_embd, config.n_embd)
        self.query = nn.Linear(config.n_embd, config.n_embd)
        self.value = nn.Linear(config.n_embd, config.n_embd)
        # regularization
        self.attn_drop = nn.Dropout(config.attn_pdrop)
        self.resid_drop = nn.Dropout(config.resid_pdrop)
        # output projection
        self.proj = nn.Linear(config.n_embd, config.n_embd)
        # causal mask to ensure that attention is only applied to the left in the input sequence
        self.register_buffer("mask", torch.tril(torch.ones(config.block_size, config.block_size))
                             .view(1, 1, config.block_size, config.block_size))
        self.n_head = config.n_head

    def forward(self, x, layer_past=None):
        B, T, C = x.size()

        # calculate query, key, values for all heads in batch and move head forward to be the batch dim
        k = self.key(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        q = self.query(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
        v = self.value(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)

        # causal self-attention; Self-attend: (B, nh, T, hs) x (B, nh, hs, T) -> (B, nh, T, T)
        att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
        att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float('-inf'))
        att = F.softmax(att, dim=-1)
        att = self.attn_drop(att)
        y = att @ v  # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)
        y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble all head outputs side by side

        # output projection
        y = self.resid_drop(self.proj(y))
        return y


class Block(nn.Module):
    """ an unassuming Transformer block """

    def __init__(self, config):
        super().__init__()
        self.ln1 = nn.LayerNorm(config.n_embd)
        self.ln2 = nn.LayerNorm(config.n_embd)
        self.attn = CausalSelfAttention(config)
        self.mlp = nn.Sequential(
            nn.Linear(config.n_embd, 4 * config.n_embd),
            nn.GELU(),
            nn.Linear(4 * config.n_embd, config.n_embd),
            nn.Dropout(config.resid_pdrop),
        )

    def forward(self, x):
        x = x + self.attn(self.ln1(x))
        x = x + self.mlp(self.ln2(x))
        return x


class GPT(torch.nn.Module):
    def __init__(self, vocab_size, n_embd, block_size, embd_pdrop, n_layer, config):
        # input embedding stem
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))
        self.drop = nn.Dropout(embd_pdrop)
        self.config = config

        # decoder head
        self.ln_f = nn.LayerNorm(n_embd)
        self.head = nn.Linear(n_embd, vocab_size, bias=False)

        self.block_size = block_size

        blocks = []
        for x in range(n_layer):
            layer = Block(self.config)
            blocks.append(layer)
        self.blocks = nn.Sequential(*blocks)

    def forward(self, idx):
        b, t = idx.size()
        assert t <= self.block_size, "Cannot forward, model block size is exhausted."

        # forward the GPT model
        token_embeddings = self.tok_emb(idx)  # each index maps to a (learnable) vector
        position_embeddings = self.pos_emb[:, :t, :]  # each position maps to a (learnable) vector
        x = self.drop(token_embeddings + position_embeddings)
        x = self.blocks(x)
        x = self.ln_f(x)
        logits = self.head(x)
        return logits


class CharDataset(Dataset):

    def __init__(self, data, block_size):
        chars = list(set(data))
        data_size, vocab_size = len(data), len(chars)

        self.stoi = {ch: i for i, ch in enumerate(chars)}
        self.itos = {i: ch for i, ch in enumerate(chars)}
        self.block_size = block_size
        self.vocab_size = vocab_size
        self.data = data

    def __len__(self):
        return math.ceil(len(self.data) / (self.block_size + 1))

    def __getitem__(self, idx):
        # we're actually going to "cheat" and pick a spot in the dataset at random
        i = np.random.randint(0, len(self.data) - (self.block_size + 1))
        chunk = self.data[i:i + self.block_size + 1]
        dix = [self.stoi[s] for s in chunk]
        x = torch.tensor(dix[:-1], dtype=torch.long)
        y = torch.tensor(dix[1:], dtype=torch.long)
        return x, y


if __name__ == '__main__':
    n_embd = 2048
    block_size = 128
    n_layer = 6
    batch_size = 8
    num_workers = 0
    n_head = 16
    n_warmup = 20
    enable_ort = True

    device = torch.device("cuda:0")

    if not os.path.exists("input.txt"):
        os.system("wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt")

    file = 'input.txt'
    with open(file, 'r') as f:
        text = f.read()
    train_dataset = CharDataset(text, block_size)  # one line of poem is roughly 50 characters
    train_loader = DataLoader(train_dataset, batch_size=batch_size, num_workers=num_workers)
    vocab_size = train_dataset.vocab_size

    model = GPT(
        vocab_size=vocab_size,
        n_embd=n_embd,
        embd_pdrop=0.1,
        block_size=block_size,
        n_layer=n_layer,
        config=GPTConfig(
            vocab_size=vocab_size,
            block_size=block_size,
            n_layer=n_layer,
            n_head=n_head,
            n_embd=n_embd,
        )
    )
    if enable_ort:
        model = ORTModule(model)

    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    torch.cuda.synchronize()
    # warmup before measuring: run full training steps (forward, backward, optimizer step)
    # so one-time initialization cost is excluded from the timed loop
    for x, (idx, targets) in tqdm(enumerate(train_loader), total=len(train_loader)):
        if x == n_warmup:
            break
        idx = idx.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()
        with autocast():
            logits = model(idx)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        optimizer.step()

    torch.cuda.synchronize()
    start_time = time.time()

    for idx, targets in tqdm(train_loader, total=len(train_loader)):
        idx = idx.to(device)
        targets = targets.to(device)
        optimizer.zero_grad()  # clear gradients from the previous step
        with autocast():
            logits = model(idx)
            loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        optimizer.step()

    torch.cuda.synchronize()
    print("Time taken", time.time() - start_time)

cc @natke @ashbhandare

opened by SeanNaren 6
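CUDA kernels launch asynchronously, so GPU timings are only meaningful if the device is synchronized before the clock is read; the script above does this once around the whole loop with `torch.cuda.synchronize()`. A finer-grained alternative is to time each step after warmup and compare medians, which is less sensitive to one-off stalls. The helper below is a hypothetical, framework-agnostic sketch: `step` is whatever runs one training iteration, and `sync` would be `torch.cuda.synchronize` when timing GPU work.

```python
import statistics
import time
from typing import Callable, Iterable, Optional


def time_steps(step: Callable[[object], None], batches: Iterable,
               n_warmup: int = 20, sync: Optional[Callable[[], None]] = None) -> dict:
    """Run step(batch) for every batch; time only the iterations after the warmup."""
    sync = sync or (lambda: None)  # no-op when timing CPU-only work
    durations = []
    for i, batch in enumerate(batches):
        sync()  # drain previously queued device work before starting the clock
        start = time.perf_counter()
        step(batch)
        sync()  # wait for this step's device work before stopping the clock
        if i >= n_warmup:
            durations.append(time.perf_counter() - start)
    if not durations:
        raise ValueError("no batches left after warmup")
    return {
        "steps": len(durations),
        "total_s": sum(durations),
        "median_s": statistics.median(durations),
    }
```

For example, `time_steps(train_step, train_loader, n_warmup=20, sync=torch.cuda.synchronize)`, where `train_step` is a hypothetical function holding the body of the timed loop. Per-step synchronization adds a little overhead of its own, so use the same helper for both the ORT and the non-ORT run.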

PyTorch Lightning Integration

Hey guys!

Really epic work in this repo! I'm currently working on integrating this into Lightning (any assistance would be appreciated). From what I see, ORTModule just wraps the forward function and converts it to ONNX format, is that right? As a result, inside Lightning I've wrapped the model so that the user-defined functions (training_step, validation_step, test_step) are called from the wrapped module's forward function.

Currently I'm running into an error:

/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_io.py:473: UserWarning: This model cannot be deep copied (or pickled), which is a required step for stateful models to be properly exported to ONNX. Compute will continue, but unexpected results may occur!
  warnings.warn("This model cannot be deep copied (or pickled), "
2021-07-19 13:20:57.590381944 [E:onnxruntime:, inference_session.cc:1341 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/session_state_utils.cc:143 onnxruntime::common::Status onnxruntime::session_state_utils::SaveInitializedTensors(const onnxruntime::Env&, const std::basic_string<char>&, const onnxruntime::GraphViewer&, const AllocatorPtr&, const onnxruntime::OrtValueNameIdxMap&, const std::vector<int>&, onnxruntime::ITensorAllocator&, const std::function<onnxruntime::common::Status(int, const OrtValue&, const onnxruntime::OrtCallback&, bool)>&, const onnxruntime::logging::Logger&, const onnxruntime::DataTransferManager&, const onnxruntime::ExecutionPlanBase&, const onnxruntime::SessionOptions&) ort_value_name_idx_map.MaxIdx() > -1 was false. OrtValue indexes should have been populated.

Traceback (most recent call last):
  File "reproduce_test.py", line 99, in <module>
    run()
  File "reproduce_test.py", line 94, in run
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 515, in fit
    self._run(model)
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 896, in _run
    self._dispatch()
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 963, in _dispatch
    self.accelerator.start_training(self)
  File "/data/pytorch-lightning/pytorch_lightning/accelerators/accelerator.py", line 97, in start_training
    self.training_type_plugin.start_training(trainer)
  File "/data/pytorch-lightning/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
    self._results = trainer.run_stage()
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 973, in run_stage
    return self._run_train()
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1008, in _run_train
    self._run_sanity_check(self.lightning_module)
  File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1084, in _run_sanity_check
    self._evaluation_loop.run()
  File "/data/pytorch-lightning/pytorch_lightning/loops/base.py", line 112, in run
    self.advance(*args, **kwargs)
  File "/data/pytorch-lightning/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 122, in advance
    self.num_dataloaders,
  File "/data/pytorch-lightning/pytorch_lightning/loops/base.py", line 112, in run
    self.advance(*args, **kwargs)
  File "/data/pytorch-lightning/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
    output = self.evaluation_step(batch, batch_idx, dataloader_idx)
  File "/data/pytorch-lightning/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 162, in evaluation_step
    output = self.trainer.accelerator.validation_step(step_kwargs)
  File "/data/pytorch-lightning/pytorch_lightning/accelerators/accelerator.py", line 220, in validation_step
    return self.training_type_plugin.validation_step(*step_kwargs.values())
  File "reproduce_test.py", line 74, in validation_step
    return self.model(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 41, in _forward
    return self._execution_manager(self._is_training()).forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 86, in forward
    self._create_execution_agent()
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 115, in _create_execution_agent
    session_options, providers, provider_options)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_execution_agent.py", line 52, in __init__
    self.create_inference_agent(path_or_bytes, session_options, providers, provider_options)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_execution_agent.py", line 56, in create_inference_agent
    providers, provider_options)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 321, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/session_state_utils.cc:143 onnxruntime::common::Status onnxruntime::session_state_utils::SaveInitializedTensors(const onnxruntime::Env&, const std::basic_string<char>&, const onnxruntime::GraphViewer&, const AllocatorPtr&, const onnxruntime::OrtValueNameIdxMap&, const std::vector<int>&, onnxruntime::ITensorAllocator&, const std::function<onnxruntime::common::Status(int, const OrtValue&, const onnxruntime::OrtCallback&, bool)>&, const onnxruntime::logging::Logger&, const onnxruntime::DataTransferManager&, const onnxruntime::ExecutionPlanBase&, const onnxruntime::SessionOptions&) ort_value_name_idx_map.MaxIdx() > -1 was false. OrtValue indexes should have been populated.

Here is the script (it requires PyTorch Lightning: pip install pytorch-lightning):

import os
import pickle

import torch
from torch.utils.data import DataLoader, Dataset
from torch_ort import ORTModule

from pytorch_lightning import LightningModule, Trainer
from pytorch_lightning.overrides import LightningDistributedModule
from pytorch_lightning.plugins import SingleDevicePlugin


class RandomDataset(Dataset):

    def __init__(self, size, length):
        self.len = length
        self.data = torch.randn(length, size)

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return self.len


class BoringModel(LightningModule):

    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("train_loss", loss)
        return {"loss": loss}

    def validation_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("valid_loss", loss)

    def test_step(self, batch, batch_idx):
        loss = self(batch).sum()
        self.log("test_loss", loss)

    def configure_optimizers(self):
        return torch.optim.SGD(self.layer.parameters(), lr=0.1)


def unwrap_lightning_module(wrapped_model) -> LightningModule:
    model = wrapped_model
    if isinstance(model, LightningDistributedModule):
        model = unwrap_lightning_module(model.module)
    if isinstance(model, ORTModule):
        model = unwrap_lightning_module(model._module_metadata.original_module)
    return model


class ORTPlugin(SingleDevicePlugin):
    def setup(self, model: torch.nn.Module) -> torch.nn.Module:
        pickle.dumps(model)  # sanity check: raises if the LightningModule itself cannot be pickled

        self.model = ORTModule(LightningDistributedModule(self.model))
        self.model_to_device()
        return self.model

    @property
    def lightning_module(self):
        return unwrap_lightning_module(self._model)

    def training_step(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def validation_step(self, *args, **kwargs):
        return self.model(*args, **kwargs)

    def test_step(self, *args, **kwargs):
        return self.model(*args, **kwargs)


def run():
    train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    test_data = DataLoader(RandomDataset(32, 64), batch_size=2)

    model = BoringModel()
    trainer = Trainer(
        default_root_dir=os.getcwd(),
        limit_train_batches=1,
        limit_val_batches=1,
        max_epochs=1,
        plugins=ORTPlugin(device=torch.device('cuda:0')),
        gpus=1,
    )
    trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
    trainer.test(model, dataloaders=test_data)


if __name__ == '__main__':
    run()

I'll continue to debug in the meantime :)

opened by SeanNaren 6

AttributeError: 'ORTModule' object has no attribute 'resize_token_embeddings'

Hi, I am using ORT to run transformers/examples/pytorch/language-modeling/run_clm.py (fine-tuning GPT-2 on the raw WikiText-2 dataset, where no tokens were replaced before tokenization). I am running it on the ROCm platform. I edited the script as follows:

from torch_ort import ORTModule

    if model_args.model_name_or_path:
        model = AutoModelForCausalLM.from_pretrained(
            model_args.model_name_or_path,
            from_tf=bool(".ckpt" in model_args.model_name_or_path),
            config=config,
            cache_dir=model_args.cache_dir,
            revision=model_args.model_revision,
            use_auth_token=True if model_args.use_auth_token else None,
        )
        model = ORTModule(model)
    else:
        model = AutoModelForCausalLM.from_config(config)
        model = ORTModule(model)
        n_params = sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
        logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")

I get this error:

Traceback (most recent call last):
  File "./examples/pytorch/language-modeling/run_clm.py", line 519, in <module>
    main()
  File "./examples/pytorch/language-modeling/run_clm.py", line 353, in main
    model.resize_token_embeddings(len(tokenizer))
  File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
    type(self).__name__, name))
AttributeError: 'ORTModule' object has no attribute 'resize_token_embeddings'

Could you kindly help me resolve it? Thank you, Bhavya

opened by bmedishe 6
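The traceback shows that ORTModule does not expose `resize_token_embeddings` from the wrapped Hugging Face model. One workaround consistent with that reading is to finish any model surgery, such as `model.resize_token_embeddings(len(tokenizer))`, on the unwrapped model and only then wrap it. The sketch below illustrates just that ordering with hypothetical toy classes (`ToyLM`, `ForwardOnlyWrapper`); it does not use the real ORTModule or transformers.

```python
class ToyLM:
    """Hypothetical stand-in for a Hugging Face causal LM."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def resize_token_embeddings(self, new_size):
        self.vocab_size = new_size

    def __call__(self, token_ids):
        return [t % self.vocab_size for t in token_ids]


class ForwardOnlyWrapper:
    """Hypothetical wrapper that forwards calls but not other attribute lookups."""

    def __init__(self, module):
        self._original_module = module

    def __call__(self, *args, **kwargs):
        return self._original_module(*args, **kwargs)


model = ToyLM(vocab_size=100)
model.resize_token_embeddings(120)  # 1. finish model surgery on the unwrapped model
model = ForwardOnlyWrapper(model)   # 2. only then wrap it
print(model([5, 130]))              # [5, 10]
```

In `run_clm.py` terms, this would mean moving the `ORTModule(model)` calls below the `model.resize_token_embeddings(len(tokenizer))` line.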

Update nvidia docker images with new gpg keys and update the blob upload logic

NVIDIA recently rotated its repository signing keys (https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/), which broke our CI pipelines. Some Docker images have been updated with the new keys while others have not, so this PR moves our pipelines to the updated images.

In addition, this PR uses the new Azure APIs to upload whl blobs to our container registry.
Labels: CLA Signed

opened by baijumeswani 5

tests/bert_for_sequence_classification.py reports "This is an invalid model"

Hi, I installed torch-ort following the instructions, and 'python -m torch_ort.configure' does not report any error. However, when I run the verification script, it reports the following errors:

======== Epoch 1 / 4 with batch size 32 ========
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator ATenOp. No schema registered for this operator.
Warning: Unsupported operator SoftmaxCrossEntropyLossInternal. No schema registered for this operator.
2021-11-16 23:56:39.446639117 [W:onnxruntime:Default, graph.cc:2538 InitFunctionBodyForNode] Function body initialization failed for node 'Softmax_131_Grad/SoftmaxGrad_0' optype SoftmaxGrad. Error message /onnxruntime_src/onnxruntime/core/graph/function.cc:749 onnxruntime::FunctionImpl::FunctionImpl(onnxruntime::Graph&, const NodeIndex&, const onnx::FunctionProto&, const std::unordered_map<std::basic_string<char>, const onnx::FunctionProto*>&, std::vector<std::unique_ptr<onnxruntime::Function> >&, const onnxruntime::logging::Logger&, bool) status.IsOK() was false. Resolve subgraph failed:This is an invalid model. Error in Node:0x557a5b74e130 : Node (0x557a5b74e130) has input size 2 not in range [min=1, max=1].
. Execution will fail if ORT does not have a specialized kernel for this op
2021-11-16 23:56:39.464632288 [W:onnxruntime:, graph.cc:2538 InitFunctionBodyForNode] Function body initialization failed for node 'Softmax_131_Grad/SoftmaxGrad_0' optype SoftmaxGrad. Error message /onnxruntime_src/onnxruntime/core/graph/function.cc:749 onnxruntime::FunctionImpl::FunctionImpl(onnxruntime::Graph&, const NodeIndex&, const onnx::FunctionProto&, const std::unordered_map<std::basic_string<char>, const onnx::FunctionProto*>&, std::vector<std::unique_ptr<onnxruntime::Function> >&, const onnxruntime::logging::Logger&, bool) status.IsOK() was false. Resolve subgraph failed:This is an invalid model. Error in Node:0x557a60d5e230 : Node (0x557a60d5e230) has input size 2 not in range [min=1, max=1].
. Execution will fail if ORT does not have a specialized kernel for this op
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!

Testbed: V100-32GB, CUDA 10.2, torch==1.9.0, torch-ort==1.9.0, onnxruntime-training==1.9.0
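To capture a testbed line like the one above reproducibly, the installed versions can be read from package metadata rather than typed by hand. This is a minimal sketch using the standard-library `importlib.metadata`. The helper name `collect_versions` and the package list are illustrative assumptions. A package that is not installed is reported as `None`:

```python
from importlib import metadata

# Distribution names as published on PyPI (illustrative list; extend as needed).
PACKAGES = ["torch", "torch-ort", "onnxruntime-training", "onnx"]


def collect_versions(packages=PACKAGES):
    """Return {distribution name: installed version or None} (hypothetical helper)."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}" if version else f"{name}: not installed")
```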

opened by xysmlx

Fallback not kicking in with non-contiguous tensors

I tried running the following training code, which adds ORTModule to the previous getting-started example: https://github.com/natke/onnxruntime-training-examples/blob/034a5b73ce804d55c120308804fda6b08b016a8d/orttrainer/getting-started/train_ort.py

It fails when the input tensor is non-contiguous (https://github.com/natke/onnxruntime-training-examples/blob/034a5b73ce804d55c120308804fda6b08b016a8d/orttrainer/getting-started/train_ort.py#L112), but the fallback is not triggered, even when the policy is explicitly set to FALLBACK_FORCE_TORCH_FORWARD.

Error message

  File "train_ort.py", line 168, in <module>
    train(model)
  File "train_ort.py", line 116, in train
    output = model(data, src_mask)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 81, in _forward
    return self._torch_module.forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 32, in _forward
    return self._execution_manager(self.is_training()).forward(*inputs, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 265, in forward
    override_policy=_FallbackPolicy.FALLBACK_FORCE_TORCH_FORWARD)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_fallback.py", line 194, in handle_exception
    raise exception
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 256, in forward
    self._device)))
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 149, in forward
    *inputs)
  File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 42, in execution_session_run_forward
    forward_inputs.push_back(to_dlpack(input), input.dtype == torch.bool)
RuntimeError: /onnxruntime_src/onnxruntime/core/dlpack/dlpack_converter.cc:223 OrtValue onnxruntime::dlpack::DlpackToOrtValue(DLManagedTensor*, bool) IsContiguousTensor(dlpack->dl_tensor) was false. ORT only supports contiguous tensor for now.
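The error states that ORT only supports contiguous tensors. Until the fallback handles this case, one user-side workaround is to make inputs contiguous before they reach the ORTModule-wrapped model. This is a minimal sketch: `make_contiguous` is a hypothetical helper, not part of torch-ort. It relies only on the `is_contiguous()` and `contiguous()` methods that `torch.Tensor` provides, so it is duck-typed and does not import torch:

```python
def make_contiguous(obj):
    """Recursively replace non-contiguous tensors with contiguous copies.

    Anything exposing is_contiguous() and contiguous() (such as torch.Tensor)
    is treated as a tensor; lists, tuples, namedtuples and dicts are traversed
    (dicts are rebuilt as plain dicts); all other values are returned unchanged.
    """
    if hasattr(obj, "is_contiguous") and hasattr(obj, "contiguous"):
        # Only copy when needed; contiguous tensors are passed through as-is.
        return obj if obj.is_contiguous() else obj.contiguous()
    if isinstance(obj, tuple) and hasattr(obj, "_fields"):
        # namedtuple: rebuild with positional arguments
        return type(obj)(*(make_contiguous(item) for item in obj))
    if isinstance(obj, (list, tuple)):
        return type(obj)(make_contiguous(item) for item in obj)
    if isinstance(obj, dict):
        return {key: make_contiguous(value) for key, value in obj.items()}
    return obj
```

In the training script from the traceback, the call would become `output = model(*make_contiguous((data, src_mask)))`. This avoids the DLPack contiguity check rather than exercising the fallback path, so it does not address why FALLBACK_FORCE_TORCH_FORWARD did not take effect.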

opened by natke

Warning: Checker does not support models with experimental ops: ATen

Even though I see only the warnings below (no Python error), the profiler trace is not created.

Repro script

import torch
import torch.nn as nn
import torch.backends.cudnn as cudnn
import torch.optim
import torch.utils.data
import torchvision
import torchvision.transforms as T
import torchvision.models as models

import torch.profiler

model = models.resnet50(pretrained=True)
model.cuda()
cudnn.benchmark = True

transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                          shuffle=True, num_workers=4)

criterion = nn.CrossEntropyLoss().cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
device = torch.device("cuda:0")

from torch_ort import ORTModule
model = ORTModule(model)

model.train()

with torch.profiler.profile(
    activities=[
        torch.profiler.ProfilerActivity.CPU,
        torch.profiler.ProfilerActivity.CUDA],
    schedule=torch.profiler.schedule(
        wait=1,
        warmup=1,
        active=2),
    on_trace_ready=torch.profiler.tensorboard_trace_handler('./result', worker_name='worker0'),
    record_shapes=True,
    profile_memory=True,  # This will take 1 to 2 minutes. Setting it to False could greatly speed it up.
    with_stack=True
) as p:
    for step, data in enumerate(trainloader, 0):
        print("step:{}".format(step))
        inputs, labels = data[0].to(device=device), data[1].to(device=device)

        outputs = model(inputs)
        loss = criterion(outputs, labels)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        if step + 1 >= 4:
            break
        p.step()
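The `schedule(wait=1, warmup=1, active=2)` argument in the script determines which steps are recorded and when `on_trace_ready` fires. The sketch below mirrors the step-to-action logic of `torch.profiler.schedule` in plain Python, without importing torch, as we understand it from the PyTorch source. The string action names stand in for `torch.profiler.ProfilerAction` members, and the function name is hypothetical. With these parameters, step 0 is skipped, step 1 is warmup, steps 2 and 3 are recorded, and the trace is due to be saved at the end of step 3:

```python
def schedule_action(step, *, wait, warmup, active, repeat=0, skip_first=0):
    """Return the profiler action for a 0-based step.

    Plain-Python mirror of torch.profiler.schedule's step logic; strings
    stand in for torch.profiler.ProfilerAction members.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < skip_first:
        return "NONE"
    step -= skip_first
    cycle = wait + warmup + active
    # With repeat > 0, profiling stops after `repeat` full cycles.
    if repeat > 0 and step // cycle >= repeat:
        return "NONE"
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    # The last active step of each cycle also saves (hands off to on_trace_ready).
    return "RECORD" if pos < cycle - 1 else "RECORD_AND_SAVE"


if __name__ == "__main__":
    # Parameters taken from the repro script above.
    for step in range(4):
        print(step, schedule_action(step, wait=1, warmup=1, active=2))
    # prints: 0 NONE / 1 WARMUP / 2 RECORD / 3 RECORD_AND_SAVE (one per line)
```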

Package versions

(serve) ubuntu@ip-172-31-17-70:~$ pip list
Package                Version
---------------------- -------------------
brotlipy               0.7.0
Cerberus               1.3.4
certifi                2020.12.5
cffi                   1.14.5
chardet                4.0.0
conda                  4.10.1
conda-package-handling 1.7.3
cryptography           3.4.7
flatbuffers            22.9.24
h5py                   3.7.0
idna                   2.10
mamba                  0.13.0
mpmath                 1.2.1
numpy                  1.23.3
onnx                   1.12.0
onnxruntime-training   1.12.0
packaging              21.3
Pillow                 9.2.0
pip                    21.1.2
protobuf               3.20.1
pycosat                0.6.3
pycparser              2.20
pyOpenSSL              20.0.1
pyparsing              3.0.9
PySocks                1.7.1
requests               2.25.1
ruamel-yaml-conda      0.15.80
setuptools             49.6.0.post20210108
six                    1.16.0
sympy                  1.11.1
torch                  1.12.1+cu116
torch-ort              1.12.0
torchaudio             0.12.1+cu116
torchvision            0.13.1+cu116
tqdm                   4.61.0
typing-extensions      4.4.0
urllib3                1.26.4
wheel                  0.36.2

Logs

(serve) ubuntu@ip-172-31-17-70:~$ python resnet.py 
/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
  warnings.warn(
/opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
Files already downloaded and verified
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:118: UserWarning: onnxruntime training package info: package_name: onnxruntime-training
  warnings.warn("onnxruntime training package info: package_name: %s" % package_name)
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:119: UserWarning: onnxruntime training package info: __version__: 1.12.0
  warnings.warn("onnxruntime training package info: __version__: %s" % version)
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:120: UserWarning: onnxruntime training package info: cuda_version: 10.2
  warnings.warn("onnxruntime training package info: cuda_version: %s" % cuda_version)
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:121: UserWarning: onnxruntime build info: cudart_version: 10020
  warnings.warn("onnxruntime build info: cudart_version: %s" % cudart_version)
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:129: UserWarning: WARNING: failed to find cudart version that matches onnxruntime build info
  warnings.warn("WARNING: failed to find cudart version that matches onnxruntime build info")
/opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:130: UserWarning: WARNING: found cudart versions: [11060]
  warnings.warn("WARNING: found cudart versions: %s" % local_cudart_versions)
step:0
/opt/conda/lib/python3.9/site-packages/onnxruntime/training/ortmodule/_training_manager.py:190: UserWarning: Fast path enabled - skipping checks. Rebuild graph: True, Execution agent: True, Device check: True
  warnings.warn(
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.downsample.1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.downsample.1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.downsample.1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.downsample.1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn3.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn1.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn2.num_batches_tracked'. This changes graph semantics.
Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn3.num_batches_tracked'. This changes graph semantics.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Warning: Checker does not support models with experimental ops: ATen
Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
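The validation warnings in this log compare ORT's build-time `cudart_version: 10020` with the locally found `11060`, while the installed torch is `1.12.1+cu116`. CUDA runtime version integers follow the convention `major * 1000 + minor * 10`. The log therefore reports a CUDA 10.2 build of onnxruntime-training alongside a CUDA 11.6 runtime. This report does not establish whether the mismatch causes the later `ld.so` assertion. The sketch below only decodes such values so they can be compared. Both helper names are hypothetical, and the `+cuXYZ` parsing assumes the last digit is the minor version:

```python
import re


def decode_cudart_version(value):
    """Decode a CUDA runtime version integer (major * 1000 + minor * 10)."""
    return value // 1000, (value % 1000) // 10


def torch_cuda_suffix(version_string):
    """Parse a '+cuXYZ' local version tag, e.g. '1.12.1+cu116' -> (11, 6).

    Returns None for CPU builds or strings without a '+cu' tag.
    """
    match = re.search(r"\+cu(\d+)(\d)$", version_string)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    built = decode_cudart_version(10020)  # ORT build info in the log above
    found = decode_cudart_version(11060)  # cudart version found at runtime
    torch_cuda = torch_cuda_suffix("1.12.1+cu116")  # from the pip list above
    print(built, found, torch_cuda)
    # prints: (10, 2) (11, 6) (11, 6)
```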

opened by msaroufim

RuntimeError: Error in execution: At least one output should be requested.

I am getting this error with a fairly simple model. The error comes directly from ONNX Runtime, but I couldn't find any way to register an output in ORTInferenceModule.

Versions: torch 1.12.1, onnx 1.12.0, torch-ort-infer 1.12.0

Reproduction steps:

import torch
from torch import nn
from torch_ort import ORTInferenceModule, OpenVINOProviderOptions

class Block(nn.Module):
    def __init__(self, size):
        super().__init__()
        self.size = size
        self.ff1 = nn.Linear(size, size)

    def forward(self, x):
        second = self.ff1(x)
        return second

model = Block(1024)
model.eval()

model = ORTInferenceModule(model, provider_options=OpenVINOProviderOptions(backend="CPU", precision="FP32"))

with torch.inference_mode():
    print("start")

    x = torch.randn(1, 1024, dtype=torch.float32)
    x = model(x)
    print(x.mean())

opened by StochasticRomanAgeev

[torch-ort-infer] Aten fallback doesn't work
The ATen op doesn't fall back to the native PyTorch runtime as expected.

Versions: torch 1.12.0, ONNX Runtime 1.12.0, torch-ort-infer 1.12.0

Reproduction steps:

import torch
from torch_ort import ORTInferenceModule

def test_numpy_T(input_shape):
    class NeuralNet(torch.nn.Module):
        def __init__(self):
            super(NeuralNet, self).__init__()

        def forward(self, input):
            return input.T

    device = "cpu"
    ort_model = ORTInferenceModule(NeuralNet().to(device))

    def run_step(model, input):
        prediction = model(input)
        return prediction

    ort_input = torch.rand(input_shape, dtype=torch.float, device=device)
    ort_prediction = run_step(ort_model, ort_input)

if __name__ == "__main__":
    test_numpy_T([3, 2, 5])

Error log

Traceback (most recent call last):
  File "unit_test_atenop.py", line 23, in <module>
    test_numpy_T([3, 2, 5])
  File "unit_test_atenop.py", line 20, in test_numpy_T
    ort_prediction = run_step(ort_model, ort_input)
  File "unit_test_atenop.py", line 16, in run_step
    prediction = model(input)
  File "/ort_aten_fb/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/ort_aten_fb/lib/python3.8/site-packages/torch_ort/ortinferencemodule/_utils_infer.py", line 98, in _forward
    return ortinferencemodule._forward_call(*inputs, **kwargs)
  File "/ort_aten_fb/lib/python3.8/site-packages/torch_ort/ortinferencemodule/ortinferencemodule.py", line 107, in _forward_call
    self._inference_session = onnxruntime.InferenceSession(
  File "/ort_aten_fb/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "/ort_aten_fb/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 386, in _create_inference_session
    sess = C.InferenceSession(session_options, self.model_bytes, False, self.read_config_from_model)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (ATen_0) output arg (data) type inference failed.

I also tested with the symbolic shape inference call used by ORTModule (ref: symbolic_shape); it fails with Exception("Incomplete symbolic shape inference").

opened by saipj
torch-ort cannot be installed on Windows: onnxruntime-training not found

When running pip install torch-ort in a conda environment on Windows, I get the following error:

ERROR: Could not find a version that satisfies the requirement onnxruntime-training (from versions: none)
ERROR: No matching distribution found for onnxruntime-training

However, if I run the same command in a conda environment in WSL, it works fine. Other people were able to reproduce this, so it seems to be a Windows-specific issue.
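pip prints `from versions: none` when it finds no published distribution it can install for the running interpreter. The platform and Python version pip sees are therefore the first things to compare against the files listed for `onnxruntime-training` on PyPI. This is a minimal standard-library sketch with a hypothetical helper name. The Python tag assumes CPython. The WSL check is only a heuristic: it relies on the kernel release string, which typically contains `microsoft` under WSL:

```python
import platform
import sys
import sysconfig


def describe_interpreter():
    """Summarize the platform details that determine which wheels pip can install."""
    system = platform.system()
    return {
        "system": system,  # e.g. 'Windows' or 'Linux'
        "platform": sysconfig.get_platform(),  # e.g. 'win-amd64', 'linux-x86_64'
        # Assumes CPython, whose wheels are tagged cpXY.
        "python_tag": f"cp{sys.version_info.major}{sys.version_info.minor}",
        # Heuristic only: WSL kernels usually report 'microsoft' in the release.
        "likely_wsl": system == "Linux" and "microsoft" in platform.release().lower(),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

Running this in both the Windows conda environment and the WSL one gives the two sets of values that can be compared against the available wheel tags.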

opened by alpatots

Compatibility between ORTModule and DeepSpeed

Hi folks,

I have recently been validating distributed training features with ORTModule. Here are some incompatibilities I found during testing:

[With DeepSpeed]

ZeRO Stages 1 and 2 work well
ZeRO Stage 3 ❌

Warnings:

/usr/local/lib/python3.8/dist-packages/onnxruntime/training/ortmodule/_io.py:558: UserWarning: This model cannot be deep copied (or pickled), which is a required step for stateful models to be properly exported to ONNX. Compute will continue, but unexpected results may occur!
  warnings.warn("This model cannot be deep copied (or pickled)
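The warning says ORTModule needs to deep copy (or pickle) stateful models for export. A quick way to tell whether a given model object can be deep-copied at all is to attempt it directly. This is a minimal sketch with a hypothetical helper name. It returns the exception message instead of raising, and `HoldsLock` is an illustrative stand-in for a model holding an unpicklable handle:

```python
import copy
import threading


def check_deepcopy(obj):
    """Try copy.deepcopy(obj); return (True, None) or (False, error message)."""
    try:
        copy.deepcopy(obj)
    except Exception as exc:  # deepcopy may raise TypeError, RuntimeError, etc.
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


class HoldsLock:
    """Illustrative stand-in for a model that keeps an unpicklable lock."""

    def __init__(self):
        self.lock = threading.Lock()


if __name__ == "__main__":
    print(check_deepcopy({"weights": [1.0, 2.0]}))  # (True, None)
    print(check_deepcopy(HoldsLock()))  # (False, 'TypeError: ...')
```

In the DeepSpeed setup, `check_deepcopy(model)` could be run on the module just before it is wrapped in ORTModule. This report does not identify which component makes the ZeRO Stage 3 model non-copyable.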

BF16 ❌

Error Message:

RuntimeError: /onnxruntime_src/orttraining/orttraining/python/orttraining_pybind_state.cc:752
onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, 
const pybind11::bytes&, const onnxruntime::training::OrtModuleGraphBuilderConfiguration&)> 
[ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(bfloat16)' of input parameter
(_original_module.distilbert.embeddings.word_embeddings.weight) of operator (ATen) in node (ATen_17) is invalid
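The DeepSpeed findings above can be encoded as a pre-flight check on a DeepSpeed configuration before a model is wrapped in ORTModule. This is a minimal sketch. It assumes the standard DeepSpeed config keys `zero_optimization.stage` and `bf16.enabled`, and the helper name is hypothetical. The flagged combinations reflect only what this report observed with onnxruntime-training 1.11.1 and torch-ort 1.11.1. They are not a documented support matrix:

```python
def ortmodule_deepspeed_concerns(ds_config):
    """List DeepSpeed settings reported in this issue as failing with ORTModule."""
    concerns = []
    # DeepSpeed's ZeRO stage defaults to 0 when not configured.
    stage = ds_config.get("zero_optimization", {}).get("stage", 0)
    if stage >= 3:
        concerns.append("ZeRO stage 3: reported failing (model cannot be deep copied)")
    if ds_config.get("bf16", {}).get("enabled", False):
        concerns.append("bf16: ATen node rejected a tensor(bfloat16) input")
    return concerns


if __name__ == "__main__":
    config = {
        "train_batch_size": 32,
        "zero_optimization": {"stage": 3},
        "bf16": {"enabled": True},
    }
    for concern in ortmodule_deepspeed_concerns(config):
        print("warning:", concern)
```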

[With Fairscale]

Can only shard optimizer state

Environment

OS: Ubuntu 20.04
CUDA/cuDNN version: 11.3/8
onnxruntime-training: 1.11.1+cu113
torch: 1.11.0+cu113
torch-ort: 1.11.1
Python version: 3.8
GPU: A100

Could you confirm whether these behaviors are intended? Also, regarding compatibility with DeepSpeed ZeRO Stage 3 and BF16, could you share any insight into whether they will be supported in the future?

Thanks a lot!

opened by JingyaHuang

