ONNX Runtime for PyTorch accelerates PyTorch model training using ONNX Runtime.

Overview

Accelerate PyTorch models with ONNX Runtime

ONNX Runtime for PyTorch is available via the torch-ort Python package.

This repository contains the source code for the package, as well as instructions for running the package.

Prerequisites

You need a machine with at least one NVIDIA or AMD GPU to run ONNX Runtime for PyTorch.

You can install and run torch-ort in your local environment, or with Docker.

Install in a local Python environment

Default dependencies

By default, torch-ort depends on PyTorch 1.9.0, ONNX Runtime 1.8.1 and CUDA 10.2.

  1. Install CUDA 10.2

  2. Install cuDNN 7.6

  3. Install torch-ort

    • pip install torch-ort

  4. Run the post-installation script for ORTModule

    • python -m torch_ort.configure

Get install instructions for other combinations in the Get Started Easily section at https://www.onnxruntime.ai/ under the Optimize Training tab.
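
Before running the test below, it can help to confirm that the expected modules are importable. The following is a minimal sketch using only the Python standard library. `missing_packages` is a hypothetical helper, not part of torch-ort, and it only checks that the modules can be found, not that `python -m torch_ort.configure` has completed successfully.

```python
import importlib.util


def missing_packages(names=("torch", "onnxruntime", "torch_ort")):
    """Return the top-level module names from `names` that cannot be found."""
    # find_spec() locates a top-level module without importing it.
    return [name for name in names if importlib.util.find_spec(name) is None]


# Report which of the default modules are absent from this environment.
print("Missing modules:", missing_packages() or "none")
```

An empty result means the three modules are visible to the interpreter that ran the check, which is worth verifying when several Python environments are installed.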

Test your installation

  1. Clone this repo

  2. Install extra dependencies

    • pip install wget pandas scikit-learn transformers

  3. Run the training script

    • python ./ort/tests/bert_for_sequence_classification.py

Add ONNX Runtime for PyTorch to your PyTorch training script

from torch_ort import ORTModule
model = ORTModule(model)

# PyTorch training script follows
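
The two lines above can be placed in a complete training loop as follows. This is a minimal sketch, not the project's test script: the tiny model, the random data, the SGD optimizer, and the `train` helper are all assumptions made for illustration. The `try`/`except` import fallback and the CPU fallback exist only so the sketch still runs where torch-ort or a GPU is missing; the prerequisites above still call for a GPU.

```python
import torch
import torch.nn as nn

try:
    from torch_ort import ORTModule  # provided by the torch-ort package
except ImportError:
    ORTModule = None  # assumption: fall back to plain PyTorch for this sketch


def train(steps=5, seed=0):
    """Train a toy classifier for `steps` iterations and return the losses."""
    torch.manual_seed(seed)
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))
    if ORTModule is not None:
        model = ORTModule(model)  # the only ORT-specific line
    model.to(device)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    # Random stand-in data: 64 samples, 16 features, 2 classes.
    inputs = torch.randn(64, 16, device=device)
    labels = torch.randint(0, 2, (64,), device=device)

    losses = []
    for _ in range(steps):
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), labels)
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
    return losses


print(train(steps=3))
```

Everything after the wrapping line is ordinary PyTorch, which is the point of the snippet above: the optimizer, loss, and loop do not change.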

Samples

To see torch-ort in action, visit https://github.com/microsoft/onnxruntime-training-examples, which shows how to train the most popular HuggingFace models.

License

This project has an MIT license, as found in the LICENSE file.

Comments
  • Running ORTModule with other EPs from ORT

    I am building a new wheel with the OneDNN EP using ONNX Runtime training. After that is installed, I install torch_ort and then run the configure step, but it does not seem to work (I get the same error asking me to run configure again). From the instructions, I see that there is no recipe for this combination. Is this possible, or is there any other way for me to build a custom wheel and use it to train a BERT model with OneDNN and ORT?

    opened by chethanpk 11
  • ONNXRuntimeError after enabled fp16 mixed precision training

    Hi folks,

    I tested fp16 mixed-precision training with an ORTModule-wrapped GPT-2 model on a fine-tuning task. However, after enabling fp16, I encountered the following error:

    Error Message

    Traceback (most recent call last):
      File "test_onnxruntime_train.py", line 115, in test_ort_trainer
        train_result = trainer.train()
      File "/workspace/optimum/onnxruntime/trainer.py", line 498, in train
        tr_loss_step = self.training_step(model, inputs)
      File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 1984, in training_step
        loss = self.compute_loss(model, inputs)
      File "/usr/local/lib/python3.6/dist-packages/transformers/trainer.py", line 2016, in compute_loss
        outputs = model(**inputs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 81, in _forward
        return self._torch_module.forward(*inputs, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 32, in _forward
        return self._execution_manager(self.is_training()).forward(*inputs, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 265, in forward
        override_policy=_FallbackPolicy.FALLBACK_FORCE_TORCH_FORWARD)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_fallback.py", line 194, in handle_exception
        raise exception
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 85, in forward
        self._initialize_graph_builder(training=True)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 420, in _initialize_graph_builder
        self._onnx_models.exported_model.SerializeToString(), grad_builder_config)
    RuntimeError: /onnxruntime_src/orttraining/orttraining/python/orttraining_pybind_state.cc:707 onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, const pybind11::bytes&, const onnxruntime::training::OrtModuleGraphBuilderConfiguration&)> [ONNXRuntimeError] : 1 : FAIL : Type Error: Type parameter (T) of Optype (Where) bound to different types (tensor(float) and tensor(float16) in node (Where_183).
    

    It seems that the exported ONNX graph is broken due to incompatible input types. I am wondering where the problem comes from. Do you have any insight on that?


    System information

    Docker image built with the Dockerfile-cu11 in onnxruntime-training-examples.

    • OS: Ubuntu 18.04
    • CUDA/cuDNN version: 11/8
    • onnxruntime-training: 1.9.0+cu111
    • torch: 1.9.0+cu111
    • torch-ort: 1.9.0
    • Python version: 3.6
    • GPU: A100

    Additional Information

    • I actually have a working setup with this environment: torch 1.8.1 + torch-ort 1.9.0 + onnxruntime-training 1.11.0.dev20220113001+cu102, so I wonder if the error comes from the Dockerfile being outdated. However, I can no longer find how to install onnxruntime-training 1.11.0.dev20220113001+cu102.
    • Here is the ONNX graph exported with DebugOptions, in case that helps: [image]
    opened by JingyaHuang 8
  • RecursionError: maximum recursion depth exceeded in comparison

    I use ort like this:

    ...
    model = nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = ORTModule(model)
    model = nn.parallel.DistributedDataParallel(model, find_unused_parameters=True, device_ids=[device])
    ...
    

    But I get this error:

    Traceback (most recent call last):
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
        fn(i, *args)
      File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/ddp_trainer.py", line 156, in _main_func
        main_func(local_rank, *args)
      File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/tools/train.py", line 163, in train_entrance
        trainer.fit()
      File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/tools/trainer_wrapper.py", line 225, in fit
        self._trainer.fit()
      File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/trainer.py", line 298, in fit
        profiler=self.profiler,
      File "/home/users/min.du/hdlt/feature_j5fsd_configs/HDLT/hdlt/engine/processors/processor.py", line 265, in __call__
        model_outs = model(*_as_list(batch_i))
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
        output = self.module(*inputs[0], **kwargs[0])
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
        result = self.forward(*input, **kwargs)
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/ortmodule.py", line 41, in _forward
        return self._execution_manager(self._is_training()).forward(*inputs, **kwargs)
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_training_manager.py", line 67, in forward
        build_gradient_graph = self._export_model(*inputs, **kwargs)
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_graph_execution_manager.py", line 206, in _export_model
        schema = _io._extract_schema({'args': copy.copy(inputs), 'kwargs': copy.copy(kwargs)})
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 300, in _extract_schema
        data[key] = _extract_schema(data[key])
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
        data[idx] = _extract_schema(data[idx])
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
        data[idx] = _extract_schema(data[idx])
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 291, in _extract_schema
        data[idx] = _extract_schema(data[idx])
      [Previous line repeated 949 more times]
      File "/home/users/min.du/venvs/pytorch1.8/lib/python3.6/site-packages/onnxruntime/training/ortmodule/_io.py", line 287, in _extract_schema
        if isinstance(data, abc.Sequence):
      File "/home/users/min.du/venvs/pytorch1.8/lib64/python3.6/abc.py", line 184, in __instancecheck__
        if subclass in cls._abc_cache:
      File "/home/users/min.du/venvs/pytorch1.8/lib64/python3.6/_weakrefset.py", line 75, in __contains__
        return wr in self.data
    RecursionError: maximum recursion depth exceeded in comparison
    

    Any suggestions?

    opened by DuinoDu 7
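
The traceback above shows `_extract_schema` calling itself on `data[idx]` after an `isinstance(data, abc.Sequence)` check until the recursion limit is hit. One way such a pattern can recurse forever in Python is a string input: `str` is an `abc.Sequence`, and every element of a string is again a string. Whether that is the cause in this report is an assumption; the source does not confirm it. The functions below are hypothetical stand-ins written for illustration and are not the ONNX Runtime implementation.

```python
from collections import abc


def naive_schema(data):
    """Recurse into every Sequence; anything else is described by its type name."""
    if isinstance(data, abc.Sequence):
        return [naive_schema(item) for item in data]
    return type(data).__name__


def guarded_schema(data):
    """Same walk, but treat str and bytes as leaves instead of sequences."""
    if isinstance(data, (str, bytes)):
        return type(data).__name__
    if isinstance(data, abc.Sequence):
        return [guarded_schema(item) for item in data]
    return type(data).__name__


def hits_recursion_limit(fn, data):
    """Return True if calling fn(data) raises RecursionError."""
    try:
        fn(data)
    except RecursionError:
        return True
    return False


print(hits_recursion_limit(naive_schema, ["a"]))    # the one-character string never bottoms out
print(hits_recursion_limit(guarded_schema, ["a"]))  # the guard stops at the string
```

If a model's forward inputs contain strings or other self-similar containers, inspecting them before the wrapped call is a cheap first check.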
  • Refactor torch_ort and ort_moe into their respective directories

    Move the torch_ort and ort_moe code, including setup.*, tests, docker, and docs, into their respective directories. This only moves files and updates documentation and directory references; there is no functional code change.

    CLA Signed module: rocm 
    opened by jingyanwangms 6
  • MaxPool op resolved as Aten OP

    I created a small model training script to test the MaxPool gradient op for the OneDNN EP, using the model definition below. But for some reason, the MaxPool was resolved to an ATen op in the ONNX graph. Is there a way to force torch-ort to use MaxPool instead of the ATen op (maybe by disabling the use of ATen ops)?

    import torch.nn as nn
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
            self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
            self.maxpool1 = nn.MaxPool2d(2)
            self.maxpool2 = nn.MaxPool2d(2)
            #self.conv2_drop = nn.Dropout2d()
            self.fc1 = nn.Linear(320, 50)
            self.fc2 = nn.Linear(50, 10)
    
        def forward(self, x):
            x = F.relu(self.maxpool1(self.conv1(x)))
            x = F.relu(self.maxpool2(self.conv2(x)))
            x = x.view(-1, 320)  # flatten to (batch, 320) for the linear layers
            x = F.relu(self.fc1(x))
            x = F.dropout(x, training=self.training)
            x = self.fc2(x)
            return F.log_softmax(x, dim=1)  # explicit dim; the implicit form is deprecated
    
    opened by chethanpk 6
  • Lack of speed improvement when using custom GPT model with ORT

    Hey guys! In my investigation to figure out why there is a speed regression for #56, I created a minimal script to benchmark ORT vs. no ORT.

    With the script I'm seeing basically the same time with and without ORT. Any ideas on what is causing the performance issue? I'm also seeing a few warnings, which I've included below!

    No ORT time taken: 85.2842013835907 seconds
    ORT time taken: 85.33545899391174 seconds

    Warnings:

    /usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_logger.py:52: UserWarning: There were one or more warnings or errors raised while exporting the PyTorch model. Please enable INFO level logging to view all warnings and errors.
      "model. Please enable INFO level logging to view all warnings and errors.", UserWarning)
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    

    script:

    import math
    import os
    import time
    
    import numpy as np
    import torch
    import torch.nn as nn
    from torch.cuda.amp import autocast
    from torch.nn import functional as F
    from torch.utils.data import Dataset, DataLoader
    from tqdm import tqdm
    from torch_ort import ORTModule
    
    
    class GPTConfig:
        """ base GPT config, params common to all GPT versions """
        embd_pdrop = 0.1
        resid_pdrop = 0.1
        attn_pdrop = 0.1
    
        def __init__(self, vocab_size, block_size, **kwargs):
            self.vocab_size = vocab_size
            self.block_size = block_size
            for k, v in kwargs.items():
                setattr(self, k, v)
    
    
    class CausalSelfAttention(nn.Module):
        """
        A vanilla multi-head masked self-attention layer with a projection at the end.
        I believe I could have just used torch.nn.MultiheadAttention but their documentation
        is all but absent and code ugly so I don't trust it, rolling my own here.
        """
    
        def __init__(self, config):
            super().__init__()
            assert config.n_embd % config.n_head == 0
            # key, query, value projections for all heads
            self.key = nn.Linear(config.n_embd, config.n_embd)
            self.query = nn.Linear(config.n_embd, config.n_embd)
            self.value = nn.Linear(config.n_embd, config.n_embd)
            # regularization
            self.attn_drop = nn.Dropout(config.attn_pdrop)
            self.resid_drop = nn.Dropout(config.resid_pdrop)
            # output projection
            self.proj = nn.Linear(config.n_embd, config.n_embd)
            # causal mask to ensure that attention is only applied to the left in the input sequence
            self.register_buffer("mask", torch.tril(torch.ones(config.block_size, config.block_size))
                                 .view(1, 1, config.block_size, config.block_size))
            self.n_head = config.n_head
    
        def forward(self, x, layer_past=None):
            B, T, C = x.size()
    
            # calculate query, key, values for all heads in batch and move head forward to be the batch dim
            k = self.key(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
            q = self.query(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
            v = self.value(x).view(B, T, self.n_head, C // self.n_head).transpose(1, 2)  # (B, nh, T, hs)
    
            # causal self-attention; Self-attend: (B, nh, T, hs) x (B, nh, hs, T) -> (B, nh, T, T)
            att = (q @ k.transpose(-2, -1)) * (1.0 / math.sqrt(k.size(-1)))
            att = att.masked_fill(self.mask[:, :, :T, :T] == 0, float('-inf'))
            att = F.softmax(att, dim=-1)
            att = self.attn_drop(att)
            y = att @ v  # (B, nh, T, T) x (B, nh, T, hs) -> (B, nh, T, hs)
            y = y.transpose(1, 2).contiguous().view(B, T, C)  # re-assemble all head outputs side by side
    
            # output projection
            y = self.resid_drop(self.proj(y))
            return y
    
    
    class Block(nn.Module):
        """ an unassuming Transformer block """
    
        def __init__(self, config):
            super().__init__()
            self.ln1 = nn.LayerNorm(config.n_embd)
            self.ln2 = nn.LayerNorm(config.n_embd)
            self.attn = CausalSelfAttention(config)
            self.mlp = nn.Sequential(
                nn.Linear(config.n_embd, 4 * config.n_embd),
                nn.GELU(),
                nn.Linear(4 * config.n_embd, config.n_embd),
                nn.Dropout(config.resid_pdrop),
            )
    
        def forward(self, x):
            x = x + self.attn(self.ln1(x))
            x = x + self.mlp(self.ln2(x))
            return x
    
    
    class GPT(torch.nn.Module):
        def __init__(self, vocab_size, n_embd, block_size, embd_pdrop, n_layer, config):
            # input embedding stem
            super().__init__()
            self.tok_emb = nn.Embedding(vocab_size, n_embd)
            self.pos_emb = nn.Parameter(torch.zeros(1, block_size, n_embd))
            self.drop = nn.Dropout(embd_pdrop)
            self.config = config
    
            # decoder head
            self.ln_f = nn.LayerNorm(n_embd)
            self.head = nn.Linear(n_embd, vocab_size, bias=False)
    
            self.block_size = block_size
    
            blocks = []
            for x in range(n_layer):
                layer = Block(self.config)
                blocks.append(layer)
            self.blocks = nn.Sequential(*blocks)
    
        def forward(self, idx):
            b, t = idx.size()
            assert t <= self.block_size, "Cannot forward, model block size is exhausted."
    
            # forward the GPT model
            token_embeddings = self.tok_emb(idx)  # each index maps to a (learnable) vector
            position_embeddings = self.pos_emb[:, :t, :]  # each position maps to a (learnable) vector
            x = self.drop(token_embeddings + position_embeddings)
            x = self.blocks(x)
            x = self.ln_f(x)
            logits = self.head(x)
            return logits
    
    
    class CharDataset(Dataset):
    
        def __init__(self, data, block_size):
            chars = list(set(data))
            data_size, vocab_size = len(data), len(chars)
    
            self.stoi = {ch: i for i, ch in enumerate(chars)}
            self.itos = {i: ch for i, ch in enumerate(chars)}
            self.block_size = block_size
            self.vocab_size = vocab_size
            self.data = data
    
        def __len__(self):
            return math.ceil(len(self.data) / (self.block_size + 1))
    
        def __getitem__(self, idx):
            # we're actually going to "cheat" and pick a spot in the dataset at random
            i = np.random.randint(0, len(self.data) - (self.block_size + 1))
            chunk = self.data[i:i + self.block_size + 1]
            dix = [self.stoi[s] for s in chunk]
            x = torch.tensor(dix[:-1], dtype=torch.long)
            y = torch.tensor(dix[1:], dtype=torch.long)
            return x, y
    
    
    if __name__ == '__main__':
        n_embd = 2048
        block_size = 128
        n_layer = 6
        batch_size = 8
        num_workers = 0
        n_head = 16
        n_warmup = 20
        enable_ort = True
    
        device = torch.device("cuda:0")
    
        if not os.path.exists("input.txt"):
            os.system("wget https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt")
    
        file = 'input.txt'
        text = open(file, 'r').read()
        train_dataset = CharDataset(text, block_size)  # one line of poem is roughly 50 characters
        train_loader = DataLoader(train_dataset, batch_size=batch_size, num_workers=num_workers)
        vocab_size = train_dataset.vocab_size
    
        model = GPT(
            vocab_size=vocab_size,
            n_embd=n_embd,
            embd_pdrop=0.1,
            block_size=block_size,
            n_layer=n_layer,
            config=GPTConfig(
                vocab_size=vocab_size,
                block_size=block_size,
                n_layer=n_layer,
                n_head=n_head,
                n_embd=n_embd,
            )
        )
        if enable_ort:
            model = ORTModule(model)
    
        model.to(device)
        optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
    
        torch.cuda.synchronize()
        # warmup before measuring
        for x, (idx, targets) in tqdm(enumerate(train_loader), total=len(train_loader)):
            if x == n_warmup:
                break
            idx = idx.to(device)
            targets = targets.to(device)
            with autocast():
                logits = model(idx)
                loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
    
        torch.cuda.synchronize()
        start_time = time.time()
    
        for idx, targets in tqdm(train_loader, total=len(train_loader)):
            idx = idx.to(device)
            targets = targets.to(device)
            optimizer.zero_grad()  # clear gradients accumulated by the previous step
            with autocast():
                logits = model(idx)
                loss = F.cross_entropy(logits.view(-1, logits.size(-1)), targets.view(-1))
            loss.backward()
            optimizer.step()
    
        torch.cuda.synchronize()
        print("Time taken", time.time() - start_time)
    
    

    cc @natke @ashbhandare

    opened by SeanNaren 6
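
When comparing runs with and without ORT as in the script above, one-time setup costs should stay outside the timed region. The warmup loop in that script runs forward passes only, while the timed loop also runs backward and optimizer steps, so any first-time cost of the backward path lands inside the measurement. A small harness that warms up with the same step it later times could look like the sketch below. It uses only the standard library; `time_steps` and its `sync` parameter are hypothetical names, not part of torch-ort.

```python
import time


def time_steps(step, n_warmup, n_timed, sync=lambda: None):
    """Run `step` n_warmup times untimed, then return the seconds taken by n_timed calls.

    `sync` is called before starting and before stopping the clock so that
    asynchronous work (for example queued GPU kernels) is included correctly.
    """
    for _ in range(n_warmup):
        step()
    sync()
    start = time.perf_counter()
    for _ in range(n_timed):
        step()
    sync()
    return time.perf_counter() - start


# Toy usage: the "step" only appends to a list.
calls = []
elapsed = time_steps(lambda: calls.append(1), n_warmup=2, n_timed=3)
print(len(calls), elapsed >= 0.0)
```

With PyTorch on a GPU, `sync` would be `torch.cuda.synchronize` and `step` one full forward, backward, and optimizer step, so that warmup covers the same work as the timed region.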
  • PyTorch Lightning Integration

    Hey guys!

    Really epic work in this repo! I'm currently working on integrating this into Lightning (any assistance would be appreciated). From what I see, ORTModule just wraps the forward function, converting it into ONNX format? As a result, I've wrapped the model internally in Lightning to ensure that user-defined functions (training_step, validation_step, test_step) are placed in a wrapped module's forward function.

    Currently I'm running into an error:

    /usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_io.py:473: UserWarning: This model cannot be deep copied (or pickled), which is a required step for stateful models to be properly exported to ONNX. Compute will continue, but unexpected results may occur!
      warnings.warn("This model cannot be deep copied (or pickled), "
    2021-07-19 13:20:57.590381944 [E:onnxruntime:, inference_session.cc:1341 operator()] Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/session_state_utils.cc:143 onnxruntime::common::Status onnxruntime::session_state_utils::SaveInitializedTensors(const onnxruntime::Env&, const std::basic_string<char>&, const onnxruntime::GraphViewer&, const AllocatorPtr&, const onnxruntime::OrtValueNameIdxMap&, const std::vector<int>&, onnxruntime::ITensorAllocator&, const std::function<onnxruntime::common::Status(int, const OrtValue&, const onnxruntime::OrtCallback&, bool)>&, const onnxruntime::logging::Logger&, const onnxruntime::DataTransferManager&, const onnxruntime::ExecutionPlanBase&, const onnxruntime::SessionOptions&) ort_value_name_idx_map.MaxIdx() > -1 was false. OrtValue indexes should have been populated.
    
    Traceback (most recent call last):
      File "reproduce_test.py", line 99, in <module>
        run()
      File "reproduce_test.py", line 94, in run
        trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 515, in fit
        self._run(model)
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 896, in _run
        self._dispatch()
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 963, in _dispatch
        self.accelerator.start_training(self)
      File "/data/pytorch-lightning/pytorch_lightning/accelerators/accelerator.py", line 97, in start_training
        self.training_type_plugin.start_training(trainer)
      File "/data/pytorch-lightning/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 161, in start_training
        self._results = trainer.run_stage()
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 973, in run_stage
        return self._run_train()
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1008, in _run_train
        self._run_sanity_check(self.lightning_module)
      File "/data/pytorch-lightning/pytorch_lightning/trainer/trainer.py", line 1084, in _run_sanity_check
        self._evaluation_loop.run()
      File "/data/pytorch-lightning/pytorch_lightning/loops/base.py", line 112, in run
        self.advance(*args, **kwargs)
      File "/data/pytorch-lightning/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 122, in advance
        self.num_dataloaders,
      File "/data/pytorch-lightning/pytorch_lightning/loops/base.py", line 112, in run
        self.advance(*args, **kwargs)
      File "/data/pytorch-lightning/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 122, in advance
        output = self.evaluation_step(batch, batch_idx, dataloader_idx)
      File "/data/pytorch-lightning/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 162, in evaluation_step
        output = self.trainer.accelerator.validation_step(step_kwargs)
      File "/data/pytorch-lightning/pytorch_lightning/accelerators/accelerator.py", line 220, in validation_step
        return self.training_type_plugin.validation_step(*step_kwargs.values())
      File "reproduce_test.py", line 74, in validation_step
        return self.model(*args, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
        return forward_call(*input, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 41, in _forward
        return self._execution_manager(self._is_training()).forward(*inputs, **kwargs)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 86, in forward
        self._create_execution_agent()
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_inference_manager.py", line 115, in _create_execution_agent
        session_options, providers, provider_options)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_execution_agent.py", line 52, in __init__
        self.create_inference_agent(path_or_bytes, session_options, providers, provider_options)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_execution_agent.py", line 56, in create_inference_agent
        providers, provider_options)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 283, in __init__
        self._create_inference_session(providers, provider_options, disabled_optimizers)
      File "/usr/local/lib/python3.6/dist-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 321, in _create_inference_session
        sess.initialize_session(providers, provider_options, disabled_optimizers)
    onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: /onnxruntime_src/onnxruntime/core/framework/session_state_utils.cc:143 onnxruntime::common::Status onnxruntime::session_state_utils::SaveInitializedTensors(const onnxruntime::Env&, const std::basic_string<char>&, const onnxruntime::GraphViewer&, const AllocatorPtr&, const onnxruntime::OrtValueNameIdxMap&, const std::vector<int>&, onnxruntime::ITensorAllocator&, const std::function<onnxruntime::common::Status(int, const OrtValue&, const onnxruntime::OrtCallback&, bool)>&, const onnxruntime::logging::Logger&, const onnxruntime::DataTransferManager&, const onnxruntime::ExecutionPlanBase&, const onnxruntime::SessionOptions&) ort_value_name_idx_map.MaxIdx() > -1 was false. OrtValue indexes should have been populated.
    

    This is the script (it requires PyTorch Lightning: pip install pytorch-lightning):

    import os
    import pickle
    
    import torch
    from torch.utils.data import DataLoader, Dataset
    from torch_ort import ORTModule
    
    from pytorch_lightning import LightningModule, Trainer
    from pytorch_lightning.overrides import LightningDistributedModule
    from pytorch_lightning.plugins import SingleDevicePlugin
    
    
    class RandomDataset(Dataset):
    
        def __init__(self, size, length):
            self.len = length
            self.data = torch.randn(length, size)
    
        def __getitem__(self, index):
            return self.data[index]
    
        def __len__(self):
            return self.len
    
    
    class BoringModel(LightningModule):
    
        def __init__(self):
            super().__init__()
            self.layer = torch.nn.Linear(32, 2)
    
        def forward(self, x):
            return self.layer(x)
    
        def training_step(self, batch, batch_idx):
            loss = self(batch).sum()
            self.log("train_loss", loss)
            return {"loss": loss}
    
        def validation_step(self, batch, batch_idx):
            loss = self(batch).sum()
            self.log("valid_loss", loss)
    
        def test_step(self, batch, batch_idx):
            loss = self(batch).sum()
            self.log("test_loss", loss)
    
        def configure_optimizers(self):
            return torch.optim.SGD(self.layer.parameters(), lr=0.1)
    
    
    def unwrap_lightning_module(wrapped_model) -> 'pl.LightningModule':
        model = wrapped_model
        if isinstance(model, LightningDistributedModule):
            model = unwrap_lightning_module(model.module)
        if isinstance(model, ORTModule):
            model = unwrap_lightning_module(model._module_metadata.original_module)
        return model
    
    
    class ORTPlugin(SingleDevicePlugin):
        def setup(self, model: torch.nn.Module) -> torch.nn.Module:
            # Check that the model can be pickled before wrapping it.
            pickle.dumps(model)
    
            self.model = ORTModule(LightningDistributedModule(self.model))
            self.model_to_device()
            return self.model
    
        @property
        def lightning_module(self):
            return unwrap_lightning_module(self._model)
    
        def training_step(self, *args, **kwargs):
            return self.model(*args, **kwargs)
    
        def validation_step(self, *args, **kwargs):
            return self.model(*args, **kwargs)
    
        def test_step(self, *args, **kwargs):
            return self.model(*args, **kwargs)
    
    
    def run():
        train_data = DataLoader(RandomDataset(32, 64), batch_size=2)
        val_data = DataLoader(RandomDataset(32, 64), batch_size=2)
        test_data = DataLoader(RandomDataset(32, 64), batch_size=2)
    
        model = BoringModel()
        trainer = Trainer(
            default_root_dir=os.getcwd(),
            limit_train_batches=1,
            limit_val_batches=1,
            max_epochs=1,
            plugins=ORTPlugin(device=torch.device('cuda:0')),
            gpus=1,
        )
        trainer.fit(model, train_dataloaders=train_data, val_dataloaders=val_data)
        trainer.test(model, dataloaders=test_data)
    
    
    if __name__ == '__main__':
        run()
    

    I'll continue to debug in the meantime :)

    opened by SeanNaren 6
  • AttributeError: 'ORTModule' object has no attribute 'resize_token_embeddings'

    Hi, I am using ORT to run transformers/examples/pytorch/language-modeling/run_clm.py (fine-tuning GPT-2 on WikiText-2, using the raw WikiText-2, where no tokens were replaced before tokenization). I am running it on the ROCm platform. I edited the script like this:

    from torch_ort import ORTModule

        if model_args.model_name_or_path:
            model = AutoModelForCausalLM.from_pretrained(
                model_args.model_name_or_path,
                from_tf=bool(".ckpt" in model_args.model_name_or_path),
                config=config,
                cache_dir=model_args.cache_dir,
                revision=model_args.model_revision,
                use_auth_token=True if model_args.use_auth_token else None,
            )
            model = ORTModule(model)
        else:
            model = AutoModelForCausalLM.from_config(config)
            model = ORTModule(model)
            n_params = sum(dict((p.data_ptr(), p.numel()) for p in model.parameters()).values())
            logger.info(f"Training new model from scratch - Total size={n_params/2**20:.2f}M params")
    

    I am getting this error

    Traceback (most recent call last):
      File "./examples/pytorch/language-modeling/run_clm.py", line 519, in <module>
        main()
      File "./examples/pytorch/language-modeling/run_clm.py", line 353, in main
        model.resize_token_embeddings(len(tokenizer))
      File "/opt/conda/lib/python3.6/site-packages/torch/nn/modules/module.py", line 948, in __getattr__
        type(self).__name__, name))
    AttributeError: 'ORTModule' object has no attribute 'resize_token_embeddings'
    

    Could you kindly help me resolve it? Thank you, Bhavya
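
    One possible workaround, sketched here with plain-Python stand-ins rather than the real classes: the traceback shows that the wrapper does not expose methods of the wrapped model, so model-specific calls such as resize_token_embeddings can be made on the original model before it is wrapped. `FakeModel`, `FakeWrapper` and `wrap_after_resize` are hypothetical illustrations, not torch-ort or transformers APIs.

```python
class FakeModel:
    """Hypothetical stand-in for a HuggingFace model."""

    def __init__(self, vocab_size):
        self.vocab_size = vocab_size

    def resize_token_embeddings(self, new_size):
        self.vocab_size = new_size
        return new_size

    def forward(self, x):
        return x


class FakeWrapper:
    """Hypothetical stand-in for a wrapper that only exposes forward()."""

    def __init__(self, module):
        self._original_module = module

    def forward(self, x):
        return self._original_module.forward(x)


def wrap_after_resize(model, tokenizer_len):
    # Call model-specific methods first, while they are still reachable...
    model.resize_token_embeddings(tokenizer_len)
    # ...and wrap only once the model is fully configured.
    return FakeWrapper(model)


model = FakeModel(vocab_size=50257)

# Wrapping first reproduces the failure mode from the traceback.
wrapped_too_early = FakeWrapper(model)
has_method = hasattr(wrapped_too_early, "resize_token_embeddings")
print(has_method)  # False

wrapped = wrap_after_resize(model, tokenizer_len=50258)
print(wrapped._original_module.vocab_size)  # 50258
```

    In the edited script this would mean moving the `model = ORTModule(model)` lines to after the `model.resize_token_embeddings(len(tokenizer))` call, assuming nothing else in between needs the wrapped model.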

    opened by bmedishe 6
  • Update nvidia docker images with new gpg keys and update the blob upload logic

    NVIDIA recently refreshed its signing keys (https://developer.nvidia.com/blog/updating-the-cuda-linux-gpg-repository-key/), which broke our CI pipelines. Some of the Docker images have been updated while others have not, so this PR moves our pipelines to the images that were updated.

    In addition, this PR uses the new Azure APIs for uploading whl blobs to our container registry.

    CLA Signed 
    opened by baijumeswani 5
  • tests/bert_for_sequence_classification.py reports "This is an invalid model"

    Hi, I installed torch-ort following the instructions. 'python -m torch_ort.configure' does not report any errors. However, when I run the verification script, it reports the following errors:

    ======== Epoch 1 / 4 with batch size 32 ========
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator ATenOp. No schema registered for this operator.
    Warning: Unsupported operator SoftmaxCrossEntropyLossInternal. No schema registered for this operator.
    2021-11-16 23:56:39.446639117 [W:onnxruntime:Default, graph.cc:2538 InitFunctionBodyForNode] Function body initialization failed for node 'Softmax_131_Grad/SoftmaxGrad_0' optype SoftmaxGrad. Error message /onnxruntime_src/onnxruntime/core/graph/function.cc:749 onnxruntime::FunctionImpl::FunctionImpl(onnxruntime::Graph&, const NodeIndex&, const onnx::FunctionProto&, const std::unordered_map<std::basic_string<char>, const onnx::FunctionProto*>&, std::vector<std::unique_ptr<onnxruntime::Function> >&, const onnxruntime::logging::Logger&, bool) status.IsOK() was false. Resolve subgraph failed:This is an invalid model. Error in Node:0x557a5b74e130 : Node (0x557a5b74e130) has input size 2 not in range [min=1, max=1].
    . Execution will fail if ORT does not have a specialized kernel for this op
    2021-11-16 23:56:39.464632288 [W:onnxruntime:, graph.cc:2538 InitFunctionBodyForNode] Function body initialization failed for node 'Softmax_131_Grad/SoftmaxGrad_0' optype SoftmaxGrad. Error message /onnxruntime_src/onnxruntime/core/graph/function.cc:749 onnxruntime::FunctionImpl::FunctionImpl(onnxruntime::Graph&, const NodeIndex&, const onnx::FunctionProto&, const std::unordered_map<std::basic_string<char>, const onnx::FunctionProto*>&, std::vector<std::unique_ptr<onnxruntime::Function> >&, const onnxruntime::logging::Logger&, bool) status.IsOK() was false. Resolve subgraph failed:This is an invalid model. Error in Node:0x557a60d5e230 : Node (0x557a60d5e230) has input size 2 not in range [min=1, max=1].
    . Execution will fail if ORT does not have a specialized kernel for this op
    Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
    

    Testbed: V100-32GB, CUDA 10.2, torch==1.9.0, torch-ort==1.9.0, onnxruntime-training==1.9.0

    opened by xysmlx 4
  • Fallback not kicking in with non-contiguous tensors

    I tried running the following training code: https://github.com/natke/onnxruntime-training-examples/blob/034a5b73ce804d55c120308804fda6b08b016a8d/orttrainer/getting-started/train_ort.py (added ORTModule to the previous getting started example)

    It fails when the input tensor is non-contiguous, but the fallback is not triggered (https://github.com/natke/onnxruntime-training-examples/blob/034a5b73ce804d55c120308804fda6b08b016a8d/orttrainer/getting-started/train_ort.py#L112), even when the policy is explicitly set to FALLBACK_FORCE_TORCH_FORWARD.

    Error message

    File "train_ort.py", line 168, in <module>
    train(model)
    File "train_ort.py", line 116, in train
    output = model(data, src_mask)
    File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/ortmodule.py", line 81, in _forward
    return self._torch_module.forward(*inputs, **kwargs)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_torch_module_ort.py", line 32, in _forward
    return self._execution_manager(self.is_training()).forward(*inputs, **kwargs)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 265, in forward
    override_policy=_FallbackPolicy.FALLBACK_FORCE_TORCH_FORWARD)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_fallback.py", line 194, in handle_exception
    raise exception
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 256, in forward
    self._device)))
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 149, in forward
    *inputs)
    File "/usr/local/lib/python3.6/dist-packages/onnxruntime/training/ortmodule/_training_manager.py", line 42, in execution_session_run_forward
    forward_inputs.push_back(to_dlpack(input), input.dtype == torch.bool)
    RuntimeError: /onnxruntime_src/onnxruntime/core/dlpack/dlpack_converter.cc:223 OrtValue onnxruntime::dlpack::DlpackToOrtValue(DLManagedTensor*, bool) IsContiguousTensor(dlpack->dl_tensor) was false. ORT only supports contiguous tensor for now.
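
    For context, a minimal sketch of what "contiguous" means in the error above, written in plain Python so it runs without PyTorch. A row-major tensor is contiguous when each stride equals the product of the dimension sizes to its right; a transposed view breaks that rule. The helper names are hypothetical. In PyTorch itself, `Tensor.is_contiguous()` reports this property and `Tensor.contiguous()` returns a contiguous tensor, which may serve as a workaround until the fallback handles this case.

```python
def row_major_strides(shape):
    """Strides (in elements) of a contiguous row-major tensor."""
    strides = []
    running = 1
    for size in reversed(shape):
        strides.append(running)
        running *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """True when the strides match the row-major layout for this shape."""
    # Dimensions of size 1 may carry any stride, so skip them.
    expected = row_major_strides(shape)
    return all(
        size == 1 or actual == want
        for size, actual, want in zip(shape, strides, expected)
    )


shape, strides = (3, 2), row_major_strides((3, 2))   # strides == (2, 1)
# Transposing swaps shape and strides without moving any data.
t_shape, t_strides = shape[::-1], strides[::-1]      # (2, 3) with (1, 2)

print(is_contiguous(shape, strides))      # True
print(is_contiguous(t_shape, t_strides))  # False
```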
    
    opened by natke 4
  • Warning: Checker does not support models with experimental ops: ATen

    Even though I only see warnings below (no errors), the trace does not get created.

    Repro script

    import torch
    import torch.nn as nn
    import torch.backends.cudnn as cudnn
    import torch.optim
    import torch.utils.data
    import torchvision
    import torchvision.transforms as T
    import torchvision.models as models
    
    import torch.profiler
    
    model = models.resnet50(pretrained=True)
    model.cuda()
    cudnn.benchmark = True
    
    transform = T.Compose([T.Resize(256), T.CenterCrop(224), T.ToTensor()])
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    trainloader = torch.utils.data.DataLoader(trainset, batch_size=32,
                                              shuffle=True, num_workers=4)
    
    criterion = nn.CrossEntropyLoss().cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
    device = torch.device("cuda:0")
    
    from torch_ort import ORTModule
    model = ORTModule(model)
    
    model.train()
    
    with torch.profiler.profile(
        activities=[
            torch.profiler.ProfilerActivity.CPU,
            torch.profiler.ProfilerActivity.CUDA],
        schedule=torch.profiler.schedule(
            wait=1,
            warmup=1,
            active=2),
        on_trace_ready=torch.profiler.tensorboard_trace_handler('./result', worker_name='worker0'),
        record_shapes=True,
        profile_memory=True,  # This will take 1 to 2 minutes. Setting it to False could greatly speedup.
        with_stack=True
    ) as p:
        for step, data in enumerate(trainloader, 0):
            print("step:{}".format(step))
            inputs, labels = data[0].to(device=device), data[1].to(device=device)
    
            outputs = model(inputs)
            loss = criterion(outputs, labels)
    
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if step + 1 >= 4:
                break
            p.step()
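
    To make the schedule in the repro easier to reason about, here is a simplified pure-Python model of how wait=1, warmup=1, active=2 maps step numbers to profiler phases. It is a sketch of the documented behaviour of torch.profiler.schedule with repeat and skip_first left out; `phase_for_step` is a hypothetical helper, not part of PyTorch.

```python
def phase_for_step(step, wait=1, warmup=1, active=2):
    """Simplified model of a profiler schedule (no repeat, no skip_first)."""
    cycle = wait + warmup + active
    pos = step % cycle
    if pos < wait:
        return "NONE"
    if pos < wait + warmup:
        return "WARMUP"
    # The last active step of a cycle is the one that triggers the save.
    return "RECORD" if pos < cycle - 1 else "RECORD_AND_SAVE"


phases = [phase_for_step(s) for s in range(4)]
print(phases)  # ['NONE', 'WARMUP', 'RECORD', 'RECORD_AND_SAVE']
```

    Under this model only the fourth step (index 3) is in the saving phase.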
    

    Package versions

    (serve) ubuntu@ip-172-31-17-70:~$ pip list
    Package                Version
    ---------------------- -------------------
    brotlipy               0.7.0
    Cerberus               1.3.4
    certifi                2020.12.5
    cffi                   1.14.5
    chardet                4.0.0
    conda                  4.10.1
    conda-package-handling 1.7.3
    cryptography           3.4.7
    flatbuffers            22.9.24
    h5py                   3.7.0
    idna                   2.10
    mamba                  0.13.0
    mpmath                 1.2.1
    numpy                  1.23.3
    onnx                   1.12.0
    onnxruntime-training   1.12.0
    packaging              21.3
    Pillow                 9.2.0
    pip                    21.1.2
    protobuf               3.20.1
    pycosat                0.6.3
    pycparser              2.20
    pyOpenSSL              20.0.1
    pyparsing              3.0.9
    PySocks                1.7.1
    requests               2.25.1
    ruamel-yaml-conda      0.15.80
    setuptools             49.6.0.post20210108
    six                    1.16.0
    sympy                  1.11.1
    torch                  1.12.1+cu116
    torch-ort              1.12.0
    torchaudio             0.12.1+cu116
    torchvision            0.13.1+cu116
    tqdm                   4.61.0
    typing-extensions      4.4.0
    urllib3                1.26.4
    wheel                  0.36.2
    

    Logs

    (serve) ubuntu@ip-172-31-17-70:~$ python resnet.py 
    /opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and will be removed in 0.15, please use 'weights' instead.
      warnings.warn(
    /opt/conda/lib/python3.9/site-packages/torchvision/models/_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and will be removed in 0.15. The current behavior is equivalent to passing `weights=ResNet50_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet50_Weights.DEFAULT` to get the most up-to-date weights.
      warnings.warn(msg)
    Files already downloaded and verified
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:118: UserWarning: onnxruntime training package info: package_name: onnxruntime-training
      warnings.warn("onnxruntime training package info: package_name: %s" % package_name)
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:119: UserWarning: onnxruntime training package info: __version__: 1.12.0
      warnings.warn("onnxruntime training package info: __version__: %s" % version)
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:120: UserWarning: onnxruntime training package info: cuda_version: 10.2
      warnings.warn("onnxruntime training package info: cuda_version: %s" % cuda_version)
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:121: UserWarning: onnxruntime build info: cudart_version: 10020
      warnings.warn("onnxruntime build info: cudart_version: %s" % cudart_version)
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:129: UserWarning: WARNING: failed to find cudart version that matches onnxruntime build info
      warnings.warn("WARNING: failed to find cudart version that matches onnxruntime build info")
    /opt/conda/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_validation.py:130: UserWarning: WARNING: found cudart versions: [11060]
      warnings.warn("WARNING: found cudart versions: %s" % local_cudart_versions)
    step:0
    /opt/conda/lib/python3.9/site-packages/onnxruntime/training/ortmodule/_training_manager.py:190: UserWarning: Fast path enabled - skipping checks. Rebuild graph: True, Execution agent: True, Device check: True
      warnings.warn(
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.0.downsample.1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.1.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer1.2.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.0.downsample.1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.1.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.2.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer2.3.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.0.downsample.1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.1.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.2.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.3.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.4.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer3.5.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.0.downsample.1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.1.bn3.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn1.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn2.num_batches_tracked'. This changes graph semantics.
    Warning: ONNX Preprocess - Removing mutation from node aten::add_ on block input: '_original_module.layer4.2.bn3.num_batches_tracked'. This changes graph semantics.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    WARNING: The shape inference of org.pytorch.aten::ATen type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function.
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Warning: Checker does not support models with experimental ops: ATen
    Inconsistency detected by ld.so: dl-version.c: 205: _dl_check_map_versions: Assertion `needed != NULL' failed!
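
    The cudart numbers in the warnings above use CUDA's integer encoding (1000 * major + 10 * minor). A small sketch, with a hypothetical helper name, that decodes the two values from the log and shows that they disagree:

```python
def decode_cuda_version(encoded):
    """Decode CUDA's integer version, e.g. 10020 -> (10, 2)."""
    major = encoded // 1000
    minor = (encoded % 1000) // 10
    return major, minor


build_cudart = 10020      # "onnxruntime build info: cudart_version: 10020"
found_cudart = [11060]    # "found cudart versions: [11060]"

build = decode_cuda_version(build_cudart)
found = [decode_cuda_version(v) for v in found_cudart]

print(build)           # (10, 2)
print(found)           # [(11, 6)]
print(build in found)  # False
```

    So the installed onnxruntime-training wheel reports a CUDA 10.2 build while the environment provides CUDA 11.6 (consistent with torch 1.12.1+cu116 in the package list). This mismatch may be related to the failure, though that has not been confirmed here.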
    
    opened by msaroufim 2
  • RuntimeError: Error in execution: At least one output should be requested.

    I am getting this error with a pretty simple model. The error comes directly from ONNX, but I couldn't find any method to register outputs in ORTInferenceModule.

    Versions: torch 1.12.1, onnx 1.12.0, torch-ort-infer 1.12.0

    Reproduction steps:

    import torch
    from torch import nn
    from torch_ort import ORTInferenceModule, OpenVINOProviderOptions
    
    class Block(nn.Module):
        def __init__(self, size):
            super().__init__()
            self.size = size
            self.ff1 = nn.Linear(size, size)
    
        def forward(self, x):
            second = self.ff1(x)
            return second
    
    model = Block(1024)
    model.eval()
    
    model = ORTInferenceModule(model, provider_options=OpenVINOProviderOptions(backend="CPU", precision="FP32"))
    
    with torch.inference_mode():
        print("start")
    
        x = torch.randn(1, 1024, dtype=torch.float32)
        x = model(x)
        print(x.mean())
    
    opened by StochasticRomanAgeev 0
  • [torch-ort-infer] Aten fallback doesn't work

    The ATen op doesn't fall back to the native PyTorch runtime as expected.

    Versions: torch 1.12.0, onnxruntime 1.12.0, torch-ort-infer 1.12.0

    Reproduction steps:

    import torch
    from torch_ort import ORTInferenceModule
    
    def test_numpy_T(input_shape):
        class NeuralNet(torch.nn.Module):
            def __init__(self):
                super(NeuralNet, self).__init__()
            def forward(self, input):
                return input.T
    
        device = "cpu"
        ort_model = ORTInferenceModule(NeuralNet().to(device))
    
        def run_step(model, input):
            prediction = model(input)
            return prediction
    
        ort_input = torch.rand(input_shape, dtype=torch.float, device=device)
        ort_prediction = run_step(ort_model, ort_input)
    
    if __name__ == "__main__":
        test_numpy_T([3, 2, 5])
    

    Error log

    Traceback (most recent call last):
      File "unit_test_atenop.py", line 23, in <module>
        test_numpy_T([3, 2, 5])
      File "unit_test_atenop.py", line 20, in test_numpy_T
        ort_prediction = run_step(ort_model, ort_input)
      File "unit_test_atenop.py", line 16, in run_step
        prediction = model(input)
      File "/ort_aten_fb/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
        return forward_call(*input, **kwargs)
      File "/ort_aten_fb/lib/python3.8/site-packages/torch_ort/ortinferencemodule/_utils_infer.py", line 98, in _forward
        return ortinferencemodule._forward_call(*inputs, **kwargs)
      File "/ort_aten_fb/lib/python3.8/site-packages/torch_ort/ortinferencemodule/ortinferencemodule.py", line 107, in _forward_call
        self._inference_session = onnxruntime.InferenceSession(
      File "/ort_aten_fb/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 347, in __init__
        self._create_inference_session(providers, provider_options, disabled_optimizers)
      File "/ort_aten_fb/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 386, in _create_inference_session
        sess = C.InferenceSession(session_options, self.model_bytes, False, self.read_config_from_model)
    onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Node (ATen_0) output arg (data) type inference failed.

    I also tested with the symbolic shape inference call from ORTModule (ref: symbolic_shape). It fails with Exception("Incomplete symbolic shape inference").
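
    For reference, the output shape the repro is expected to produce can be sketched without PyTorch: `Tensor.T` reverses the order of the dimensions, which for an n-dimensional tensor is the same as permute(n-1, ..., 0). `transposed_shape` and `reverse_permutation` are hypothetical helpers used only for illustration.

```python
def transposed_shape(shape):
    """Shape after reversing all dimensions, as Tensor.T does."""
    return list(reversed(shape))


def reverse_permutation(ndim):
    """The permutation equivalent to .T for a tensor with ndim dimensions."""
    return list(range(ndim - 1, -1, -1))


input_shape = [3, 2, 5]
print(transposed_shape(input_shape))          # [5, 2, 3]
print(reverse_permutation(len(input_shape)))  # [2, 1, 0]
```

    Writing the forward pass as input.permute(2, 1, 0) expresses the same operation explicitly; whether that avoids the ATen export path here has not been verified.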

    opened by saipj 6
  • torch-ort cannot be installed on windows: onnxruntime-training not found

    When running pip install torch-ort in a conda environment on Windows, I get the following error:

    ERROR: Could not find a version that satisfies the requirement onnxruntime-training (from versions: none)
    ERROR: No matching distribution found for onnxruntime-training

    However, if I run the same command in a conda environment in WSL, it works just fine. Other people could reproduce this, so it seems to be a Windows-specific issue.

    opened by alpatots 5
  • Compatibility between ORTModule and DeepSpeed

    Hi folks,

    I have recently been working on validating distributed training features with ORTModule. Here are some incompatibilities that I found during testing:

    [With DeepSpeed]

    • ZeRO Stage 1 and 2 work well
    • ZeRO Stage 3 ❌

    Warnings:

    /usr/local/lib/python3.8/dist-packages/onnxruntime/training/ortmodule/_io.py:558: 
    UserWarning: This model cannot be deep copied (or pickled), which is a required step for stateful models to be properly exported to ONNX. Compute will continue,  but  unexpected results may occur!  
    warnings.warn("This model cannot be deep copied (or pickled)  
    
    • BF16 ❌

    Error Message:

    RuntimeError: /onnxruntime_src/orttraining/orttraining/python/orttraining_pybind_state.cc:752
    onnxruntime::python::addObjectMethodsForTraining(pybind11::module&, onnxruntime::python::ExecutionProviderRegistrationFn)::<lambda(onnxruntime::training::OrtModuleGraphBuilder*, 
    const pybind11::bytes&, const onnxruntime::training::OrtModuleGraphBuilderConfiguration&)> 
    [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Type Error: Type 'tensor(bfloat16)' of input parameter
    (_original_module.distilbert.embeddings.word_embeddings.weight) of operator (ATen) in node (ATen_17) is invalid
    

    [With Fairscale]

    • Can only shard optimizer state

    Environment

    • OS: Ubuntu 20.04
    • CUDA/cuDNN version: 11.3/8
    • onnxruntime-training: 1.11.1+cu113
    • torch: 1.11.0+cu113
    • torch-ort: 1.11.1
    • Python version:3.8
    • GPU: A100

    I would like to confirm whether these behaviors are intended. Regarding compatibility with DeepSpeed ZeRO Stage 3 and BF16, could you share any insight into whether they will be supported in the future?

    Thanks a lot!
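
    For concreteness, a sketch of a DeepSpeed configuration in the range reported as working above (ZeRO stage 2, no BF16). The batch size is an illustrative assumption, not a value from these tests; only the `zero_optimization` and `bf16` settings reflect the findings listed, and `uses_reported_working_setup` is a hypothetical helper.

```python
import json

# Illustrative DeepSpeed config; values other than the ZeRO stage and the
# bf16 switch are assumptions, not taken from the tests described above.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,   # assumed
    "zero_optimization": {"stage": 2},     # stages 1 and 2 reported working
    "bf16": {"enabled": False},            # BF16 reported failing
}


def uses_reported_working_setup(config):
    """True when the config stays within what the tests found to work."""
    stage = config.get("zero_optimization", {}).get("stage", 0)
    bf16 = config.get("bf16", {}).get("enabled", False)
    return stage in (1, 2) and not bf16


print(json.dumps(ds_config, indent=2))
print(uses_reported_working_setup(ds_config))  # True
```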

    opened by JingyaHuang 6
Model summary in PyTorch similar to `model.summary()` in Keras

Keras style model.summary() in PyTorch Keras has a neat API to view the visualization of the model which is very helpful while debugging your network.

Shubham Chandel 3.7k Dec 29, 2022
Unofficial PyTorch implementation of DeepMind's Perceiver IO with PyTorch Lightning scripts for distributed training

Martin Krasser 251 Dec 25, 2022
High-level batteries-included neural network training library for Pytorch

Pywick High-Level Training framework for Pytorch Pywick is a high-level Pytorch training framework that aims to get you up and running quickly with st

null 382 Dec 6, 2022
Training RNNs as Fast as CNNs (https://arxiv.org/abs/1709.02755)

News SRU++, a new SRU variant, is released. [tech report] [blog] The experimental code and SRU++ implementation are available on the dev branch which

ASAPP Research 2.1k Jan 1, 2023
Tez is a super-simple and lightweight Trainer for PyTorch. It also comes with many utils that you can use to tackle over 90% of deep learning projects in PyTorch.

Tez: a simple pytorch trainer NOTE: Currently, we are not accepting any pull requests! All PRs will be closed. If you want a feature or something does

abhishek thakur 1.1k Jan 4, 2023
A lightweight wrapper for PyTorch that provides a simple declarative API for context switching between devices, distributed modes, mixed-precision, and PyTorch extensions.

Fidelity Investments 56 Sep 13, 2022
A PyTorch repo for data loading and utilities to be shared by the PyTorch domain libraries.

null 878 Dec 30, 2022
PyTorch framework A simple and complete framework for PyTorch, providing a variety of data loading and simple task solutions that are easy to extend and migrate

Cong Cai 12 Dec 19, 2021
Pretrained ConvNets for pytorch: NASNet, ResNeXt, ResNet, InceptionV4, InceptionResnetV2, Xception, DPN, etc.

Pretrained models for Pytorch (Work in progress) The goal of this repo is: to help to reproduce research papers results (transfer learning setups for

Remi 8.7k Dec 31, 2022
torch-optimizer -- collection of optimizers for Pytorch

torch-optimizer torch-optimizer -- collection of optimizers for PyTorch compatible with optim module. Simple example import torch_optimizer as optim

Nikolay Novik 2.6k Jan 3, 2023
A PyTorch implementation of EfficientNet

EfficientNet PyTorch Quickstart Install with pip install efficientnet_pytorch and load a pretrained EfficientNet with: from efficientnet_pytorch impor

Luke Melas-Kyriazi 7.2k Jan 6, 2023
The easiest way to use deep metric learning in your application. Modular, flexible, and extensible. Written in PyTorch.

News March 3: v0.9.97 has various bug fixes and improvements: Bug fixes for NTXentLoss Efficiency improvement for AccuracyCalculator, by using torch i

Kevin Musgrave 5k Jan 2, 2023
A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch

Torchmeta A collection of extensions and data-loaders for few-shot learning & meta-learning in PyTorch. Torchmeta contains popular meta-learning bench

Tristan Deleu 1.7k Jan 6, 2023
PyTorch Extension Library of Optimized Scatter Operations

PyTorch Scatter Documentation This package consists of a small extension library of highly optimized sparse update (scatter and segment) operations fo

Matthias Fey 1.2k Jan 7, 2023
PyTorch Extension Library of Optimized Autograd Sparse Matrix Operations

PyTorch Sparse This package consists of a small extension library of optimized sparse matrix operations with autograd support. This package currently

Matthias Fey 757 Jan 4, 2023
Reformer, the efficient Transformer, in Pytorch

Reformer, the Efficient Transformer, in Pytorch This is a Pytorch implementation of Reformer https://openreview.net/pdf?id=rkgNKkHtvB It includes LSH

Phil Wang 1.8k Jan 6, 2023
PyTorch implementation of TabNet paper : https://arxiv.org/pdf/1908.07442.pdf

README TabNet : Attentive Interpretable Tabular Learning This is a pyTorch implementation of Tabnet (Arik, S. O., & Pfister, T. (2019). TabNet: Attent

DreamQuark 2k Dec 27, 2022
PyTorch extensions for fast R&D prototyping and Kaggle farming

Pytorch-toolbelt A pytorch-toolbelt is a Python library with a set of bells and whistles for PyTorch for fast R&D prototyping and Kaggle farming: What

Eugene Khvedchenya 1.3k Jan 5, 2023
An implementation of Performer, a linear attention-based transformer, in Pytorch

Performer - Pytorch An implementation of Performer, a linear attention-based transformer variant with a Fast Attention Via positive Orthogonal Random

Phil Wang 900 Dec 22, 2022