Torch-TensorRT

PyTorch/TorchScript compiler for NVIDIA GPUs using TensorRT

Ahead of Time (AOT) compiling for PyTorch JIT

Torch-TensorRT is a compiler for PyTorch/TorchScript, targeting NVIDIA GPUs via NVIDIA's TensorRT Deep Learning Optimizer and Runtime. Unlike PyTorch's Just-In-Time (JIT) compiler, Torch-TensorRT is an Ahead-of-Time (AOT) compiler, meaning that before you deploy your TorchScript code, you go through an explicit compile step to convert a standard TorchScript program into a module targeting a TensorRT engine. Torch-TensorRT operates as a PyTorch extension and compiles modules that integrate into the JIT runtime seamlessly. After compilation, using the optimized graph should feel no different from running a TorchScript module. You also have access to TensorRT's suite of configurations at compile time, so you are able to specify operating precision (FP32/FP16/INT8) and other settings for your module.
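
As a minimal sketch of this workflow (the small convolutional model is just a stand-in; any traceable or scriptable module works):

import torch
import torch_tensorrt

# Stand-in model; substitute your own nn.Module.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 16, 3)).eval().cuda()
ts_mod = torch.jit.trace(model, torch.randn(1, 3, 224, 224, device="cuda"))

# The explicit AOT compile step: TorchScript in, TorchScript with an
# embedded TensorRT engine out.
trt_mod = torch_tensorrt.compile(ts_mod,
    inputs=[torch_tensorrt.Input(shape=[1, 3, 224, 224])],
    enabled_precisions={torch.float})

# The result behaves like any other TorchScript module in the JIT runtime.
out = trt_mod(torch.randn(1, 3, 224, 224, device="cuda"))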

More Information / System Architecture:

Building a docker container for Torch-TensorRT

We provide a Dockerfile in the docker/ directory. It expects a PyTorch NGC container as a base, but can easily be modified to build on top of any container that provides PyTorch, CUDA, cuDNN, and TensorRT. The dependency libraries used in the container can be found in the release notes.

Please follow these instructions to build a Docker container:

docker build --build-arg BASE=<CONTAINER VERSION e.g. 21.11> -f docker/Dockerfile -t torch_tensorrt:latest .

In the case of building on top of a custom base container, you first must determine the version of the PyTorch C++ ABI. If your source of PyTorch is pytorch.org, it is likely the pre-cxx11 ABI, in which case you must modify //docker/dist-build.sh to not build the C++11 ABI version of Torch-TensorRT.

You can then build the container using:

docker build --build-arg BASE_IMG=<IMAGE> -f docker/Dockerfile -t torch_tensorrt:latest .

If you would like to build outside a Docker container, please follow the Compiling Torch-TensorRT section below.

Example Usage

C++

#include "torch/script.h"
#include "torch_tensorrt/torch_tensorrt.h"

...
// Set input datatypes. Allowed options: torch::{kFloat, kHalf, kChar, kInt32, kBool}
// Size of input_dtypes should match the number of inputs to the network.
// If input_dtypes is not set, default precision follows traditional PyT / TRT rules
auto input = torch_tensorrt::Input(dims, torch::kHalf);
auto compile_settings = torch_tensorrt::ts::CompileSpec({input});
// FP16 execution
compile_settings.enabled_precisions = {torch::kHalf};
// Compile module
auto trt_mod = torch_tensorrt::ts::compile(ts_mod, compile_settings);
// Run like normal
auto results = trt_mod.forward({in_tensor});
// Save module for later
trt_mod.save("trt_torchscript_module.ts");
...

Python

import torch_tensorrt

...

trt_ts_module = torch_tensorrt.compile(torch_script_module,
    inputs = [example_tensor, # Provide example tensor for input shape or...
        torch_tensorrt.Input( # Specify input object with shape and dtype
            min_shape=[1, 3, 224, 224],
            opt_shape=[1, 3, 512, 512],
            max_shape=[1, 3, 1024, 1024],
            # For static size shape=[1, 3, 224, 224]
            dtype=torch.half) # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
    ],
    enabled_precisions = {torch.half}) # Run with FP16

result = trt_ts_module(input_data) # run inference
torch.jit.save(trt_ts_module, "trt_torchscript_module.ts") # save the TRT-embedded TorchScript
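
The saved module is standard TorchScript with the TensorRT engine embedded, so it can be reloaded later without recompiling. A minimal sketch (import torch_tensorrt first so the runtime ops are registered with TorchScript):

import torch
import torch_tensorrt  # registers the Torch-TensorRT runtime ops

trt_ts_module = torch.jit.load("trt_torchscript_module.ts")
result = trt_ts_module(input_data)  # input_data must be a CUDA tensor, as before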

Notes on running in lower precisions:

  • Enable lower precisions with compile_spec.enabled_precisions
  • The module should be left in FP32 before compilation (FP16 can support half tensor models)
  • The dtype of the provided input tensors should be the same as the module before compilation, regardless of enabled_precisions. This can be overridden by setting Input::dtype, as in the sketch below
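
A minimal sketch of this recipe, compiling an FP32 module for FP16 execution with an FP16 input override (the Linear model is just a stand-in; substitute your own module):

import torch
import torch_tensorrt

# Stand-in FP32 model; leave the weights in FP32 before compilation.
model = torch.nn.Linear(128, 64).eval().cuda()
ts_mod = torch.jit.script(model)

trt_mod = torch_tensorrt.compile(ts_mod,
    inputs=[torch_tensorrt.Input(
        shape=[8, 128],
        dtype=torch.half)],          # override: inputs will be FP16
    enabled_precisions={torch.half}) # let TensorRT choose FP16 kernels

# Inputs must now match the overridden Input dtype.
out = trt_mod(torch.randn(8, 128, device="cuda").half())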

Platform Support

Platform             Support
Linux AMD64 / GPU    Supported
Linux aarch64 / GPU  Native Compilation Supported on JetPack-4.4+
Linux aarch64 / DLA  Native Compilation Supported on JetPack-4.4+
Windows / GPU        Unofficial Support
Linux ppc64le / GPU  -
NGC Containers       Included in PyTorch NGC Containers 21.11+

Torch-TensorRT will be included in NVIDIA NGC containers (https://ngc.nvidia.com/catalog/containers/nvidia:pytorch) starting in 21.11.

Note: Refer to the NVIDIA NGC container (https://ngc.nvidia.com/catalog/containers/nvidia:l4t-pytorch) for PyTorch libraries on JetPack.

Dependencies

These are the dependencies used to verify the test cases. Torch-TensorRT can work with other versions, but the tests are not guaranteed to pass.

  • Bazel 4.2.1
  • Libtorch 1.10.0 (built with CUDA 11.3)
  • CUDA 11.3 (10.2 on Jetson)
  • cuDNN 8.2
  • TensorRT 8.0.3.4 (TensorRT 8.0.1.6 on Jetson)

Prebuilt Binaries and Wheel files

Releases: https://github.com/NVIDIA/Torch-TensorRT/releases

Compiling Torch-TensorRT

Installing Dependencies

0. Install Bazel

If you don't have Bazel installed, the easiest way is to install bazelisk using the method of your choosing: https://github.com/bazelbuild/bazelisk

Otherwise you can use the following instructions to install binaries: https://docs.bazel.build/versions/master/install.html

Finally, if you need to compile from source (e.g. on aarch64, until Bazel distributes binaries for that architecture), you can use these instructions:

export BAZEL_VERSION=<VERSION>
mkdir bazel
cd bazel
curl -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-dist.zip
unzip bazel-$BAZEL_VERSION-dist.zip
bash ./compile.sh

You need to start by having CUDA installed on the system; LibTorch will automatically be pulled for you by Bazel. Then you have two options.

1. Building using cuDNN & TensorRT tarball distributions

This is recommended so as to build Torch-TensorRT hermetically, and it ensures any bugs are not caused by version issues.

Make sure when running Torch-TensorRT that these versions of the libraries are prioritized in your $LD_LIBRARY_PATH.

  1. You need to download the tarball distributions of TensorRT and cuDNN from the NVIDIA website.
  2. Place these files in a directory (the directories third_party/dist_dir/[x86_64-linux-gnu | aarch64-linux-gnu] exist for this purpose)
  3. Compile using:
bazel build //:libtorchtrt --compilation_mode opt --distdir third_party/dist_dir/[x86_64-linux-gnu | aarch64-linux-gnu]

2. Building using locally installed cuDNN & TensorRT

If you find bugs and you compiled using this method, please disclose that you used this method in the issue (an ldd dump would be nice too).

  1. Install TensorRT, CUDA and cuDNN on the system before starting to compile.
  2. In WORKSPACE, comment out:
",], build_file = "@//third_party/cudnn/archive:BUILD", sha256 = " ", strip_prefix = "cuda" ) http_archive( name = "tensorrt", urls = [" ",], build_file = "@//third_party/tensorrt/archive:BUILD", sha256 = " ", strip_prefix = "TensorRT- " ) ">
# Downloaded distributions to use with --distdir
http_archive(
    name = "cudnn",
    urls = ["<URL>",],
    build_file = "@//third_party/cudnn/archive:BUILD",
    sha256 = "<SHA256>",
    strip_prefix = "cuda"
)

http_archive(
    name = "tensorrt",
    urls = ["<URL>",],
    build_file = "@//third_party/tensorrt/archive:BUILD",
    sha256 = "<SHA256>",
    strip_prefix = "TensorRT-<VERSION>"
)

and uncomment

# Locally installed dependencies
new_local_repository(
    name = "cudnn",
    path = "/usr/",
    build_file = "@//third_party/cudnn/local:BUILD"
)

new_local_repository(
    name = "tensorrt",
    path = "/usr/",
    build_file = "@//third_party/tensorrt/local:BUILD"
)
  3. Compile using:
bazel build //:libtorchtrt --compilation_mode opt

Debug build

bazel build //:libtorchtrt --compilation_mode=dbg

Native compilation on NVIDIA Jetson AGX

We performed end-to-end testing on the Jetson platform using JetPack SDK 4.6.

bazel build //:libtorchtrt --platforms //toolchains:jetpack_4.6

Note: Please refer to the installation instructions for prerequisites.

A tarball with the include files and library can then be found in bazel-bin.

Running Torch-TensorRT on a JIT Graph

Make sure to add LibTorch to your LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(pwd)/bazel-Torch-TensorRT/external/libtorch/lib

bazel run //cpp/bin/torchtrtc -- $(realpath <PATH TO GRAPH>) out.ts <input-size>

Compiling the Python Package

To compile the Python package for your local machine, just run python3 setup.py install in the //py directory. To build wheel files for different Python versions, first build the Dockerfile in //py, then run the following command:

docker run -it -v$(pwd)/..:/workspace/Torch-TensorRT build_torch_tensorrt_wheel /bin/bash /workspace/Torch-TensorRT/py/build_whl.sh

Python compilation expects the tarball-based compilation strategy from above.

How do I add support for a new op...

In Torch-TensorRT?

Thanks for wanting to contribute! There are two main ways to handle supporting a new op. Either you can write a converter for the op from scratch and register it in the NodeConverterRegistry, or, if you can map the op to a set of ops that already have converters, you can write a graph rewrite pass which will replace your new op with an equivalent subgraph of supported ops. Graph rewriting is preferred, because then we do not need to maintain a large library of op converters. Also, look at the various op support trackers in the issues for information on the support status of various operators. A rough sketch of the rewrite idea appears below.
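
Torch-TensorRT's own rewrite passes are written in C++ under core/lowering, but the idea can be sketched with PyTorch's pattern-based subgraph rewriter. This is an illustration only: _jit_pass_custom_pattern_based_rewrite_graph is a private PyTorch API whose signature may change between versions, and aten::square is used purely as a stand-in for an op without a converter:

import torch

def f(x):
    # Stand-in for an op that has no converter
    return torch.square(x)

scripted = torch.jit.script(f)

# TorchScript IR pattern for the op to replace...
PATTERN = """
graph(%x):
    %out = aten::square(%x)
    return (%out)
"""
# ...and an equivalent subgraph of ops that already have converters.
REPLACEMENT = """
graph(%x):
    %out = aten::mul(%x, %x)
    return (%out)
"""

# Private PyTorch API; availability may vary between versions.
torch._C._jit_pass_custom_pattern_based_rewrite_graph(
    PATTERN, REPLACEMENT, scripted.graph)
print(scripted.graph)  # aten::square now appears as aten::mul(%x, %x)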

In my application?

The Node Converter Registry is not exposed in the top-level API, but it is available in the internal headers shipped with the tarball.

You can register a converter for your op using the NodeConverterRegistry inside your application.

Structure of the repo

Component  Description
core       Main JIT ingest, lowering, conversion and runtime implementations
cpp        C++ API and CLI source
examples   Example applications to show different features of Torch-TensorRT
py         Python API for Torch-TensorRT
tests      Unit tests for Torch-TensorRT

Contributing

Take a look at CONTRIBUTING.md.

License

The Torch-TensorRT license can be found in the LICENSE file. It is licensed with a BSD-style license.

Comments
  • 🐛 [Bug] Torch Tensor RT crash when trying to compile a script module on Windows (C++)


    Bug Description

    I can't compile a script module with TorchTensorRT. This is my code:

    #include <iostream>
    #include <vector>
    #include <ATen/Context.h>
    #include <torch/torch.h>
    #include <torch/script.h>
    #include "torch_tensorrt/torch_tensorrt.h"
    
    void compile(std::string model_path) {
    
        const torch::Device device = torch::Device(torch::kCUDA, 0);
        torch::jit::script::Module model;
    
        std::cout << "Trying to load the model" << std::endl;
        try {
            model = torch::jit::load(model_path, device);
            model.to(device);
            model.eval();
            std::cout << "AI model loaded successfully." << std::endl;
        }
        catch (const c10::Error& e) {
            std::cerr << e.what() << std::endl;
        }
    
        auto input = torch_tensorrt::Input(std::vector<int64_t>{ 1, 3, 512, 512 });
        std::cout << "Creating compile settings" << std::endl;
        auto compile_settings = torch_tensorrt::ts::CompileSpec({ input });
        // Compile module
        std::cout << "Compiling..." << std::endl;
        auto trt_mod = torch_tensorrt::ts::compile(model, compile_settings);  // <-- CRASHES HERE
        // Run like normal
        std::cout << "Create tensor" << std::endl;
        auto in = torch::randn({ 1, 3, 512, 512 }, device);
        std::cout << "Forward pass..." << std::endl;
        auto results = trt_mod.forward({ in });
        // Save module for later
        trt_mod.save("output/model/path.ts");
    
    }
    
    int main() {
    
        compile("path/to/traced_script_module.pt");
    
        return 0;
    }
    

    This is the error I get:

    [screenshot of the exception]

    First a WARNING gets printed: "WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected", and then, as you can see from the screenshot, I get an exception "read access violation. creator was nullptr." when running the following lines:

    auto creator = getPluginRegistry()->getPluginCreator("Interpolate", "1", "torch_tensorrt");
    auto interpolate_plugin = creator->createPlugin(name, &fc);
    

    The file interpolate.cpp is located at path/to/Torch-TensorRT/core/conversion/converters/impl. What am I doing wrong?

    This is my CMakeLists.txt:

    cmake_minimum_required (VERSION 3.8)
    
    project(example-app)
    
    find_package(Torch REQUIRED)
    set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${TORCH_CXX_FLAGS}")
    
    add_executable(example-app create_trt_module.cpp)
    
    target_include_directories(example-app PRIVATE "path/to/Torch-TensorRT/cpp/include")
    
    target_link_libraries(example-app "${TORCH_LIBRARIES}")
    target_link_libraries(example-app  path/to/Torch-TensorRT/out/build/x64-Release/lib/torchtrt.lib) 
    target_link_libraries(example-app  path/to/Torch-TensorRT/out/build/x64-Release/lib/torchtrt_plugins.lib)
    

    I exported the traced script module with the following code:

    # Import model
    # ...
    model.to("cuda")
    model.eval()
    
    # Create dummy data for tracing and benchmarking purposes.
    shape = (1, 3, 512, 512)
    input_data = torch.randn(shape).to("cuda")
    
    # Convert model to script module
    print("Tracing PyTorch model...")
    traced_script_module = torch.jit.trace(model, input_data)
    torch.jit.save(traced_script_module, "traced_script_module.pt")
    

    Environment

    • Torch-TensorRT Version: built from source from this branch (that is currently being merged) #1058
    • TensorRT Version: 8.4.1.5
    • CUDNN: 8.3.1
    • CPU Architecture: x86-64
    • OS : Windows 11
    • Libtorch: 1.11.0
    • CUDA version: 11.5.2
    • GPU model: NVIDIA RTX 3080 Mobile
    bug No Activity channel: windows 
    opened by andreabonvini 31
  • Fixes for CI pipeline pre-cxx11 pipeline


    Description

    Added fixes required for CI pipeline

    Fixes # (issue)

    Type of change

    Please delete options that are not relevant and/or add your own.

    • New feature (non-breaking change which adds functionality)

    Checklist:

    • [x] My code follows the style guidelines of this project (You can use the linters)
    • [x] I have performed a self-review of my own code
    • [ ] I have commented my code, particularly in hard-to-understand areas and hacks
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests to verify my fix or my feature
    • [x] New and existing unit tests pass locally with my changes
    component: tests component: api [Python] 
    opened by andi4191 30
  • 🐛 [Bug] Running a model that returns a tuple or list of size 2 or greater causes segfault


    Bug Description

    Running a model that returns a tuple (or list) of two or more tensors causes the program to segfault. Note that this does NOT occur when returning a tuple of only one tensor.

    Returning a list of only one tensor will throw the following (presumably correct) error:

    Traceback (most recent call last):                                                                                                
      File "/home/chaoz/av/experimental/chaoz/trtorch/tuple.py", line 34, in <module>                                                                               
        model_trt = torchtrt.compile(                                                                 
      File "/home/chaoz/.anaconda3/envs/trtorch/lib/python3.9/site-packages/torch_tensorrt/_compile.py", line 115, in compile
        return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)      
      File "/home/chaoz/.anaconda3/envs/trtorch/lib/python3.9/site-packages/torch_tensorrt/ts/_compiler.py", line 116, in compile
        compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))                      
    RuntimeError: [Error thrown at core/conversion/conversion.cpp:220] List type. Only a single tensor or a TensorList type is supported.   
    

    To Reproduce

    Run the following (returning input directly to output):

      import torch    
      import torch_tensorrt as torchtrt    
          
      import torch_tensorrt.logging as torchtrt_logging    
          
      torchtrt_logging.set_reportable_log_level(torchtrt_logging.Level.Graph)    
          
      torch.manual_seed(0)    
          
      DEVICE = torch.device("cuda:0")    
      SHAPE = (3, 4)    
          
          
      class Model(torch.nn.Module):    
          def __init__(self):    
              super().__init__()    
          
          def forward(self, x):    
              y = x - x    
              z = x + x    
              return (x, z)    
          
          
      if __name__ == "__main__":    
          tensor = torch.randn(SHAPE, dtype=torch.float32, device=DEVICE)    
          
          model = Model().eval().to(DEVICE)    
          out = model(tensor)    
          print(out)    
          
          model_trt = torchtrt.compile(    
              model,    
              inputs=[    
                  torchtrt.Input(shape=SHAPE),    
              ],    
              enabled_precisions={torch.float},    
          )    
          out_trt = model(tensor)    
          print(out_trt)    
          
          assert torch.max(torch.abs(out - out_trt)) < 1e-6    
    
    

    This produces the following output (using Info level logging):

    (trtorch) ~/av/experimental/chaoz/trtorch (chaoz/torch-tensorrt-modules) $ python tuple.py
    (tensor([[-0.9247, -0.4253, -2.6438,  0.1452],
            [-0.1209, -0.5797, -0.6229, -0.3284],
            [-1.0745, -0.3631, -1.6711,  2.2655]], device='cuda:0'), tensor([[-1.8493, -0.8507, -5.2877,  0.2904],
            [-0.2417, -1.1595, -1.2457, -0.6568],
            [-2.1491, -0.7263, -3.3421,  4.5310]], device='cuda:0'))
    INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
    INFO: [Torch-TensorRT] - Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript
    INFO: [Torch-TensorRT] - Lowered Graph: graph(%x.1 : Tensor):
      %2 : int = prim::Constant[value=1]()
      %z.1 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av/experimental/chaoz/trtorch/tuple.py:23:12
      return (%x.1, %z.1)
    
    Segmentation fault (core dumped)
    

    Run the following (all outputs have been operated on):

      import torch
      import torch_tensorrt as torchtrt
      
      
      import torch_tensorrt.logging as torchtrt_logging
      
      torchtrt_logging.set_reportable_log_level(torchtrt_logging.Level.Info)
      
      torch.manual_seed(0)
      
      DEVICE = torch.device("cuda:0")
      SHAPE = (3, 4)
      
      
      class Model(torch.nn.Module):
          def __init__(self):
              super().__init__()
      
          def forward(self, x):
              y = x - x
              z = x + x
              return (y, z)
      
      
      if __name__ == "__main__":
          tensor = torch.randn(SHAPE, dtype=torch.float32, device=DEVICE)
      
          model = Model().eval().to(DEVICE)
          out = model(tensor)
          print(out)
      
          model_trt = torchtrt.compile(
              model,
              inputs=[
                  torchtrt.Input(shape=SHAPE),
              ],
              enabled_precisions={torch.float},
          )
          out_trt = model(tensor)
          print(out_trt)
      
          assert torch.max(torch.abs(out - out_trt)) < 1e-6
    

    This produces the following output (using Info level logging):

    (trtorch-1.0) ~/av/experimental/chaoz/trtorch (chaoz/torch-tensorrt-modules) $ python tuple.py
    (tensor([[0., 0., 0., 0.],
            [0., 0., 0., 0.],
            [0., 0., 0., 0.]], device='cuda:0'), tensor([[-1.8493, -0.8507, -5.2877,  0.2904],
            [-0.2417, -1.1595, -1.2457, -0.6568],
            [-2.1491, -0.7263, -3.3421,  4.5310]], device='cuda:0'))
    INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
    INFO: [Torch-TensorRT] - Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript
    INFO: [Torch-TensorRT] - Lowered Graph: graph(%x.1 : Tensor):
      %2 : int = prim::Constant[value=1]()
      %y.1 : Tensor = aten::sub(%x.1, %x.1, %2) # /home/chaoz/av/experimental/chaoz/trtorch/tuple.py:23:12
      %z.1 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av/experimental/chaoz/trtorch/tuple.py:24:12
      return (%y.1, %z.1)
    
    Segmentation fault (core dumped)
    

    Expected behavior

    We expect the run to either complete, or error out because the tuple type is unsupported.

    Environment

    Build information about Torch-TensorRT can be found by turning on debug messages

    • Torch-TensorRT Version (e.g. 1.0.0): Master commit ef62f6bf26e1e282eccfb8f04e38a2b22558420b (latest as of this writing); also appears in 1.0 release.
    • PyTorch Version (e.g. 1.0): 1.10.2
    • CPU Architecture: x86-64
    • OS (e.g., Linux): Ubuntu 18.04
    • How you installed PyTorch (conda, pip, libtorch, source): conda
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives: local sources
    • Python version: 3.9
    • CUDA version: 11.6
    • GPU models and configuration: NVIDIA A10
    • Any other relevant information: Driver Version: 510.47.03 TRT Version: 8.2.3.0 cuDNN Version: 8.3.2.44

    Additional context

    This issue is very high priority for us, as all of our models return multiple tensors and at minimum will use tuples/lists to do so.

    bug 
    opened by chaoz-dev 28
  • ❓ [Question] Is it possibile to use a model optimized through TorchTensorRT in LibTorch under Windows?


    ❓ Question

    I would need to optimize an already trained segmentation model through TorchTensorRT, the idea would be to optimize the model by running the newest PyTorch NGC docker image under WSL2, exporting the model and then loading it in a C++ application that uses LibTorch, e.g.

    #include <torch/script.h>
    // ...
    torch::jit::script::Module module;
    try {
      // Deserialize the ScriptModule from a file using torch::jit::load().
      module = torch::jit::load(argv[1]);
    }
    

    Would this be the right approach?

    What you have already tried

    At the moment I only tried to optimize the model through TorchTensorRT, and something weird happens. Here I'll show the results for the Python script below that I obtained on two different devices:

    • a Ubuntu desktop with a GTX1080Ti (that I use for development)
    • a Windows PC with a RTX3080 (that is my target device)

    As you can see, the optimization process under WSL gives me a lot of GPU errors, while on Ubuntu it seems to work fine. Why does this happen?

    My script:

    import torch_tensorrt
    import yaml
    import torch
    import os
    import time
    import numpy as np
    import torch.backends.cudnn as cudnn
    import argparse
    import segmentation_models_pytorch as smp
    import pytorch_lightning as pl
    cudnn.benchmark = True
    
    def benchmark(model, input_shape=(1, 3, 512, 512), dtype=torch.float, nwarmup=50, nruns=1000):
        input_data = torch.randn(input_shape)
        input_data = input_data.to("cuda")
        if dtype==torch.half:
            input_data = input_data.half()
            
        print("Warm up ...")
        with torch.no_grad():
            for _ in range(nwarmup):
                features = model(input_data)
        torch.cuda.synchronize()
        print("Start timing ...")
        timings = []
        with torch.no_grad():
            for i in range(1, nruns+1):
                start_time = time.time()
                features = model(input_data)
                torch.cuda.synchronize()
                end_time = time.time()
                timings.append(end_time - start_time)
                if i%100==0:
                    print('Iteration %d/%d, ave batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))
    
        print("Input shape:", input_data.size())
        print("Output features size:", features.size())
        
        print('Average batch time: %.2f ms'%(np.mean(timings)*1000))
        
    def load_config(config_path: str):
        with open(config_path) as f:
            config = yaml.load(f, Loader=yaml.FullLoader)
        return config
        
        
        
    def main():
        # Load target model
        parser = argparse.ArgumentParser()
        parser.add_argument("weights_path")
        parser.add_argument("config_path")
        args = parser.parse_args()
        config = load_config(args.config_path)
        model_dict = config["model"]
        model_dict["activation"] = "softmax2d"
        model = smp.create_model(**model_dict)
        state_dict = torch.load(args.weights_path)["state_dict"]
        model.load_state_dict(state_dict)
        model.to("cuda")
        model.eval()
        # Create dummy data for tracing and benchmarking purposes.
        dtype = torch.float32
        shape = (1, 3, 512, 512)
        input_data = torch.randn(shape).to("cuda")
        
        # Convert model to script module
        print("Tracing PyTorch model...")
        traced_script_module = torch.jit.trace(model, input_data)
        # torch_script_module = torch.jit.load(model_path).cuda()
        print("Script Module generated.")
        print("\nBenchmarking Script Module...")
        # First benchmark <===================================
        benchmark(traced_script_module, shape, dtype)
        
        
        # Convert to TRT Module...
        output_path = args.config_path.split(os.path.sep)[-1] + "_trt_.pt"
        print("Creating TRT module...")
        trt_ts_module = torch_tensorrt.compile(
            traced_script_module,
            inputs = [
                torch_tensorrt.Input( # Specify input object with shape and dtype
                    shape=shape,
                    dtype=dtype) # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
            ],
            enabled_precisions = {dtype},
          )
        print("TRT Module created")
        print("\nBenchmarking TRT Module...")
        benchmark(trt_ts_module, shape, dtype)
        torch.jit.save(trt_ts_module, os.path.join("models",output_path)) # save the TRT embedded Torchscript
        
    if __name__ == "__main__":
        main()
        
    

    Ubuntu desktop

    root@ca10ddc496a3:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
    No pretrained weights exist for this model. Using random initialization.
    Tracing PyTorch model...
    /opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if h % output_stride != 0 or w % output_stride != 0:
    Script Module generated.
    
    Benchmarking Script Module...
    Warm up ...
    Start timing ...
    Iteration 100/1000, ave batch time 7.00 ms
    Iteration 200/1000, ave batch time 6.88 ms
    Iteration 300/1000, ave batch time 6.76 ms
    Iteration 400/1000, ave batch time 6.91 ms
    Iteration 500/1000, ave batch time 6.93 ms
    Iteration 600/1000, ave batch time 6.98 ms
    Iteration 700/1000, ave batch time 6.99 ms
    Iteration 800/1000, ave batch time 6.91 ms
    Iteration 900/1000, ave batch time 6.89 ms
    Iteration 1000/1000, ave batch time 6.87 ms
    Input shape: torch.Size([1, 3, 512, 512])
    Output features size: torch.Size([1, 3, 512, 512])
    Average batch time: 6.87 ms
    Creating TRT module...
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
    WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
    WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
    [1, 256, 128, 128]
    [1, 256, 128, 128]
    WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
    [1, 3, 512, 512]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    TRT Module created
    
    Benchmarking TRT Module...
    Warm up ...
    Start timing ...
    Iteration 100/1000, ave batch time 3.29 ms
    Iteration 200/1000, ave batch time 3.30 ms
    Iteration 300/1000, ave batch time 3.30 ms
    Iteration 400/1000, ave batch time 3.30 ms
    Iteration 500/1000, ave batch time 3.31 ms
    Iteration 600/1000, ave batch time 3.30 ms
    Iteration 700/1000, ave batch time 3.30 ms
    Iteration 800/1000, ave batch time 3.30 ms
    Iteration 900/1000, ave batch time 3.30 ms
    Iteration 1000/1000, ave batch time 3.30 ms
    Input shape: torch.Size([1, 3, 512, 512])
    Output features size: torch.Size([1, 3, 512, 512])
    Average batch time: 3.30 ms
    

    Windows PC

    root@3130ab7d9ff8:/DockerStuff# python script.py path/to/checkout.tar path/to/config.yaml
    No pretrained weights exist for this model. Using random initialization.
    Tracing PyTorch model...
    /opt/conda/lib/python3.8/site-packages/segmentation_models_pytorch/base/model.py:16: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
      if h % output_stride != 0 or w % output_stride != 0:
    Script Module generated.
    
    Benchmarking Script Module...
    Warm up ...
    Start timing ...
    Iteration 100/1000, ave batch time 3.21 ms
    Iteration 200/1000, ave batch time 3.18 ms
    Iteration 300/1000, ave batch time 3.17 ms
    Iteration 400/1000, ave batch time 3.17 ms
    Iteration 500/1000, ave batch time 3.16 ms
    Iteration 600/1000, ave batch time 3.16 ms
    Iteration 700/1000, ave batch time 3.16 ms
    Iteration 800/1000, ave batch time 3.16 ms
    Iteration 900/1000, ave batch time 3.16 ms
    Iteration 1000/1000, ave batch time 3.15 ms
    Input shape: torch.Size([1, 3, 512, 512])
    Output features size: torch.Size([1, 3, 512, 512])
    Average batch time: 3.15 ms
    Creating TRT module...
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - Mean converter disregards dtype
    WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
    WARNING: [Torch-TensorRT] - There may be undefined behavior using dynamic shape and aten::size
    WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
    [1, 256, 128, 128]
    [1, 256, 128, 128]
    WARNING: [Torch-TensorRT] - Interpolation layer will be run through ATen, not TensorRT. Performance may be lower than expected
    [1, 3, 512, 512]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.17 : Tensor = aten::_convolution(%1217, %self.encoder.model.blocks.1.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.19 : Tensor = aten::batch_norm(%input.17, %self.encoder.model.blocks.1.0.bn1.weight, %self.encoder.model.blocks.1.0.bn1.bias, %self.encoder.model.blocks.1.0.bn1.running_mean, %self.encoder.model.blocks.1.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1220 : Tensor = aten::relu(%input.19), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.29 : Tensor = aten::_convolution(%1223, %self.encoder.model.blocks.1.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.31 : Tensor = aten::batch_norm(%input.29, %self.encoder.model.blocks.1.0.bn3.weight, %self.encoder.model.blocks.1.0.bn3.bias, %self.encoder.model.blocks.1.0.bn3.running_mean, %self.encoder.model.blocks.1.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.1/__module.encoder.model.blocks.1.0/__module.encoder.model.blocks.1.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.33 : Tensor = aten::_convolution(%input.31, %self.encoder.model.blocks.2.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.35 : Tensor = aten::batch_norm(%input.33, %self.encoder.model.blocks.2.0.bn1.weight, %self.encoder.model.blocks.2.0.bn1.bias, %self.encoder.model.blocks.2.0.bn1.running_mean, %self.encoder.model.blocks.2.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1228 : Tensor = aten::relu(%input.35), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 || %input.369 : Tensor = aten::_convolution(%input.31, %self.decoder.block1.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.371 : Tensor = aten::batch_norm(%input.369, %self.decoder.block1.1.weight, %self.decoder.block1.1.bias, %self.decoder.block1.1.running_mean, %self.decoder.block1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %high_res_features : Tensor = aten::relu(%input.371), scope: __module.decoder/__module.decoder.block1/__module.decoder.block1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.45 : Tensor = aten::_convolution(%1231, %self.encoder.model.blocks.2.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.47 : Tensor = aten::batch_norm(%input.45, %self.encoder.model.blocks.2.0.bn3.weight, %self.encoder.model.blocks.2.0.bn3.bias, %self.encoder.model.blocks.2.0.bn3.running_mean, %self.encoder.model.blocks.2.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.0/__module.encoder.model.blocks.2.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.49 : Tensor = aten::_convolution(%input.47, %self.encoder.model.blocks.2.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.51 : Tensor = aten::batch_norm(%input.49, %self.encoder.model.blocks.2.1.bn1.weight, %self.encoder.model.blocks.2.1.bn1.bias, %self.encoder.model.blocks.2.1.bn1.running_mean, %self.encoder.model.blocks.2.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1236 : Tensor = aten::relu(%input.51), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.2/__module.encoder.model.blocks.2.1/__module.encoder.model.blocks.2.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.65 : Tensor = aten::_convolution(%1242, %self.encoder.model.blocks.3.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.67 : Tensor = aten::batch_norm(%input.65, %self.encoder.model.blocks.3.0.bn1.weight, %self.encoder.model.blocks.3.0.bn1.bias, %self.encoder.model.blocks.3.0.bn1.running_mean, %self.encoder.model.blocks.3.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1245 : Tensor = aten::relu(%input.67), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.85 : Tensor = aten::_convolution(%input.83, %self.encoder.model.blocks.3.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.87 : Tensor = aten::batch_norm(%input.85, %self.encoder.model.blocks.3.0.bn3.weight, %self.encoder.model.blocks.3.0.bn3.bias, %self.encoder.model.blocks.3.0.bn3.running_mean, %self.encoder.model.blocks.3.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.0/__module.encoder.model.blocks.3.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.89 : Tensor = aten::_convolution(%input.87, %self.encoder.model.blocks.3.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.91 : Tensor = aten::batch_norm(%input.89, %self.encoder.model.blocks.3.1.bn1.weight, %self.encoder.model.blocks.3.1.bn1.bias, %self.encoder.model.blocks.3.1.bn1.running_mean, %self.encoder.model.blocks.3.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1259 : Tensor = aten::relu(%input.91), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.1/__module.encoder.model.blocks.3.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.113 : Tensor = aten::_convolution(%1271, %self.encoder.model.blocks.3.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.115 : Tensor = aten::batch_norm(%input.113, %self.encoder.model.blocks.3.2.bn1.weight, %self.encoder.model.blocks.3.2.bn1.bias, %self.encoder.model.blocks.3.2.bn1.running_mean, %self.encoder.model.blocks.3.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1274 : Tensor = aten::relu(%input.115), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.2/__module.encoder.model.blocks.3.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.137 : Tensor = aten::_convolution(%1286, %self.encoder.model.blocks.3.3.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.139 : Tensor = aten::batch_norm(%input.137, %self.encoder.model.blocks.3.3.bn1.weight, %self.encoder.model.blocks.3.3.bn1.bias, %self.encoder.model.blocks.3.3.bn1.running_mean, %self.encoder.model.blocks.3.3.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1289 : Tensor = aten::relu(%input.139), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.3/__module.encoder.model.blocks.3.3/__module.encoder.model.blocks.3.3.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.161 : Tensor = aten::_convolution(%1301, %self.encoder.model.blocks.4.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.163 : Tensor = aten::batch_norm(%input.161, %self.encoder.model.blocks.4.0.bn1.weight, %self.encoder.model.blocks.4.0.bn1.bias, %self.encoder.model.blocks.4.0.bn1.running_mean, %self.encoder.model.blocks.4.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1304 : Tensor = aten::relu(%input.163), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.0/__module.encoder.model.blocks.4.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.185 : Tensor = aten::_convolution(%1316, %self.encoder.model.blocks.4.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.187 : Tensor = aten::batch_norm(%input.185, %self.encoder.model.blocks.4.1.bn1.weight, %self.encoder.model.blocks.4.1.bn1.bias, %self.encoder.model.blocks.4.1.bn1.running_mean, %self.encoder.model.blocks.4.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1319 : Tensor = aten::relu(%input.187), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.1/__module.encoder.model.blocks.4.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.209 : Tensor = aten::_convolution(%1331, %self.encoder.model.blocks.4.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.211 : Tensor = aten::batch_norm(%input.209, %self.encoder.model.blocks.4.2.bn1.weight, %self.encoder.model.blocks.4.2.bn1.bias, %self.encoder.model.blocks.4.2.bn1.running_mean, %self.encoder.model.blocks.4.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1334 : Tensor = aten::relu(%input.211), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.4/__module.encoder.model.blocks.4.2/__module.encoder.model.blocks.4.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.233 : Tensor = aten::_convolution(%1346, %self.encoder.model.blocks.5.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.235 : Tensor = aten::batch_norm(%input.233, %self.encoder.model.blocks.5.0.bn1.weight, %self.encoder.model.blocks.5.0.bn1.bias, %self.encoder.model.blocks.5.0.bn1.running_mean, %self.encoder.model.blocks.5.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1349 : Tensor = aten::relu(%input.235), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.253 : Tensor = aten::_convolution(%input.251, %self.encoder.model.blocks.5.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.255 : Tensor = aten::batch_norm(%input.253, %self.encoder.model.blocks.5.0.bn3.weight, %self.encoder.model.blocks.5.0.bn3.bias, %self.encoder.model.blocks.5.0.bn3.running_mean, %self.encoder.model.blocks.5.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.0/__module.encoder.model.blocks.5.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.257 : Tensor = aten::_convolution(%input.255, %self.encoder.model.blocks.5.1.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.259 : Tensor = aten::batch_norm(%input.257, %self.encoder.model.blocks.5.1.bn1.weight, %self.encoder.model.blocks.5.1.bn1.bias, %self.encoder.model.blocks.5.1.bn1.running_mean, %self.encoder.model.blocks.5.1.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1363 : Tensor = aten::relu(%input.259), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.1/__module.encoder.model.blocks.5.1.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.281 : Tensor = aten::_convolution(%1375, %self.encoder.model.blocks.5.2.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.283 : Tensor = aten::batch_norm(%input.281, %self.encoder.model.blocks.5.2.bn1.weight, %self.encoder.model.blocks.5.2.bn1.bias, %self.encoder.model.blocks.5.2.bn1.running_mean, %self.encoder.model.blocks.5.2.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1378 : Tensor = aten::relu(%input.283), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.5/__module.encoder.model.blocks.5.2/__module.encoder.model.blocks.5.2.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.305 : Tensor = aten::_convolution(%1390, %self.encoder.model.blocks.6.0.conv_pw.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pw # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.307 : Tensor = aten::batch_norm(%input.305, %self.encoder.model.blocks.6.0.bn1.weight, %self.encoder.model.blocks.6.0.bn1.bias, %self.encoder.model.blocks.6.0.bn1.running_mean, %self.encoder.model.blocks.6.0.bn1.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1393 : Tensor = aten::relu(%input.307), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.act1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1393:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.317 : Tensor = aten::_convolution(%1396, %self.encoder.model.blocks.6.0.conv_pwl.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.conv_pwl # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.319 : Tensor = aten::batch_norm(%input.317, %self.encoder.model.blocks.6.0.bn3.weight, %self.encoder.model.blocks.6.0.bn3.bias, %self.encoder.model.blocks.6.0.bn3.running_mean, %self.encoder.model.blocks.6.0.bn3.running_var, %870, %878, %879, %873), scope: __module.encoder/__module.encoder.model/__module.encoder.model.blocks.6/__module.encoder.model.blocks.6.0/__module.encoder.model.blocks.6.0.bn3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.321 : Tensor = aten::_convolution(%input.319, %self.decoder.aspp.0.convs.0.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.323 : Tensor = aten::batch_norm(%input.321, %self.decoder.aspp.0.convs.0.1.weight, %self.decoder.aspp.0.convs.0.1.bias, %self.decoder.aspp.0.convs.0.1.running_mean, %self.decoder.aspp.0.convs.0.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1401 : Tensor = aten::relu(%input.323), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.0/__module.decoder.aspp.0.convs.0.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.327 : Tensor = aten::_convolution(%input.325, %self.decoder.aspp.0.convs.1.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.0/__module.decoder.aspp.0.convs.1.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.329 : Tensor = aten::batch_norm(%input.327, %self.decoder.aspp.0.convs.1.1.weight, %self.decoder.aspp.0.convs.1.1.bias, %self.decoder.aspp.0.convs.1.1.running_mean, %self.decoder.aspp.0.convs.1.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1405 : Tensor = aten::relu(%input.329), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.1/__module.decoder.aspp.0.convs.1.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.333 : Tensor = aten::_convolution(%input.331, %self.decoder.aspp.0.convs.2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.0/__module.decoder.aspp.0.convs.2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.335 : Tensor = aten::batch_norm(%input.333, %self.decoder.aspp.0.convs.2.1.weight, %self.decoder.aspp.0.convs.2.1.bias, %self.decoder.aspp.0.convs.2.1.running_mean, %self.decoder.aspp.0.convs.2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1409 : Tensor = aten::relu(%input.335), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.2/__module.decoder.aspp.0.convs.2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.339 : Tensor = aten::_convolution(%input.337, %self.decoder.aspp.0.convs.3.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.0/__module.decoder.aspp.0.convs.3.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.341 : Tensor = aten::batch_norm(%input.339, %self.decoder.aspp.0.convs.3.1.weight, %self.decoder.aspp.0.convs.3.1.bias, %self.decoder.aspp.0.convs.3.1.running_mean, %self.decoder.aspp.0.convs.3.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %1413 : Tensor = aten::relu(%input.341), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.convs.3/__module.decoder.aspp.0.convs.3.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.353 : Tensor = aten::_convolution(%input.351, %self.decoder.aspp.0.project.0.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.355 : Tensor = aten::batch_norm(%input.353, %self.decoder.aspp.0.project.1.weight, %self.decoder.aspp.0.project.1.bias, %self.decoder.aspp.0.project.1.running_mean, %self.decoder.aspp.0.project.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.357 : Tensor = aten::relu(%input.355), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.0/__module.decoder.aspp.0.project/__module.decoder.aspp.0.project.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.363 : Tensor = aten::_convolution(%input.361, %self.decoder.aspp.1.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.1/__module.decoder.aspp.1.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.365 : Tensor = aten::batch_norm(%input.363, %self.decoder.aspp.2.weight, %self.decoder.aspp.2.bias, %self.decoder.aspp.2.running_mean, %self.decoder.aspp.2.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.367 : Tensor = aten::relu(%input.365), scope: __module.decoder/__module.decoder.aspp/__module.decoder.aspp.3 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.377 : Tensor = aten::_convolution(%input.375, %self.decoder.block2.0.1.weight, %23, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.0/__module.decoder.block2.0.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 + %input.379 : Tensor = aten::batch_norm(%input.377, %self.decoder.block2.1.weight, %self.decoder.block2.1.bias, %self.decoder.block2.1.running_mean, %self.decoder.block2.1.running_var, %870, %878, %879, %873), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.1 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:2381:0 + %input.381 : Tensor = aten::relu(%input.379), scope: __module.decoder/__module.decoder.block2/__module.decoder.block2.2 # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:1395:0 : invalid argument
    WARNING: [Torch-TensorRT TorchScript Conversion Context] - GPU error during getBestTactic: %input.383 : Tensor = aten::_convolution(%input.381, %self.segmentation_head.0.weight, %self.segmentation_head.0.bias, %869, %871, %869, %870, %871, %25, %873, %870, %873, %873), scope: __module.segmentation_head/__module.segmentation_head.0 # /opt/conda/lib/python3.8/site-packages/torch/nn/modules/conv.py:442:0 : invalid argument
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    [1, 256, 128, 128]
    [1, 3, 512, 512]
    TRT Module created
    
    Benchmarking TRT Module...
    Warm up ...
    Start timing ...
    Iteration 100/1000, ave batch time 2.74 ms
    Iteration 200/1000, ave batch time 2.75 ms
    Iteration 300/1000, ave batch time 2.74 ms
    Iteration 400/1000, ave batch time 2.75 ms
    Iteration 500/1000, ave batch time 2.74 ms
    Iteration 600/1000, ave batch time 2.74 ms
    Iteration 700/1000, ave batch time 2.75 ms
    Iteration 800/1000, ave batch time 2.75 ms
    Iteration 900/1000, ave batch time 2.75 ms
    Iteration 1000/1000, ave batch time 2.75 ms
    Input shape: torch.Size([1, 3, 512, 512])
    Output features size: torch.Size([1, 3, 512, 512])
    

    Environment

    Newest PyTorch NGC Docker image

    My Windows PC has an RTX 3080; my Ubuntu desktop has a GTX 1080 Ti.

    Additional context

    question No Activity channel: windows 
    opened by andreabonvini 24
  • ❓ [Question] Building torch_tensorrt.lib on Windows

    ❓ [Question] Building torch_tensorrt.lib on Windows

    ❓ Question

    I am wondering how to build the torch_tensorrt.lib on Windows.

    What you have already tried

    I have followed #960 and #856 (with the same WORKSPACE as the latter) and managed to successfully build torch_tensorrt.dll. However, I need the .lib file in order to compile my Libtorch program. I tried linking to some of the .lib files that were created already (like bazel-out\x64_windows-opt\bin\cpp\torch_tensorrt.lo.lib), but that didn't work. I expect it's a fairly simple Bazel command, but I have no idea where to put it.

    Environment

    Build information about Torch-TensorRT can be found by turning on debug messages

    • PyTorch Version (e.g., 1.0): 1.10.0 (release)
    • CPU Architecture: x86-64
    • OS (e.g., Linux): Windows 10
    • How you installed PyTorch (conda, pip, libtorch, source): libtorch from pytorch.org
    • Build command you used (if compiling from source): bazel build //:libtorchtrt --compilation_mode opt
    • CUDA version: 11.3
    • Any other relevant information: Using VS2019

    Additional context

    My libtorch program runs fine even if I include the torch-tensorrt headers, but it throws the following errors as soon as I try to use torch_tensorrt::torchscript::CompileSpec and call torch_tensorrt::torchscript::compile:

    Error LNK1120 2 unresolved externals Omkar 1.10.0+cu113 B:\Programming_Current Projects\HelloLibTorch\x64\Release\HelloTorch.exe 1

    Error LNK2019 unresolved external symbol "public: __cdecl torch_tensorrt::torchscript::CompileSpec::CompileSpec(class std::vector<class std::vector<__int64,class std::allocator<__int64> >,class std::allocator<class std::vector<__int64,class std::allocator<__int64> > > >)" (??0CompileSpec@torchscript@torch_tensorrt@@QEAA@V?$vector@V?$vector@_JV?$allocator@_J@std@@@std@@V?$allocator@V?$vector@_JV?$allocator@_J@std@@@std@@@2@@std@@@Z) referenced in function main Omkar 1.10.0+cu113 B:\Programming_Current Projects\HelloLibTorch\main.obj 1

    Error LNK2019 unresolved external symbol "struct torch::jit::Module __cdecl torch_tensorrt::torchscript::compile(struct torch::jit::Module const &,struct torch_tensorrt::torchscript::CompileSpec)" (?compile@torchscript@torch_tensorrt@@YA?AUModule@jit@torch@@AEBU345@UCompileSpec@12@@Z) referenced in function main Omkar 1.10.0+cu113 B:\Programming_Current Projects\HelloLibTorch\main.obj 1

    question channel: windows 
    opened by jonahclarsen 20
  • Add CMake support to build the libraries

    Add CMake support to build the libraries

    Description

    This PR adds the CMake support to build the libraries and the torchtrtc executable. It also generates a torchtrtConfig.cmake, in order to be able to consume these libraries with cmake targets.

    The major advantage of the CMake support as introduced in this PR is the added ability to compile for windows as well, fixing the export of symbols, and generating both the *.dll and the *.lib libraries. To export the symbols, I reused the existing TORCHTRT_API macro, redefining it for windows, using CMake. The consumption of the dynamic libraries (both on linux and windows) is demonstrated by building the executable torchtrtc[.exe]. This has been tested locally with MSVC 2019 (19.29.30143.0), libtorch 1.11 (1.11.0+cu113), CUDA 11.6, cuDNN 8.2.1, and TensorRT 8.2.4.2. A secondary advantage is the handling of dependencies. These can be installed anywhere (not necessarily globally on the system) and by any mean and compiling the lib does not require editing config file anymore, instead, a few CMake variables are enough to have the dependencies correctly found and used. (for cuDNN and TensorRT, most of the magic happens in the CMake finders in cmake/Modules).

    This PR only adds partial CMake support to Torch-TensorRT (only for the libs and the executable torchtrtc). I started working on the unit tests as well, and could consider adding Python support too if there is a clear interest. I still think that being able to compile the libraries (also on Windows) has value in itself, and that's the reason for this PR.

    I didn't add any documentation about how to compile using CMake, but that should probably be done. A bit of guidance on where and how this should be best done would be appreciated.

    With this PR, I would also like to trigger the discussion around CMake and/or Windows support (which, technically, are totally independent). The kind of questions I have are:

    • Is there a CI for this lib (I am assuming there is)? Could we imagine also compiling the lib with CMake on the CI, to check that new additions do not break it in the future? If not, how do you see the maintenance of this feature?
    • Same question for Windows: I am assuming the lib is not currently compiled on Windows on the CI. Can that be added? Possibly with CMake?

    Thanks for reviewing, and looking forward to answering questions 😃

    Type of change

    • New feature (non-breaking change which adds functionality)

    Checklist:

    • [x] My code follows the style guidelines of this project (You can use the linters)
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas and hacks
    • [ ] I have made corresponding changes to the documentation
    • [ ] I have added tests to verify my fix or my feature
    • [ ] New and existing unit tests pass locally with my changes

    Signed-off-by: Gabriel Cuendet [email protected]

    documentation component: lowering component: conversion component: core component: converters component: build system component: api [C++] component: evaluators component: runtime component: partitioning channel: windows cla signed release: v1.2 
    opened by gcuendet 18
  • Unable to use any Torch-TensorRT methods

    Unable to use any Torch-TensorRT methods

    I'm facing this error:

    AttributeError: module 'torch_tensorrt' has no attribute 'compile'

    I also get this error when I try to use any other method like Input().

    This is how I installed Torch-TensorRT: pip install torch-tensorrt -f github.com/NVIDIA/Torch-TensorRT/releases

    Code (from official documentation):

    import torch_tensorrt
    
    model = model.eval()
    compile_settings = {
        "input_shapes": [
            {
                "min": [1, 1, 16, 16],
                "opt": [1, 1, 32, 32],
                "max": [1, 1, 64, 64]
            },
        ],
        "op_precision": torch.half # Run with fp16
    }
    enabled_precisions = {torch.float, torch.half}
    
    trt_ts_module = torch_tensorrt.compile(model, inputs=compile_settings, enabled_precisions=enabled_precisions) 
    

    Stack Trace:

    AttributeError                            Traceback (most recent call last)
    <command-3167120371910218> in <module>
         14 enabled_precisions = {torch.float, torch.half}
         15 
    ---> 16 trt_ts_module = torch_tensorrt.compile(model, inputs=compile_settings, enabled_precisions=enabled_precisions)
    
    AttributeError: module 'torch_tensorrt' has no attribute 'compile'
    

    Please let me know how I can fix this issue.
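
    A quick way to narrow this down (a hedged diagnostic sketch, not an official fix) is to check which torch_tensorrt module Python is actually importing, since a stale trtorch install or a local directory shadowing the package can produce exactly this AttributeError. Note also that the snippet above mixes the older "input_shapes"/"op_precision"-style settings with the newer API, which expects torch_tensorrt.Input objects:

    import torch
    import torch_tensorrt

    # Check which module is actually being imported and what it exposes.
    print(torch_tensorrt.__file__)
    print("compile" in dir(torch_tensorrt))  # should be True on a healthy install

    # Hypothetical corrected call once the import is healthy; `model` is the
    # eval-mode module from the snippet above.
    trt_ts_module = torch_tensorrt.compile(
        model,
        inputs=[
            torch_tensorrt.Input(
                min_shape=[1, 1, 16, 16],
                opt_shape=[1, 1, 32, 32],
                max_shape=[1, 1, 64, 64],
            )
        ],
        enabled_precisions={torch.float, torch.half},
    )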

    question No Activity channel: windows 
    opened by Arjunp24 15
  • How to build from sources on Windows

    How to build from sources on Windows

    ❓ Question

    How shall I edit the WORKSPACE file in order to build tag 0.1.0 from sources on Windows?

    What you have already tried

    1. I successfully completed the build-from-sources process for the Jetson Xavier AGX, see: https://github.com/NVIDIA/TRTorch/issues/222

    2. Based on the material I already had from the Jetson process, I tried to do the same for Windows by editing the WORKSPACE according to my Windows setup. I changed all the required new_local_repository arguments for cuda, torch, cudnn and tensorrt to match my Windows installations.

    3. Ran the following command: bazel build //:libtrtorch

    The following error report was generated:

    INFO: Repository rules_python instantiated at:
      no stack (--record_rule_instantiation_callstack not enabled)
    Repository rule git_repository defined at:
      C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git.bzl:195:18: in
    ERROR: An error occurred during the fetch of repository 'rules_python':
    Traceback (most recent call last):
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git.bzl", line 177
        _clone_or_update(ctx)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git.bzl", line 36, in _clone_or_update
        git_repo(ctx, directory)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 91, in git_repo
        _update(ctx, git_repo)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 103, in _update
        fetch(ctx, git_repo)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 129, in fetch
        _git_maybe_shallow(ctx, <5 more arguments>)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 171, in _git_maybe_shallow
        _error(ctx.name, <2 more arguments>)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 181, in _error
        fail(<1 more arguments>)
    error running 'git fetch origin refs/heads/*:refs/remotes/origin/* refs/tags/*:refs/tags/*' while working with @rules_python:
    BUG: run-command.c:519: disabling cancellation: Invalid argument
    ERROR: no such package '@rules_python//python':
    Traceback (most recent call last):
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git.bzl", line 177
        _clone_or_update(ctx)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git.bzl", line 36, in _clone_or_update
        git_repo(ctx, directory)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 91, in git_repo
        _update(ctx, git_repo)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 103, in _update
        fetch(ctx, git_repo)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 129, in fetch
        _git_maybe_shallow(ctx, <5 more arguments>)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 171, in _git_maybe_shallow
        _error(ctx.name, <2 more arguments>)
      File "C:/users/General/_bazel_General/zs4npqzu/external/bazel_tools/tools/build_defs/repo/git_worker.bzl", line 181, in _error
        fail(<1 more arguments>)
    error running 'git fetch origin refs/heads/*:refs/remotes/origin/* refs/tags/*:refs/tags/*' while working with @rules_python:
    BUG: run-command.c:519: disabling cancellation: Invalid argument
    INFO: Elapsed time: 1.097s
    INFO: 0 processes.
    FAILED: Build did NOT complete successfully (0 packages loaded)

    Environment

    Build information about the TRTorch compiler can be found by turning on debug messages

    • PyTorch Version (e.g., 1.0): 1.6
    • CPU Architecture: Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 2592 Mhz, 4 Core(s), 8 Logical Processor(s)
    • OS (e.g., Linux): Windows
    • How you installed PyTorch (conda, pip, libtorch, source): pip3
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives:
    • Python version: 3.6.8
    • CUDA version: 11.0
    • GPU models and configuration: Quadro M2000M
    • Any other relevant information: TensorRT 7.2.1, CuDNN 8.0.1

    Additional context

    I have good experience with TensorRT development on my Windows setup, so I know that, from the NVIDIA libraries' setup point of view, everything should be OK.

    question channel: windows 
    opened by OronG13 15
  • ❓ [Question] How to install torch_tensorrt python API in ubuntu 20.04?

    ❓ [Question] How to install torch_tensorrt python API in ubuntu 20.04?

    ❓ Question

    I want to install the torch_tensorrt Python API on Ubuntu 20.04. Could you please provide a step-by-step installation procedure? I tried pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases

    When I try to import the module with import torch_tensorrt, I get the error below:

    [Screenshot attached: Screenshot from 2022-06-19 15-41-46]

    Environment

    Build information about Torch-TensorRT can be found by turning on debug messages

    • PyTorch Version (e.g., 1.0): 1.11.0
    • CPU Architecture:
    • OS (e.g., Linux): LINUX
    • How you installed PyTorch (conda, pip, libtorch, source): conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives:no
    • Python version: 3.7.13
    • CUDA version: 11.3.1
    • GPU models and configuration:
    • Any other relevant information:

    @narendasan @peri044

    question component: build system component: packaging component: dependencies 
    opened by IamExperimenting 14
  • 🐛 [Bug] Returning list of tensors fails when operations are applied to tensors: `RuntimeError: [Error thrown at core/conversion/conversion.cpp:220] List type. Only a single tensor or a TensorList type is supported.`

    🐛 [Bug] Returning list of tensors fails when operations are applied to tensors: `RuntimeError: [Error thrown at core/conversion/conversion.cpp:220] List type. Only a single tensor or a TensorList type is supported.`

    Bug Description

    Returning a list of tensors fails when ops are applied to the tensors prior to appending them to the list that is returned. This is not the case if tensors are directly appended to the list without applying any operations.

    RuntimeError: [Error thrown at core/conversion/conversion.cpp:220] List type. Only a single tensor or a TensorList type is supported.
    

    To Reproduce

    Run the following:

    import torch
    import torch_tensorrt as torchtrt
    import torch_tensorrt.logging as logging

    logging.set_reportable_log_level(logging.Level.Info)

    torch.manual_seed(0)

    DEVICE = torch.device("cuda:0")
    SHAPE = (1, 2)


    class Model(torch.nn.Module):
        def __init__(self):
            super().__init__()

        def forward(self, x):
            tensors = []
            for i in range(3):
                y = x + x
                tensors.append(y)

            return tensors


    if __name__ == "__main__":
        tensor = torch.randn(SHAPE, dtype=torch.float32, device=DEVICE)

        model = Model().eval().to(DEVICE)
        out = model(tensor)
        print(out)

        model_trt = torchtrt.compile(
            model,
            inputs=[
                torchtrt.Input(shape=SHAPE),
            ],
            enabled_precisions={torch.float},
        )
        out_trt = model_trt(tensor)  # run inference with the compiled module
        print(out_trt)

    This throws the following error:

    (trtorch-1.0) ~/av-dbg/experimental/chaoz/trtorch (chaoz/trtorch-experiments) $ python index.py 
    [tensor([[-1.8493, -0.8507]], device='cuda:0'), tensor([[-1.8493, -0.8507]], device='cuda:0'), tensor([[-1.8493, -0.8507]], device='cuda:0')]
    INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
    INFO: [Torch-TensorRT] - Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript
    INFO: [Torch-TensorRT] - Lowered Graph: graph(%x.1 : Tensor):
      %2 : int = prim::Constant[value=1]()
      %y.1 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16
      %y.2 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16
      %y.4 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16
      %tensors.1 : Tensor[] = prim::ListConstruct(%y.1, %y.2, %y.4)
      return (%tensors.1)
    
    WARNING: [Torch-TensorRT] - Cannot infer input type from calcuations in graph for input x.1. Assuming it is Float32. If not, specify input type explicity
    INFO: [Torch-TensorRT] - Skipping partitioning since model is fully supported
    INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageChange] Init CUDA: CPU +449, GPU +0, now: CPU 3411, GPU 1873 (MiB)
    INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageSnapshot] Begin constructing builder kernel library: CPU 3411 MiB, GPU 1873 MiB
    INFO: [Torch-TensorRT TorchScript Conversion Context] - [MemUsageSnapshot] End constructing builder kernel library: CPU 3565 MiB, GPU 1915 MiB
    INFO: [Torch-TensorRT] - Settings requested for TensorRT engine:
        Enabled Precisions: Float32 
        TF32 Floating Point Computation Enabled: 1
        Truncate Long and Double: 0
        Make Refittable Engine: 0
        Debuggable Engine: 0
        Strict Types: 0
        GPU ID: 0
        Allow GPU Fallback (if running on DLA): 0
        Min Timing Iterations: 2
        Avg Timing Iterations: 1
        Max Workspace Size: 1073741824
        Max Batch Size: Not set
        Device Type: GPU
        GPU ID: 0
        Engine Capability: standard
        Calibrator Created: 0
    INFO: [Torch-TensorRT TorchScript Conversion Context] - Converting Block
    INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Input x.1 (named: input_0): Input(shape: [1, 2], dtype: Float32, format: NCHW\Contiguous\Linear) in engine (conversion.AddInputs)
    INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %y.1 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16 (ctx.AddLayer)
    INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %y.2 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16 (ctx.AddLayer)
    INFO: [Torch-TensorRT TorchScript Conversion Context] - Adding Layer %y.4 : Tensor = aten::add(%x.1, %x.1, %2) # /home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py:24:16 (ctx.AddLayer)
    Traceback (most recent call last):
      File "/home/chaoz/av-dbg/experimental/chaoz/trtorch/index.py", line 37, in <module>
        model_trt = torchtrt.compile(
      File "/home/chaoz/.anaconda3/envs/trtorch-1.0/lib/python3.9/site-packages/torch_tensorrt/_compile.py", line 97, in compile
        return torch_tensorrt.ts.compile(ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs)
      File "/home/chaoz/.anaconda3/envs/trtorch-1.0/lib/python3.9/site-packages/torch_tensorrt/ts/_compiler.py", line 119, in compile
        compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    RuntimeError: [Error thrown at core/conversion/conversion.cpp:220] List type. Only a single tensor or a TensorList type is supported.
    

    Expected behavior

    Graph should return a list of tensors without errors.
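
    A possible interim workaround, sketched below under the assumption that single-tensor outputs avoid the List restriction entirely, is to stack the computed tensors into one output instead of returning a Python list:

    import torch

    class Model(torch.nn.Module):
        def forward(self, x):
            y = x + x
            # One stacked tensor of shape (3, *x.shape) instead of a list of three tensors.
            return torch.stack([y, y, y])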

    Environment

    Build information about Torch-TensorRT can be found by turning on debug messages

    • Torch-TensorRT Version (e.g. 1.0.0): 1.0
    • PyTorch Version (e.g. 1.0): 1.10.2
    • CPU Architecture: x86-64
    • OS (e.g., Linux): Ubuntu 18.04
    • How you installed PyTorch (conda, pip, libtorch, source): Conda
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives: local
    • Python version: 3.9
    • CUDA version: 11.6
    • GPU models and configuration: Nvidia A10
    • Any other relevant information:

    Additional context

    Note that changing the forward function to the following definition:

    def forward(self, x):
        tensors = []
        for i in range(3):
            #  y = x + x
            tensors.append(x)

        return tensors
    

    will succeed with the following output:

    (trtorch-1.0) ~/av-dbg/experimental/chaoz/trtorch (chaoz/trtorch-experiments) $ python index.py
    [tensor([[-0.9247, -0.4253]], device='cuda:0'), tensor([[-0.9247, -0.4253]], device='cuda:0'), tensor([[-0.9247, -0.4253]], device='cuda:0')]
    INFO: [Torch-TensorRT] - ir was set to default, using TorchScript as ir
    INFO: [Torch-TensorRT] - Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript
    INFO: [Torch-TensorRT] - Lowered Graph: graph(%x.1 : Tensor):
      %tensors.1 : Tensor[] = prim::ListConstruct(%x.1, %x.1, %x.1)
      return (%tensors.1)
    
    WARNING: [Torch-TensorRT] - Cannot infer input type from calcuations in graph for input x.1. Assuming it is Float32. If not, specify input type explicity
    ERROR: [Torch-TensorRT] - Method requested cannot be compiled by Torch-TensorRT.TorchScript.
    There is no work to be done since the resulting compiled program will contain an engine that is empty.
    This may be because there are no operators that can be added to the TensorRT graph or all operators have a resolved compile time value.
    
    WARNING: [Torch-TensorRT] - Input type for doing shape analysis could not be determined, defaulting to F32
    INFO: [Torch-TensorRT] - Partitioned Graph: []
    INFO: [Torch-TensorRT] - Segmented Graph: graph(%x.1 : Tensor):
      return ()
    
    WARNING: [Torch-TensorRT] - Didn't generate any TensorRT engines, the compiler did nothing
    
    [tensor([[-0.9247, -0.4253]], device='cuda:0'), tensor([[-0.9247, -0.4253]], device='cuda:0'), tensor([[-0.9247, -0.4253]], device='cuda:0')]
    
    feature request component: core release: v1.2 
    opened by chaoz-dev 14
  • feat: Show PyTorch code of unsupported operators

    feat: Show PyTorch code of unsupported operators

    Signed-off-by: lamhoangtung [email protected]

    Description

    Inspired by my problem in #511: I'm trying to enable TensorRT for a TorchScript model and getting a bunch of unsupported operators. I'm willing to change the implementation to avoid those unsupported operators, or even to try to add support for them, but I'm struggling to find which lines of code in my model are causing them.

    Thanks to @narendasan's guidance, I have added a traceback feature similar to TorchScript's, where TRTorch points to the exact line of PyTorch code that causes the unsupported operator.

    So instead of showing a vague operator schema like this:

    ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
    Unsupported operators listed below:
      -  aten::__contains__.str_list(str[] l, str item) -> (bool)
      -  ...
    You can either implement converters for these ops in your application or request implementation
    https://www.github.com/nvidia/TRTorch/issues
    

    TRTorch should show the exact line of PyTorch code that causes the unsupported operator, like this:

    ERROR: [TRTorch] - Method requested cannot be compiled by TRTorch.
    Unsupported operators listed below:
      - aten::__contains__.str_list(str[] l, str item) -> (bool)
        Related PyTorch code:
      File "/home/techainer/anaconda3/envs/test_pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3446
            )
    
        if mode in ("nearest", "area"):
           ~~~~~~~~~~~~~~~~~~~~~ <--- HERE
            if align_corners is not None:
                raise ValueError(
      - ...
    You can either implement converters for these ops in your application or request implementation
    https://www.github.com/nvidia/TRTorch/issues
    

    So the user has additional context on the problem.

    This should fix #511

    Type of change

    • New feature (non-breaking change which adds functionality)

    Checklist:

    • [x] My code follows the style guidelines of this project (I used the linters)
    • [x] I have performed a self-review of my own code
    • [x] I have commented my code, particularly in hard-to-understand areas and hacks
    • [x] I have made corresponding changes to the documentation: I reckon there is no documentation change required for this feature
    • [x] I have added tests to verify my fix or my feature
    • [x] New and existing unit tests pass locally with my changes
    opened by lamhoangtung 14
  • Problem when using Citrinet example notebook

    Problem when using Citrinet example notebook

    I have a similar problem to https://github.com/pytorch/TensorRT/issues/763 when trying to adapt Citrinet-Example.ipynb. My model is a fine-tuned Citrinet512 from NeMo. By the way, my torch_tensorrt version is 1.3.0. Here is my code:

    import torch
    import torch.nn as nn
    import torch_tensorrt as torchtrt
    import argparse
    
    precisions = [torch.float, torch.half]
    batch_sizes = [1,8,32,128]
    model = torch.jit.load(model_path+f"{variant}.ts")
    model.eval()
    for precision in precisions:
        for batch_size in batch_sizes:
            compile_settings = {
                "inputs": [torchtrt.Input(shape=[batch_size, 80, 1488],  dtype=torch.int), torchtrt.Input(shape=[1, batch_size],  dtype=torch.int)],
                "enabled_precisions": {precision},
                "workspace_size": 2000000000,
                "truncate_long_and_double": True,
            }
            print(f"Generating Torchscript-TensorRT module for batchsize {batch_size} precision {precision}")
            trt_ts_module = torchtrt.compile(model, **compile_settings)
            torch.jit.save(trt_ts_module, f"{variant}_bs{batch_size}_{precision}.torch-tensorrt")
    

    Output:

    RuntimeError                              Traceback (most recent call last)
    Input In [6], in <module>
        14 compile_settings = {
        15     "inputs": [torchtrt.Input(shape=[batch_size, 80, 1488],  dtype=torch.int), torchtrt.Input(shape=[1, batch_size],  dtype=torch.int)],
        16     "enabled_precisions": {precision},
        17     "workspace_size": 2000000000,
        18     "truncate_long_and_double": True,
        19 }
        20 print(f"Generating Torchscript-TensorRT module for batchsize {batch_size} precision {precision}")
    ---> 21 trt_ts_module = torchtrt.compile(model, **compile_settings)
        22 torch.jit.save(trt_ts_module, f"{variant}_bs{batch_size}_{precision}.torch-tensorrt")
    
    File ~/nvidia-nemo-asr-training/venv/lib/python3.9/site-packages/torch_tensorrt/_compile.py:125, in compile(module, ir, inputs, enabled_precisions, **kwargs)
       120         logging.log(
       121             logging.Level.Info,
       122             "Module was provided as a torch.nn.Module, trying to script the module with torch.jit.script. In the event of a failure please preconvert your module to TorchScript",
       123         )
       124         ts_mod = torch.jit.script(module)
    --> 125     return torch_tensorrt.ts.compile(
       126         ts_mod, inputs=inputs, enabled_precisions=enabled_precisions, **kwargs
       127     )
       128 elif target_ir == _IRType.fx:
       129     if (
       130         torch.float16 in enabled_precisions
       131         or torch_tensorrt.dtype.half in enabled_precisions
       132     ):
    
    File ~/nvidia-nemo-asr-training/venv/lib/python3.9/site-packages/torch_tensorrt/ts/_compiler.py:136, in compile(module, inputs, input_signature, device, disable_tf32, sparse_weights, enabled_precisions, refit, debug, capability, num_avg_timing_iters, workspace_size, dla_sram_size, dla_local_dram_size, dla_global_dram_size, calibrator, truncate_long_and_double, require_full_compilation, min_block_size, torch_executed_ops, torch_executed_modules)
       110     raise ValueError(
       111         f"require_full_compilation is enabled however the list of modules and ops to run in torch is not empty. Found: torch_executed_ops: {torch_executed_ops}, torch_executed_modules: {torch_executed_modules}"
       112     )
       114 spec = {
       115     "inputs": inputs,
       116     "input_signature": input_signature,
      (...)
       133     },
       134 }
    --> 136 compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
       137 compiled_module = torch.jit._recursive.wrap_cpp_module(compiled_cpp_mod)
       138 return compiled_module
    
    RuntimeError: [Error thrown at core/conversion/converters/impl/conv_deconv.cpp:129] Expected orig_dims.nbDims > 2 to be true but got false
    Unable to create convolution layer from node: %18096 : Tensor = aten::_convolution(%18093, %33, %32, %34, %35, %34, %18094, %18095, %30, %18094, %18094, %18094, %18094)
    

    Originally posted by @hamjam in https://github.com/pytorch/TensorRT/issues/763#issuecomment-1368486360

    opened by hamjam 0
  • ❓ [Question] How do you use dynamic shape when using fx as ir and the model is not fully lowerable

    ❓ [Question] How do you use dynamic shape when using fx as ir and the model is not fully lowerable

    ❓ Question

    I have a PyTorch model that contains a pixel shuffle operation (which is not fully supported) and I would like to convert it to TensorRT, while being able to specify a dynamic shape as input. The "ts" path does not work as there is an issue, the "fx" path has problems too, and I am not able to use a split model with dynamic shapes.

    What you have already tried

    • The conversion using TorchScript as "ir" is not working (see Issue #1568)
    • The conversion using torch_tensorrt.fx.compile succeeds when I use a static shape; however, there is no way of specifying a dynamic shape
    • Using a manual approach (that is, manually tracing with acc_tracer, then constructing the TRTInterpreter and finally the TRTModule) fails as there is an unsupported operation (a pixel shuffle layer). (Maybe I should open an issue for this too?)
    • Using the manual approach with a TRTSplitter is maybe the way to go, but I don't know how to specify the dynamic shape constraints in this situation (see the sketch after the code below)

    The "manual" approach that I mentioned is the one specified in examples/fx/fx2trt_example.py and in the docs.

    Here is the code as I have it now. Please note that the branch with the splitter is the one executed, and it errors out when I run the TRT model with different shapes. If do_split is set to False, the conversion fails as nn.PixelShuffle is not supported.

    import tensorrt as trt
    import torch.fx
    import torch.nn as nn
    
    import torch_tensorrt.fx.tracer.acc_tracer.acc_tracer as acc_tracer
    import torchvision.models as models
    from torch_tensorrt.fx import InputTensorSpec, TRTInterpreter, TRTModule
    from torch_tensorrt.fx.utils import LowerPrecision
    from torch_tensorrt.fx.tools.trt_splitter import TRTSplitter
    
    
    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(2)
    
        def forward(self, x):
            return self.shuffle(self.conv(x))
    
    
    torch.set_grad_enabled(False)
    
    # inputs
    inputs = [torch.rand(1, 3, 224, 224).cuda()]
    
    
    factory_kwargs = {"dtype": torch.float32, "device": torch.device("cuda:0")}
    model = MyModel().to(**factory_kwargs)
    
    model = model.eval()
    
    out = model(inputs[0])
    
    # symbolic trace
    acc_model = acc_tracer.trace(model, inputs)
    
    do_split = True
    
    if do_split:
        # split
        splitter = TRTSplitter(acc_model, inputs)
    
        splitter.node_support_preview(dump_graph=False)
    
        split_mod = splitter()
    
        print(split_mod.graph)
    
        def get_submod_inputs(mod, submod, inputs):
            acc_inputs = None
    
            def get_input(self, inputs):
                nonlocal acc_inputs
                acc_inputs = inputs
    
            handle = submod.register_forward_pre_hook(get_input)
            mod(*inputs)
            handle.remove()
            return acc_inputs
    
        for name, _ in split_mod.named_children():
            if "_run_on_acc" in name:
                submod = getattr(split_mod, name)
                # Get submodule inputs for fx2trt
                acc_inputs = get_submod_inputs(split_mod, submod, inputs)
    
                # fx2trt replacement
                interp = TRTInterpreter(
                    submod,
                    InputTensorSpec.from_tensors(acc_inputs),
                    explicit_batch_dimension=True,
                )
                r = interp.run(lower_precision=LowerPrecision.FP32)
                trt_mod = TRTModule(*r)
                setattr(split_mod, name, trt_mod)
    
        trt_model = split_mod
    
    else:
        # input specs
        input_specs = [
            InputTensorSpec(
                shape=(1, 3, -1, -1),
                dtype=torch.float32,
                device="cuda:0",
                shape_ranges=[((1, 3, 112, 112), (1, 3, 224, 224), (1, 3, 512, 512))],
            ),
        ]
        # input_specs = [
        #     InputTensorSpec(
        #         shape=(1, 3, 224, 224),
        #         dtype=torch.float32,
        #         device="cuda:0",
        #     ),
        # ]
    
        # TRT interpreter
        interp = TRTInterpreter(
            acc_model,
            input_specs,
            explicit_batch_dimension=True,
            explicit_precision=True,
            logger_level=trt.Logger.INFO,
        )
    
        interpreter_result = interp.run(
            max_batch_size=4, lower_precision=LowerPrecision.FP32
        )
    
        # TRT module
        trt_model = TRTModule(
            interpreter_result.engine,
            interpreter_result.input_names,
            interpreter_result.output_names,
        )
    
    trt_out = trt_model(inputs[0])
    
    
    trt_model(torch.rand(1,3, 112, 112).cuda())
    trt_model(torch.rand(1,3, 150, 150).cuda())
    trt_model(torch.rand(1,3, 400, 400).cuda())
    trt_model(torch.rand(1,3, 512, 512).cuda())
    
    print((trt_out - out).max())
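
    One untested idea for the splitter branch, sketched under the assumption that the per-submodule TRTInterpreter accepts the same shape_ranges-style specs as the whole-model path above, is to replace InputTensorSpec.from_tensors(acc_inputs) with an explicit dynamic spec:

    # Illustrative only: shapes copied from the else-branch above; `submod` is the
    # "_run_on_acc" submodule from the loop over split_mod.named_children().
    dynamic_specs = [
        InputTensorSpec(
            shape=(1, 3, -1, -1),
            dtype=torch.float32,
            device="cuda:0",
            shape_ranges=[((1, 3, 112, 112), (1, 3, 224, 224), (1, 3, 512, 512))],
        )
    ]
    interp = TRTInterpreter(submod, dynamic_specs, explicit_batch_dimension=True)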
    
    

    Environment

    The official NVIDIA PyTorch Docker image, version 22.12, is used.

    Build information about Torch-TensorRT can be found by turning on debug messages

    • Torch-TensorRT Version (e.g. 1.0.0): 1.3.0a0
    • PyTorch Version (e.g. 1.0): 1.14.0a0+410ce96
    • CPU Architecture: AMD64
    • OS (e.g., Linux): Ubuntu
    • How you installed PyTorch (conda, pip, libtorch, source): preinstalled in the Docker image
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives:
    • Python version: 3.8.10
    • CUDA version: 12.0
    • GPU models and configuration: NVIDIA GeForce RTX 2080 Ti, driver version 525.60.11
    • Any other relevant information:
    question component: fx 
    opened by ivan94fi 0
  • 🐛 [Bug] PixelShuffle with dynamic shape error

    🐛 [Bug] PixelShuffle with dynamic shape error

    Bug Description

    Trying to convert a model with pixel shuffle and dynamic shapes results in many errors with this message: reshape dimension with more than one -1 wildcard.

    Converting the model with a static shape works fine.

    To Reproduce

    Steps to reproduce the error:

    1. Use a model with a nn.PixelShuffle layer
    2. Use a torch_tensorrt.Input with min_shape, opt_shape and max_shape
    3. Try to convert with torch_tensorrt.compile

    To reproduce the error, use the following code and execute it with dynamic as the input argument:

    python pixel_shuffle_bug.py dynamic
    

    Example code

    # Save as pixel_shuffle_bug.py
    import torch
    import torch.nn as nn
    import torch_tensorrt
    
    
    class MyModel(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
            self.shuffle = nn.PixelShuffle(2)
    
        def forward(self, x):
            return self.shuffle(self.conv(x))
    
    
    @torch.no_grad()
    def main(shape_mode):
        input_data = torch.rand(1, 3, 1280, 720)
        factory_kwargs = {"dtype": torch.float32, "device": torch.device("cuda:0")}
    
        precision = torch.float32
        x = input_data.to(**factory_kwargs)
    
        m = MyModel().to(**factory_kwargs)
        m = m.eval()
        out = m(x)
    
        sm = torch.jit.trace(m, example_inputs=(x,))
        sout = sm(x)
    
        if shape_mode == "dynamic":
            inputs = [
                torch_tensorrt.Input(
                    min_shape=[1, 3, 852, 480],
                    opt_shape=[1, 3, 1280, 720],
                    max_shape=[1, 3, 3840, 2160],
                    dtype=factory_kwargs["dtype"],
                )
            ]
        else:
            inputs = [
                torch_tensorrt.Input(
                    shape=[1, 3, 1280, 720],
                    dtype=factory_kwargs["dtype"],
                )
            ]
        torch_tensorrt.logging.set_is_colored_output_on(True)
        with torch_tensorrt.logging.warnings():
            compile_spec = {
                "ir": "ts",
                "inputs": inputs,
                "enabled_precisions": {precision},
                "truncate_long_and_double": False,
                "require_full_compilation": True,
            }
            trtm = torch_tensorrt.compile(sm, **compile_spec)
    
        print("\033[92m" + "=========== Model converted ===========" + "\033[0m")
    
        trtout = trtm(x)
    
        print(f"greatest difference trt ({precision}): {(out - trtout).max().item()}")
    
    
    if __name__ == "__main__":
        import argparse
    
        parser = argparse.ArgumentParser()
        parser.add_argument("shape_mode", choices=["dynamic", "static"])
    
        args = parser.parse_args()
        main(args.shape_mode)
    

    Error messages and stack traces

    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer (Unnamed Layer* 1) [Shuffle]: reshape dimension with more than one -1 wildcard. Reshaping [1,16,(# 2 (SHAPE input_0)),(# 3 (SHAPE input_0))] to [1,4,2,2,-1,-1].)
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer (Unnamed Layer* 1) [Shuffle]: reshape dimension with more than one -1 wildcard. Reshaping [1,16,(# 2 (SHAPE input_0)),(# 3 (SHAPE input_0))] to [1,4,2,2,-1,-1].)
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer (Unnamed Layer* 1) [Shuffle]: reshape dimension with more than one -1 wildcard. Reshaping [1,16,(# 2 (SHAPE input_0)),(# 3 (SHAPE input_0))] to [1,4,2,2,-1,-1].)
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [graphShapeAnalyzer.cpp::analyzeShapes::1872] Error Code 4: Miscellaneous (IShuffleLayer (Unnamed Layer* 1) [Shuffle]: reshape dimension with more than one -1 wildcard. Reshaping [1,16,(# 2 (SHAPE input_0)),(# 3 (SHAPE input_0))] to [1,4,2,2,-1,-1].)
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: (Unnamed Layer* 1) [Shuffle]: at most one dimension may be inferred
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 4: [network.cpp::validate::2991] Error Code 4: Internal Error (Layer (Unnamed Layer* 1) [Shuffle] failed validation)
    ERROR: [Torch-TensorRT TorchScript Conversion Context] - 2: [builder.cpp::buildSerializedNetwork::751] Error Code 2: Internal Error (Assertion engine != nullptr failed. )
    Traceback (most recent call last):
      File "pixelshuffle_dynamic_shape_error.py", line 72, in <module>
        main(args.shape_mode)
      File "/usr/local/lib/python3.8/dist-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
        return func(*args, **kwargs)
      File "pixelshuffle_dynamic_shape_error.py", line 56, in main
        trtm = torch_tensorrt.compile(sm, **compile_spec)
      File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/_compile.py", line 125, in compile
        return torch_tensorrt.ts.compile(
      File "/usr/local/lib/python3.8/dist-packages/torch_tensorrt/ts/_compiler.py", line 136, in compile
        compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    RuntimeError: [Error thrown at core/conversion/conversionctx/ConversionCtx.cpp:169] Building serialized network failed in TensorRT
    

    Expected behavior

    The model should be converted with dynamic shape support.
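
    One possible direction, an unverified sketch assuming that writing the shuffle with explicit sizes yields reshapes with at most one inferred dimension, is to replace nn.PixelShuffle with a manual implementation:

    import torch

    def pixel_shuffle_explicit(x: torch.Tensor, r: int) -> torch.Tensor:
        # Equivalent to nn.PixelShuffle(r), but every reshape target is computed
        # from the input shape, so no dimension needs a -1 wildcard.
        n, c, h, w = x.shape
        x = x.view(n, c // (r * r), r, r, h, w)
        x = x.permute(0, 1, 4, 2, 5, 3)
        return x.reshape(n, c // (r * r), h * r, w * r)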

    Environment

    The official NVIDIA PyTorch Docker image, version 22.12, is used.

    Build information about Torch-TensorRT can be found by turning on debug messages

    • Torch-TensorRT Version (e.g. 1.0.0): 1.3.0a0
    • PyTorch Version (e.g. 1.0): 1.14.0a0+410ce96
    • CPU Architecture: AMD64
    • OS (e.g., Linux): Ubuntu
    • How you installed PyTorch (conda, pip, libtorch, source): preinstalled in the Docker image
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives:
    • Python version: 3.8.10
    • CUDA version: 12.0
    • GPU models and configuration: NVIDIA GeForce RTX 2080 Ti, driver version 525.60.11
    • Any other relevant information:
    bug 
    opened by ivan94fi 1
  • Cannot export a torchscript model to TensorRT.

    Cannot export a torchscript model to TensorRT.

    Bug Description

    Cannot export a model to TensorRT after it was successfully converted to TorchScript.

    To Reproduce

    Steps to reproduce the behavior:

    1. Pull the monai image 1.1.0 from link
    2. Start a Docker container with the image downloaded in step 1.
    3. Run the code below.
    import torch
    import torch_tensorrt
    from monai.networks.nets import FlexibleUNet
    import monai
    
    if __name__ == "__main__":
        input_size = (1, 3, 480, 736)
        print(monai.__file__)
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model = FlexibleUNet(
            in_channels=3, out_channels=2, backbone="efficientnet-b2", is_pad=False
        )
    
        model.to(device=device)
        model.eval()
        ts_model = torch.jit.script(model)
    
        inputs = [
            torch_tensorrt.Input(
                min_shape=input_size,
                opt_shape=input_size,
                max_shape=input_size,
            )
        ]
        enabled_precision = {torch.float, torch.half}
        with torch_tensorrt.logging.debug():
            trt_ts_model = torch_tensorrt.compile(
                ts_model, inputs=inputs, enabled_precisions=enabled_precision
            )
        torch.jit.save(trt_ts_model, "model_trt.ts")
    
    

    Output was:

    DEBUG: [Torch-TensorRT] - Settings requested for Torch Fallback:
        "enabled": True
        "min_block_size": 3
        "torch_executed_operators": [
         ]
    DEBUG: [Torch-TensorRT] - Parititioning source module into PyTorch and TensorRT sub blocks
    DEBUG: [Torch-TensorRT] - In progress TRT block does not meet minimum block size requirements, therefore folding into in progress PyTorch block
    DEBUG: [Torch-TensorRT] - Finalizing in progress Torch block
    DEBUG: [Torch-TensorRT] - Segment Block @0:
        Target: Torch
    
        Graph: graph(%x.79 : Tensor):
      %3 : float[] = prim::Constant[value=[2., 2.]]()
      %self.encoder._conv_stem.bias.39 : NoneType = prim::Constant()
      %0 : Tensor = aten::upsample_nearest1d(%x.79, %self.encoder._conv_stem.bias.39, %3) # /opt/conda/lib/python3.8/site-packages/torch/nn/functional.py:3916:15
      return ()
    
    
    DEBUG: [Torch-TensorRT] - Registering input/output torch::jit::Value for segmented graphs
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
        cli.main()
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
        run()
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
        runpy.run_path(target, run_name="__main__")
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
        return _run_module_code(code, init_globals, run_name,
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
        _run_code(code, mod_globals, init_globals,
      File "/root/.vscode-server/extensions/ms-python.python-2022.20.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
        exec(code, run_globals)
      File "/home/liubin/data/trt_bundle_experiment/export_flexible_unet_trt.py", line 32, in <module>
        trt_ts_model = torch_tensorrt.compile(
      File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/_compile.py", line 125, in compile
        return torch_tensorrt.ts.compile(
      File "/opt/conda/lib/python3.8/site-packages/torch_tensorrt/ts/_compiler.py", line 136, in compile
        compiled_cpp_mod = _C.compile_graph(module._c, _parse_compile_spec(spec))
    RuntimeError: The following operation failed in the TorchScript interpreter.
    Traceback of TorchScript (most recent call last):
      File "/opt/monai/monai/networks/nets/flexible_unet.py", line 337, in forward
            x = inputs
            enc_out = self.encoder(x)
            decoder_out = self.decoder(enc_out, self.skip_connect)
                          ~~~~~~~~~~~~ <--- HERE
            x_seg = self.segmentation_head(decoder_out)
        
      File "/opt/monai/monai/networks/nets/flexible_unet.py", line 166, in forward
                else:
                    skip = None
                x = block(x, skip)
                    ~~~~~ <--- HERE
        
            return x
      File "/opt/monai/monai/networks/nets/basic_unet.py", line 157, in forward
                x_e: features from the encoder.
            """
            x_0 = self.upsample(x)
                  ~~~~~~~~~~~~~ <--- HERE
        
            if x_e is not None:
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/container.py", line 204, in forward
        def forward(self, input):
            for module in self:
                input = module(input)
                        ~~~~~~ <--- HERE
            return input
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/upsampling.py", line 156, in forward
        def forward(self, input: Tensor) -> Tensor:
            return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
                   ~~~~~~~~~~~~~ <--- HERE
                                 recompute_scale_factor=self.recompute_scale_factor)
      File "/opt/conda/lib/python3.8/site-packages/torch/nn/functional.py", line 3916, in interpolate
    
        if input.dim() == 3 and mode == "nearest":
            return torch._C._nn.upsample_nearest1d(input, output_size, scale_factors)
                   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
        if input.dim() == 4 and mode == "nearest":
            return torch._C._nn.upsample_nearest2d(input, output_size, scale_factors)
    RuntimeError: It is expected output_size equals to 1, but got size 2
    

    Expected behavior

    Successfully convert the model.

    Environment

    Build information about Torch-TensorRT can be found by turning on debug messages

    • Torch-TensorRT Version (e.g. 1.0.0): 1.3.0a0
    • PyTorch Version (e.g. 1.0): 1.13.0a0
    • CPU Architecture: x86-64
    • OS (e.g., Linux): ubuntu 20.04
    • How you installed PyTorch (conda, pip, libtorch, source): conda
    • Build command you used (if compiling from source):
    • Are you using local sources or building from archives:
    • Python version: 3.8.13
    • CUDA version: 11.7
    • GPU models and configuration:
    • Any other relevant information:

    Additional context

    I can export this model to an ONNX model and then convert it to a TensorRT engine by running the command below.

    trtexec --onnx=models/model.onnx --saveEngine=models/model.trt --fp16 --minShapes=INPUT__0:1x3x736x480 --optShapes=INPUT__0:4x3x736x480 --maxShapes=INPUT__0:8x3x736x480 --shapes=INPUT__0:4x3x736x480
    
    bug 
    opened by binliunls 0
Releases(v1.3.0)
  • v1.3.0(Dec 1, 2022)

    PyTorch 1.13, CUDA 11.7, TensorRT 8.5, Support for Dynamic Batch for Partially Compiled Modules, Engine Profiling, Experimental Unified Runtime for FX and TorchScript Frontends

    Torch-TensorRT 1.3.0 targets PyTorch 1.13, CUDA 11.7, cuDNN 8.5 and TensorRT 8.5. This release focuses on adding support for Dynamic Batch Sizes for partially compiled modules using the TorchScript frontend (this is also supported with the FX frontend). It also introduces a new execution profiling utility to understand the execution of specific engine sub blocks, which can be used in conjunction with PyTorch profiling tools to understand the performance of your model post compilation. Finally, this release introduces a new experimental unified runtime shared by both the TorchScript and FX frontends. This allows you to start using the FX frontend to generate torch.jit.trace-able compiled modules.

    Dynamic Batch Sizes for Partially Compiled Modules via the TorchScript Frontend

    A long-standing limitation of the partitioning system in the TorchScript frontend is its lack of support for dynamic shapes. In this release we address a major subset of these use cases with support for dynamic batch sizes for modules that will be partially compiled. Usage is the same as in the fully compiled workflow: using the torch_tensorrt.Input class, you may define the range of shapes that an input may take during runtime. This is represented as a set of 3 shape sizes: min, max and opt. min and max define the dynamic range of the input Tensor. opt informs TensorRT what size to optimize for, provided there are multiple valid kernels available. TensorRT will select kernels that are valid for the full range of input shapes but most efficient at the opt size. In this release, partially compiled module inputs can vary in shape only in the highest-order (batch) dimension; a usage sketch follows the shape examples below.

    For example:

    min_shape: (1, 3, 128, 128)
    opt_shape: (8, 3, 128, 128)
    max_shape: (32, 3, 128, 128)
    

    Is a valid shape range, however:

    min_shape: (1, 3, 128, 128)
    opt_shape: (1, 3, 256, 256)
    max_shape: (1, 3, 512, 512)
    

    is still not supported.
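
    To make this concrete, compilation with a batch-dynamic input might look roughly like the following (a minimal sketch; my_scripted_module is a placeholder for your TorchScript module):

    import torch
    import torch_tensorrt

    # my_scripted_module is a hypothetical scripted/traced module
    trt_mod = torch_tensorrt.compile(
        my_scripted_module,
        inputs=[
            torch_tensorrt.Input(
                min_shape=(1, 3, 128, 128),
                opt_shape=(8, 3, 128, 128),
                max_shape=(32, 3, 128, 128),
                dtype=torch.float,
            )
        ],
        enabled_precisions={torch.float},
    )

    # Any batch size between 1 and 32 is now valid at runtime
    out = trt_mod(torch.randn(16, 3, 128, 128).to("cuda"))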

    Engine Profiling [Experimental]

    This release introduces a number of profiling tools to measure the performance of TensorRT sub blocks in compiled modules. This can be used in conjunction with PyTorch profiling tools to get a picture of the performance of your model. Profiling for any particular sub block can be enabled by the enable_profiling() method of any __torch__.classes.tensorrt.Engine attribute, or of any torch_tensorrt.TRTModuleNext. The profiler will dump trace files by default in /tmp, though this path can be customized by either setting the profile_path_prefix of __torch__.classes.tensorrt.Engine or as an argument to torch_tensorrt.TRTModuleNext.enable_profiling(profiling_results_dir=""). Traces can be visualized using the Perfetto tool (https://perfetto.dev).


    Engine layer information can also be accessed using get_layer_info, which returns a JSON string with the layers / fusions that the engine contains.
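
    Putting these pieces together, a profiling session might look roughly like this (a minimal sketch based on the method names above; trt_module and input_tensor are placeholders):

    # trt_module is assumed to be a compiled torch_tensorrt.TRTModuleNext
    trt_module.enable_profiling(profiling_results_dir="/tmp/my_traces")

    # Trace files are written out as the engine executes
    out = trt_module(input_tensor)

    # JSON string describing the layers / fusions the engine contains
    print(trt_module.get_layer_info())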

    Unified Runtime for FX and TorchScript Frontends [Experimental]

    In previous versions of Torch-TensorRT, the FX and TorchScript frontends were mostly separate, each with its own distinct benefits and limitations. Torch-TensorRT 1.3.0 introduces a new unified runtime to support both FX and TorchScript, meaning that you can choose the compilation workflow that makes the most sense for your particular use case, be it pure Python conversion via FX or C++ TorchScript compilation. Both frontends use the same primitives to construct their compiled graphs, whether fully or only partially compiled.

    Basic Usage

    The TorchScript frontend uses the new runtime by default. No additional workflow changes are necessary.

    Note: The runtime ABI version was increased to support this feature; as such, models compiled with previous versions of Torch-TensorRT will need to be recompiled.

    For the FX frontend, the new runtime can be chosen by setting use_experimental_fx_rt=True as part of your compile settings, i.e. either torch_tensorrt.compile(my_mod, ir="fx", use_experimental_fx_rt=True, explicit_batch_dimension=True) or torch_tensorrt.fx.compile(my_mod, use_experimental_fx_rt=True, explicit_batch_dimension=True).

    Note: The new runtime only supports explicit batch dimension

    TRTModuleNext

    The FX frontend will return a torch.nn.Module containing torch_tensorrt.TRTModuleNext submodules instead of torch_tensorrt.fx.TRTModule submodules. The features of the two are nearly identical, but with a few key improvements:

    1. TRTModuleNext profiling dumps a trace visualizable with Perfetto (see above for more details).
    2. TRTModuleNext modules are torch.jit.trace-able, meaning you can save FX compiled modules as TorchScript for python-less / C++ deployment scenarios. Traced compiled modules have the same deployment instructions as compiled modules produced by the TorchScript frontend.
    3. TRTModuleNext maintains the same serialization workflows TRTModule supports as well (state_dict / extra_state, torch.save/torch.load)

    Examples

    import torch
    import torch_tensorrt
    from torch_tensorrt import TRTModuleNext

    model_fx = model_fx.cuda()
    inputs_fx = [i.cuda() for i in inputs_fx]
    trt_fx_module_f16 = torch_tensorrt.compile(
        model_fx,
        ir="fx",
        inputs=inputs_fx,
        enabled_precisions={torch.float16},
        use_experimental_fx_rt=True,
        explicit_batch_dimension=True
    )
    
    # Save model using torch.save 
    
    torch.save(trt_fx_module_f16, "trt.pt")
    reload_trt_mod = torch.load("trt.pt")
    
    # Trace and save the FX module in TorchScript
    scripted_fx_module = torch.jit.trace(trt_fx_module_f16, example_inputs=inputs_fx)
    scripted_fx_module.save("/tmp/scripted_fx_module.ts")
    scripted_fx_module = torch.jit.load("/tmp/scripted_fx_module.ts")
    
    ... # Get a handle for a TRTModuleNext submodule
    
    # Extract state dictionary
    st = trt_mod.state_dict()
    
    # Load the state dict into a new module
    new_trt_mod = TRTModuleNext()
    new_trt_mod.load_state_dict(st)
    

    Using TRTModuleNext as an arbitrary TensorRT engine holder

    Using TorchScript you have long been able to embed an arbitrary TensorRT engine from any source in a TorchScript module using torch_tensorrt.ts.embed_engine_in_new_module. Now you can do this at the torch.nn.Module level by directly using TRTModuleNext and gain access to all the benefits enumerated above.

    trt_mod = TRTModuleNext(
                serialized_engine,
                name="TestModule",
                input_binding_names=input_names,
                output_binding_names=output_names,
    )
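
    Once constructed, the module behaves like any other torch.nn.Module (a minimal sketch; the input shape is a placeholder and must match the engine's bindings):

    out = trt_mod(torch.randn(1, 3, 224, 224).to("cuda"))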
    

    The intention is, in a future release, to have torch_tensorrt.TRTModuleNext replace torch_tensorrt.fx.TRTModule as the default TensorRT module implementation. Feedback on this class, how it is used, the runtime in general, or associated features (profiler, engine inspector) is welcome.

    What's Changed

    • chore: Bump version to 1.2.0a0 by @narendasan in https://github.com/pytorch/TensorRT/pull/1044
    • feat: Extending nox for cxx11 ABI version by @andi4191 in https://github.com/pytorch/TensorRT/pull/1013
    • docs: Update the documentation theme to PyTorch by @narendasan in https://github.com/pytorch/TensorRT/pull/1063
    • Adding Code of Conduct file by @facebook-github-bot in https://github.com/pytorch/TensorRT/pull/1061
    • Update CONTRIBUTING.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1064
    • feat: Optimize hub.py download by @andi4191 in https://github.com/pytorch/TensorRT/pull/1022
    • Adding an action to automatically assign reviewers and assignees by @narendasan in https://github.com/pytorch/TensorRT/pull/1078
    • Add PR assigner support by @narendasan in https://github.com/pytorch/TensorRT/pull/1080
    • (//core): Align with prim::Enter in module fallback by @andi4191 in https://github.com/pytorch/TensorRT/pull/991
    • (//core): Added a variant for aten::split by @andi4191 in https://github.com/pytorch/TensorRT/pull/992
    • feat(nox): Replacing session with environment variable by @andi4191 in https://github.com/pytorch/TensorRT/pull/1057
    • Refactor the internal codebase from fx2trt_oss to torch_tensorrt by @frank-wei in https://github.com/pytorch/TensorRT/pull/1104
    • format by buildifier by @frank-wei in https://github.com/pytorch/TensorRT/pull/1106
    • [fx2trt] Modify lower setting class by @frank-wei in https://github.com/pytorch/TensorRT/pull/1107
    • Modified the notebooks directory's README file by @svenchilton in https://github.com/pytorch/TensorRT/pull/1102
    • [FX] Sync to OSS by @frank-wei in https://github.com/pytorch/TensorRT/pull/1118
    • [fx_acc] Add acc_tracer support for torch.mm by @khabinov in https://github.com/pytorch/TensorRT/pull/1120
    • Added Triton deployment instructions to documentation by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1116
    • amending triton deployment docs by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1126
    • fix: Update broken repo hyperlink by @lamhoangtung in https://github.com/pytorch/TensorRT/pull/1131
    • fix: Fix keep_dims functionality for aten::max by @peri044 in https://github.com/pytorch/TensorRT/pull/1099
    • fix(tests/core/partitioning): Fix tests of refactoring segmentation in partitioning by @peri044 in https://github.com/pytorch/TensorRT/pull/1140
    • feat(//tests): Update rtol and atol based tolerance for test cases by @andi4191 in https://github.com/pytorch/TensorRT/pull/1055
    • doc: add the explanation for partition phases on docs by @bowang007 in https://github.com/pytorch/TensorRT/pull/1090
    • feat (//cpp): Using atol and rtol based tolerance threshold for torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1052
    • CI/CD setup by @frank-wei in https://github.com/pytorch/TensorRT/pull/1137
    • Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1142
    • [fx2trt] Engineholder feature improvement, test fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1143
    • feat (//core/conversion) : Add converter for torch.bitwise_not by @blchu in https://github.com/pytorch/TensorRT/pull/1029
    • fixed typos by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1098
    • [FX] --fx-only does not need to check bazel by @frank-wei in https://github.com/pytorch/TensorRT/pull/1147
    • [FX] refactor the fx path in compile function by @frank-wei in https://github.com/pytorch/TensorRT/pull/1141
    • [FX] Create getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1145
    • [FX] move example folder by @frank-wei in https://github.com/pytorch/TensorRT/pull/1149
    • [FX] Sync enhancement done internally at Meta by @yinghai in https://github.com/pytorch/TensorRT/pull/1161
    • Update config.yml by @frank-wei in https://github.com/pytorch/TensorRT/pull/1163
    • Use py3 next() syntax by @ptrblck in https://github.com/pytorch/TensorRT/pull/1159
    • Add missing comma for proper torch versioning in setup.py by @dabauxi in https://github.com/pytorch/TensorRT/pull/1164
    • [docs] Update link to relative path by @zhiqwang in https://github.com/pytorch/TensorRT/pull/1171
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1172
    • fix: fix the model name typo error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1176
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1178
    • [feat]: support slice with dynamic shape by @inocsin in https://github.com/pytorch/TensorRT/pull/1110
    • [FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1184
    • [FX] Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1183
    • fix: Fix PTQ calibration when there are multiple inputs by @peri044 in https://github.com/pytorch/TensorRT/pull/1191
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1194
    • [fix]: fix bug in aten::to, when network only have aten::to layer wil… by @inocsin in https://github.com/pytorch/TensorRT/pull/1108
    • Add .circleci/config.yml by @narendasan in https://github.com/pytorch/TensorRT/pull/1153
    • feat: Upgrade TRT to 8.4 by @peri044 in https://github.com/pytorch/TensorRT/pull/1152
    • feat: Update Pytorch version to 1.12 by @peri044 in https://github.com/pytorch/TensorRT/pull/1177
    • fix: converter renaming already named tensors by @bowang007 in https://github.com/pytorch/TensorRT/pull/1167
    • feat(//py): Use TensorRT to fill in .so libraries automatically if possible by @narendasan in https://github.com/pytorch/TensorRT/pull/1085
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1204
    • fix: fix the parsing related model loading bug by @bowang007 in https://github.com/pytorch/TensorRT/pull/1148
    • feat: support min_block_size != 1 caused fallback nodes re-segmentation by @bowang007 in https://github.com/pytorch/TensorRT/pull/1195
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1208
    • fix: fix the fallback related issue after merging collection by @bowang007 in https://github.com/pytorch/TensorRT/pull/1206
    • Add CMake support to build the libraries by @gcuendet in https://github.com/pytorch/TensorRT/pull/1058
    • Fix typo in EfficientNet-example by @davinnovation in https://github.com/pytorch/TensorRT/pull/1217
    • fix: fix bug that ListConstruct in TRT subgraph when it's entire graph's output by @bowang007 in https://github.com/pytorch/TensorRT/pull/1220
    • fix: fix the error that collection input segmented into trt subgraph by @bowang007 in https://github.com/pytorch/TensorRT/pull/1225
    • feat(//circleci): Adding release automation by @narendasan in https://github.com/pytorch/TensorRT/pull/1215
    • fix: support int tensor * int scaler in aten::mul by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1095
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1221
    • Fix errors in unbind and list slice by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1088
    • Adding a Resnet C++ example by @vinhngx in https://github.com/pytorch/TensorRT/pull/1175
    • [FX] disable 2 of conv3d and type_as tests by @frank-wei in https://github.com/pytorch/TensorRT/pull/1224
    • [feat] Add support for integers in aten::abs converter (#35) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1232
    • Update PTQ example to fix new compile_spec requirements by @ncomly-nvidia in https://github.com/pytorch/TensorRT/pull/1242
    • feat: support for grouped inputs by @narendasan in https://github.com/pytorch/TensorRT/pull/1201
    • feat: Added support for custom torch operators and converters in torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1219
    • Add outputPadding in deconv by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1234
    • chore: Apply linting and ignore new bazel dirs by @narendasan in https://github.com/pytorch/TensorRT/pull/1223
    • added qat-ptq workflow notebook by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1239
    • fix: Update cmake for the new collection files by @narendasan in https://github.com/pytorch/TensorRT/pull/1246
    • chore: ignore dist dir for pre-commit by @narendasan in https://github.com/pytorch/TensorRT/pull/1249
    • chore: Aligning bazel version for consistency across different docker… by @andi4191 in https://github.com/pytorch/TensorRT/pull/1250
    • refactor: Changed the hardcoded values to macros for DLA memory sizes by @andi4191 in https://github.com/pytorch/TensorRT/pull/1247
    • chore: update jetson pytorch baase by @narendasan in https://github.com/pytorch/TensorRT/pull/1251
    • [feat] Add automatic type promotion to element-wise ops by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1240
    • Assorted small fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1259
    • [FX] remove op_lowering_disallow_list and format revert by @frank-wei in https://github.com/pytorch/TensorRT/pull/1261
    • fix: fix the "schema not found for node" error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1236
    • chore: Fix contributing doc by @peri044 in https://github.com/pytorch/TensorRT/pull/1268
    • feat: support scatter.value and scatter.src by @inocsin in https://github.com/pytorch/TensorRT/pull/1252
    • Internal workspace workflow by @narendasan in https://github.com/pytorch/TensorRT/pull/1269
    • Fix typo in README by @davinnovation in https://github.com/pytorch/TensorRT/pull/1273
    • Support swin/bert with dynamic batch by @Njuapp in https://github.com/pytorch/TensorRT/pull/1270
    • correct sha256sum of cudnn by @Njuapp in https://github.com/pytorch/TensorRT/pull/1278
    • Jetson workspace by @narendasan in https://github.com/pytorch/TensorRT/pull/1280
    • chore(deps): bump @actions/core from 1.8.2 to 1.9.1 in /.github/actions/assigner by @dependabot in https://github.com/pytorch/TensorRT/pull/1287
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1288
    • chore: Fix dataloader in finetune_qat script by @andi4191 in https://github.com/pytorch/TensorRT/pull/1292
    • chore: Truncate long and double for ptq CPP path by @andi4191 in https://github.com/pytorch/TensorRT/pull/1291
    • feat: Add support for aten::square by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1286
    • fix: fix misleading skipping partitioning msg by @bowang007 in https://github.com/pytorch/TensorRT/pull/1289
    • fix: Add int support to constant_pad_nd by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1283
    • fix: Resolve non-determinism in registerSegmentsOutputs by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1284
    • docs: Update docgen task by @narendasan in https://github.com/pytorch/TensorRT/pull/1294
    • update fx notebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1297
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1299
    • fix(tools): Fix linter to not depend on docker by @narendasan in https://github.com/pytorch/TensorRT/pull/1301
    • Support multiple indices for aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1309
    • chore: Adding CMake to the CI by @narendasan in https://github.com/pytorch/TensorRT/pull/1310
    • feat: Upgrade Pytorch to 1.12.1 and TensorRT to 8.4.3.1 by @peri044 in https://github.com/pytorch/TensorRT/pull/1315
    • Fix bug: correct the output shape of aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
    • feat (//core/conversion) : Add converter for torch.repeat_interleave ( by @blchu in https://github.com/pytorch/TensorRT/pull/1313
    • chore: Adding NGC build path by @narendasan in https://github.com/pytorch/TensorRT/pull/1311
    • Update lower.py by @frank-wei in https://github.com/pytorch/TensorRT/pull/1324
    • fix!: Fixed Windows compilation failures by @andi4191 in https://github.com/pytorch/TensorRT/pull/1330
    • [feat] Add support for argmax and argmin by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1312
    • chore: Adding a guideline to build on Windows platform by @andi4191 in https://github.com/pytorch/TensorRT/pull/1337
    • chore: Fix data loader issues and nox file paths by @peri044 in https://github.com/pytorch/TensorRT/pull/1281
    • feat(//tools/perf): Refactor perf_run.py, add fx2trt backend support, usage via CLI arguments by @peri044 in https://github.com/pytorch/TensorRT/pull/1254
    • refactor(//tests) : Refactor the test suite by @peri044 in https://github.com/pytorch/TensorRT/pull/1329
    • [feat] add support for aten::reciprocal(int) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1308
    • [FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1342
    • Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1343
    • enable direct call to fx.compile() by @frank-wei in https://github.com/pytorch/TensorRT/pull/1344
    • fix: add remove_exception pass from torch to fix uninitialized tensor… by @bowang007 in https://github.com/pytorch/TensorRT/pull/1345
    • chore: apply linting to docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1347
    • docs: Adding v1.2.0 and v1.1.1 docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1349
    • Docs for release by @narendasan in https://github.com/pytorch/TensorRT/pull/1350
    • fix: Fixing pybind error on nightly by @andi4191 in https://github.com/pytorch/TensorRT/pull/1285
    • Centralizing Partitioning State by @narendasan in https://github.com/pytorch/TensorRT/pull/1263
    • chore: Fix centralized partititoning by @peri044 in https://github.com/pytorch/TensorRT/pull/1367
    • chore: Move master to test nightly only by @narendasan in https://github.com/pytorch/TensorRT/pull/1370
    • [fix] Avoid layer name conflicts in aten::index by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1377
    • [fix] Fix output dimensions of aten::unbind converter by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1373
    • Einsum converter by @gs-olive in https://github.com/pytorch/TensorRT/pull/1385
    • Atan2 converter by @gs-olive in https://github.com/pytorch/TensorRT/pull/1381
    • [FX] aten2trt and some pass fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1390
    • feat: Add converter for aten::sign unary op by @gs-olive in https://github.com/pytorch/TensorRT/pull/1391
    • Add support for aten::squeeze without a dim by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1393
    • [fix] incorrect casting behavior in floor_divide by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1392
    • chore: minor fixes by @peri044 in https://github.com/pytorch/TensorRT/pull/1397
    • fix: torch.std and torch.var support multi-dimensional reductions by @gs-olive in https://github.com/pytorch/TensorRT/pull/1395
    • fix: fix missing float type in shape analysis by @bowang007 in https://github.com/pytorch/TensorRT/pull/1399
    • feat: Rsqrt lowering pass by @gs-olive in https://github.com/pytorch/TensorRT/pull/1394
    • Add correct pip install instructions by @msaroufim in https://github.com/pytorch/TensorRT/pull/1400
    • fix: aten::split behavior with negative indexing by @gs-olive in https://github.com/pytorch/TensorRT/pull/1403
    • fix: fix compilation stuck bug caused by elimination exception by @bowang007 in https://github.com/pytorch/TensorRT/pull/1409
    • [FX] Fix clamping float32 boundary values, aten2trt init check-in, fix slice issues by @frank-wei in https://github.com/pytorch/TensorRT/pull/1415
    • [feat]Add converter for aten::where by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1421
    • [feat]Add converter support for aten::frobenius_norm by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1422
    • chore: Update torch installation paths for NGC by @peri044 in https://github.com/pytorch/TensorRT/pull/1435
    • [feat] Add dependency awareness to torch-trt partitioning by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1304
    • docs: minor changes in Resnet50 example by @przemb in https://github.com/pytorch/TensorRT/pull/1427
    • fix: Ensure proper type inheritance in aten::masked_fill by @gs-olive in https://github.com/pytorch/TensorRT/pull/1430
    • chore: Nox file update from NGC 22.11 release by @peri044 in https://github.com/pytorch/TensorRT/pull/1438
    • fix: Add check to ensure einsum converter has no more than 2 tensor inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1439
    • [feat] Add partial converter support for aten::linalg_norm by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1426
    • chore: Lint noxfile.py by @gs-olive in https://github.com/pytorch/TensorRT/pull/1443
    • fix: CUDA error 710 bugfix by @gs-olive in https://github.com/pytorch/TensorRT/pull/1424
    • scalar_to_tensor avoid scalar.to() by @Njuapp in https://github.com/pytorch/TensorRT/pull/1448
    • feat: rewriting param to a Constant if it's a introduced input by @bowang007 in https://github.com/pytorch/TensorRT/pull/1298
    • feat: support int64 <=> int32 auto conversion by @bowang007 in https://github.com/pytorch/TensorRT/pull/1407
    • fix: Device casting issues with certain aten operators by @gs-olive in https://github.com/pytorch/TensorRT/pull/1416
    • feat(//core/partitioning) : Dynamic shapes + fallback by @peri044 in https://github.com/pytorch/TensorRT/pull/1414
    • [fix] unmangle_cls_name for variable length mangled tags by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1454
    • fix: Error with aten::div when using truncation with Int32 tensor inputs by @gs-olive in https://github.com/pytorch/TensorRT/pull/1442
    • fix: fix failed test cases caused by partition API changes by @bowang007 in https://github.com/pytorch/TensorRT/pull/1460
    • fix: Update floor division schema replacement in lowering by @gs-olive in https://github.com/pytorch/TensorRT/pull/1464
    • feat: Add functionality to performance tooling by @gs-olive in https://github.com/pytorch/TensorRT/pull/1451
    • Unifying the FX and TS Frontends by @narendasan in https://github.com/pytorch/TensorRT/pull/1404

    New Contributors

    • @facebook-github-bot made their first contribution in https://github.com/pytorch/TensorRT/pull/1061
    • @frank-wei made their first contribution in https://github.com/pytorch/TensorRT/pull/1064
    • @khabinov made their first contribution in https://github.com/pytorch/TensorRT/pull/1120
    • @blchu made their first contribution in https://github.com/pytorch/TensorRT/pull/1029
    • @yinghai made their first contribution in https://github.com/pytorch/TensorRT/pull/1161
    • @ptrblck made their first contribution in https://github.com/pytorch/TensorRT/pull/1159
    • @dabauxi made their first contribution in https://github.com/pytorch/TensorRT/pull/1164
    • @zhiqwang made their first contribution in https://github.com/pytorch/TensorRT/pull/1171
    • @gcuendet made their first contribution in https://github.com/pytorch/TensorRT/pull/1058
    • @davinnovation made their first contribution in https://github.com/pytorch/TensorRT/pull/1217
    • @dependabot made their first contribution in https://github.com/pytorch/TensorRT/pull/1287
    • @msaroufim made their first contribution in https://github.com/pytorch/TensorRT/pull/1400
    • @przemb made their first contribution in https://github.com/pytorch/TensorRT/pull/1427

    Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.3.0

    Source code(tar.gz)
    Source code(zip)
    libtorchtrt-1.3.0-cudnn8.5-tensorrt8.5-cuda11.7-libtorch1.13.0-x86_64-linux.tar.gz(2.27 MB)
    libtorchtrt-1.3.0-pre-cxx11-abi-cudnn8.5-tensorrt8.5-cuda11.7-libtorch1.13.0-x86_64-linux.tar.gz(2.35 MB)
    torch_tensorrt-1.3.0-cp310-cp310-linux_x86_64.whl(14.31 MB)
    torch_tensorrt-1.3.0-cp37-cp37m-linux_x86_64.whl(14.30 MB)
    torch_tensorrt-1.3.0-cp38-cp38-linux_x86_64.whl(14.32 MB)
    torch_tensorrt-1.3.0-cp39-cp39-linux_x86_64.whl(14.28 MB)
  • v1.2.0(Sep 14, 2022)

    PyTorch 1.12, Collections based I/O, FX Frontend, torchtrtc custom op support, CMake build system and Community Windows Support

    Torch-TensorRT 1.2.0 targets PyTorch 1.12, CUDA 11.6, cuDNN 8.4 and TensorRT 8.4. This release focuses on a couple of key new APIs to handle function I/O that uses collection types, which should enable whole new model classes to be compiled by Torch-TensorRT without source code modification. It also introduces the "FX Frontend", a new frontend for Torch-TensorRT which leverages FX, a high-level IR built into PyTorch with extensive Python APIs. For use cases which do not need to run outside of Python, this may be a strong option to try as it is easily extensible in a familiar development environment. In Torch-TensorRT 1.2.0, the FX frontend should be considered beta-level in stability. torchtrtc has received improvements which target the ability to handle operators outside of the core PyTorch op set, including custom operators from libraries such as torchvision and torchtext. Similarly, users can provide custom converters to torchtrtc to extend the compiler's support from the command line instead of having to write an application to do so. Finally, Torch-TensorRT introduces community-supported Windows and CMake support.

    New Dependencies

    nvidia-tensorrt

    For previous versions of Torch-TensorRT, users had to install TensorRT via a system package manager and modify their LD_LIBRARY_PATH in order to set up Torch-TensorRT. Now users should install the TensorRT Python API as part of the installation procedure. This can be done via the following steps:

    pip install nvidia-pyindex
    pip install nvidia-tensorrt==8.4.3.1
    pip install torch-tensorrt==1.2.0 -f https://github.com/pytorch/tensorrt/releases
    

    Installing the TensorRT pip package will allow Torch-TensorRT to automatically load the TensorRT libraries without any modification to environment variables. It is also a necessary dependency for the FX Frontend.
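
    As a quick sanity check (a minimal sketch), you can confirm that both packages are importable and report the expected versions:

    import tensorrt
    import torch_tensorrt

    print(tensorrt.__version__)        # expecting 8.4.x
    print(torch_tensorrt.__version__)  # expecting 1.2.0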

    torchvision

    Some FX frontend converters are designed to target operators from 3rd party libraries like torchvision. As such, you must have torchvision installed in order to use them. However, this dependency is optional for cases where you do not need this support.

    Jetson

    Starting from this release, we will be distributing precompiled binaries of our NGC release branches for aarch64 (as well as x86_64), starting with ngc/22.11. These releases are designed to be paired with NVIDIA-distributed builds of PyTorch, including the NGC containers and Jetson builds, and are equivalent to the prepackaged distribution of Torch-TensorRT that comes in the containers. They represent the state of the master branch at the time of branch cutting, so they may lag in features by a month or so. These releases will come separately from minor version releases like this one. Therefore, going forward, these NGC releases should be the primary release channel used on Jetson (including for building from source).

    NOTE: NGC PyTorch builds are not identical to builds you might install through normal channels like pytorch.org. In the past this has caused portability issues between pytorch.org builds and NGC builds. Therefore, in workflows such as exporting a TorchScript module on an x86 machine and then compiling on Jetson, we strongly recommend using the NGC container release on x86 for your host machine operations. More information about Jetson support can be found alongside the 22.07 release (https://github.com/pytorch/TensorRT/releases/tag/v1.2.0a0.nv22.07).

    Collections based I/O [Experimental]

    Torch-TensorRT has previously operated under the assumption that nn.Module forward functions can trivially be reduced to the form forward([Tensor]) -> [Tensor]. Typically this implies functions of the form forward(Tensor, Tensor, ... Tensor) -> (Tensor, Tensor, ..., Tensor). However, as model complexity increases, grouping inputs may make it easier to manage many inputs, so function signatures similar to forward([Tensor], (Tensor, Tensor)) -> [Tensor] or forward((Tensor, Tensor)) -> (Tensor, (Tensor, Tensor)) might be more common. In Torch-TensorRT 1.2.0, more of these kinds of use cases are supported using the new experimental input_signature compile spec API. This API allows users to group Input specs similarly to how they might group the input Tensors they would use to call the original module's forward function. This informs Torch-TensorRT on how to map a Tensor input from its location in a group to the engine and from the engine back into its grouping returned to the user.

    To make this concrete consider the following standard case:

    class StandardTensorInput(nn.Module):
        def __init__(self):
            super(StandardTensorInput, self).__init__()
    
        def forward(self, x, y):
            r = x + y
            return r
    
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = StandardTensorInput().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        inputs=[
            torch_tensorrt.Input(x.shape),
            torch_tensorrt.Input(y.shape)
        ],
        min_block_size=1
    )
    
    out = trt_module(x,y)
    print(out)
    

    Here a user has defined two explicit tensor inputs and used the existing list based API to define the input specs.

    With Torch-TensorRT the following use cases are now possible using the new input_signature API:

    • Tuple based input collection
    class TupleInput(nn.Module):
        def __init__(self):
            super(TupleInput, self).__init__()
    
        def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
            r = z[0] + z[1]
            return r
    
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = TupleInput().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        input_signature=((x, y),), # Note how inputs are grouped with the new API
        min_block_size=1
    )
    
    out = trt_module((x,y))
    print(out)
    
    • List based input collection
    class ListInput(nn.Module):
        def __init__(self):
            super(ListInput, self).__init__()
    
        def forward(self, z: List[torch.Tensor]):
            r = z[0] + z[1]
            return r
    
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = ListInput().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        input_signature=([x,y],), # Again, note how inputs are grouped with the new API
        min_block_size=1
    )
    
    out = trt_module([x,y])
    print(out)
    

    Note how the input specs (in this case just example tensors) are provided to the compiler. The input_signature argument expects a Tuple[Union[torch.Tensor, torch_tensorrt.Input, List, Tuple]] grouped in a format representative of how the function would be called. In these cases it's just a list or tuple of specs.
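
    Since torch_tensorrt.Input specs can stand in for example tensors, the tuple example above could equivalently be written as follows (a sketch, reusing module, x and y from above):

    trt_module = torch_tensorrt.compile(
        module,
        input_signature=((
            torch_tensorrt.Input(x.shape),
            torch_tensorrt.Input(y.shape),
        ),),
        min_block_size=1
    )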

    More advanced cases are supported as well:

    • Tuple I/O
    class TupleInputOutput(nn.Module):
        def __init__(self):
            super(TupleInputOutput, self).__init__()
    
        def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
            r1 = z[0] + z[1]
            r2 = z[0] - z[1]
            r1 = r1 * 10
            r = (r1, r2)
            return r
    
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = TupleInputOutput().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        input_signature=((x,y),), # Again, note how inputs are grouped with the new API
        min_block_size=1
    )
    
    out = trt_module((x,y))
    print(out)
    
    • List I/O
    class ListInputOutput(nn.Module):
        def __init__(self):
            super(ListInputOutput, self).__init__()
    
        def forward(self, z: List[torch.Tensor]):
            r1 = z[0] + z[1]
            r2 = z[0] - z[1]
            r = [r1, r2]
            return r
    
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = ListInputOutput().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        input_signature=([x,y],), # Again, note how inputs are grouped with the new API
        min_block_size=1
    )
    
    out = trt_module((x,y))
    print(out)
    
    • Multiple Groups of Mixed Types
    class MultiGroupIO(nn.Module):
        def __init__(self):
            super(MultiGroupIO, self).__init__()
    
        def forward(self, z: List[torch.Tensor], a: Tuple[torch.Tensor, torch.Tensor]):
            r1 = z[0] + z[1]
            r2 = a[0] + a[1]
            r3 = r1 - r2
            r4 = [r1, r2]
            return (r3, r4)
        
    x = torch.Tensor([1,2,3]).to("cuda")
    y = torch.Tensor([4,5,6]).to("cuda")
    module = MultiGroupIO().eval().to("cuda")
    
    trt_module = torch_tensorrt.compile(
        module,
        input_signature=([x,y],(x,y)), # Again, note how inputs are grouped with the new API
        min_block_size=1
    )
    
    out = trt_module([x,y],(x,y))
    print(out)   
    

    These features are supported in C++ as well:

    
    torch::jit::Module mod;
    try {
      // Deserialize the ScriptModule from a file using torch::jit::load().
      mod = torch::jit::load(path);
    } catch (const c10::Error& e) {
      std::cerr << "error loading the model\n";
    }
    mod.eval();
    mod.to(torch::kCUDA);
    
    std::vector<torch::jit::IValue> inputs_;
    
    for (auto in : inputs) {
      inputs_.push_back(torch::jit::IValue(in.clone()));
    }
    
    std::vector<torch::jit::IValue> complex_inputs;
    auto input_list = c10::impl::GenericList(c10::TensorType::get());
    input_list.push_back(inputs_[0]);
    input_list.push_back(inputs_[0]);
    
    torch::jit::IValue input_list_ivalue = torch::jit::IValue(input_list);
    
    complex_inputs.push_back(input_list_ivalue);
    
    auto input_shape = torch_tensorrt::Input(in0.sizes(), torch_tensorrt::DataType::kHalf);
    auto input_shape_ivalue = torch::jit::IValue(std::move(c10::make_intrusive<torch_tensorrt::Input>(input_shape)));
    
    c10::TypePtr elementType = input_shape_ivalue.type();
    auto list = c10::impl::GenericList(elementType);
    list.push_back(input_shape_ivalue);
    list.push_back(input_shape_ivalue);
    
    torch::jit::IValue complex_input_shape(list);
    std::tuple<torch::jit::IValue> input_tuple2(complex_input_shape);
    torch::jit::IValue complex_input_shape2(input_tuple2);
    
    auto compile_settings = torch_tensorrt::ts::CompileSpec(complex_input_shape2);
    compile_settings.min_block_size = 1;
    compile_settings.enabled_precisions = {torch::kHalf};
    
    // Compile module
    auto trt_mod = torch_tensorrt::ts::compile(mod, compile_settings);
    auto trt_out = trt_mod.forward(complex_inputs);
    

    Currently this feature should be considered experimental, APIs may be subject to change or folded into existing APIs. There are also limitations introduced by using this feature including the following:

    • Not all collection types are supported (e.g. Dict, namedtuple)
    • Not being able to require_full_compilation while using this feature
    • Certain operators are required to run in PyTorch throughout the graph which may impact performance
      • The maximum depth of collections nesting is limited.

    These limitations will be addressed in subsequent versions.

    Adding FX frontend to Torch-TensorRT [Beta]

    This release includes FX as one of its supported IRs, converting torch models to TensorRT through the new FX frontend. At a high level, this path transforms the model into, or consumes, an FX graph and, similar to the TorchScript frontend, converts the graph to TensorRT through the use of a library of converters. The key difference is that it is implemented purely in Python. The role of this FX frontend is to supplement the TS lowering path and to provide users better ease of use and easier extensibility in use cases where removing Python as a dependency is not strictly necessary. Detailed user instructions can be found in the documentation. The FX path examples are located under //examples/fx; the FX path unit tests are located under //py/torch_tensorrt/fx/tests.
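
    For example, selecting the FX frontend only requires passing ir="fx" to the top-level compile API (a minimal sketch; torchvision's resnet18 is just a stand-in model):

    import torch
    import torch_tensorrt
    import torchvision.models as models

    model = models.resnet18(pretrained=True).eval().cuda()
    inputs = [torch.randn((1, 3, 224, 224)).cuda()]

    # ir="fx" routes compilation through the pure-Python FX frontend
    trt_fx_module = torch_tensorrt.compile(
        model,
        ir="fx",
        inputs=inputs,
        enabled_precisions={torch.half},
    )

    out = trt_fx_module(*inputs)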

    Custom operators and converters in Torch-TensorRT

    While both the C++ API and Python API provide systems to include and convert custom operators in your model (for instance those implemented in torchvision), torchtrtc has been limited to the core opset. In Torch-TensorRT 1.2.0, two new flags have been added to torchtrtc.

        --custom-torch-ops                (repeatable) Shared object/DLL containing custom torch operators
        --custom-converters               (repeatable) Shared object/DLL containing custom converters
    

    These arguments accept paths to .so or DLL files which define custom operators for PyTorch or custom converters for Torch-TensorRT. These files will get DL_OPEN'd at runtime to extend the op and converter libraries.

    For example:

    torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
    

    Community CMake and Windows support

    Thanks to the great work of @gcuendet and others, CMake and, consequently, Windows support has been added to the project! Users on Linux and Windows can now build the C++ API using this system and, using torch_tensorrt_runtime.dll, add support for executing Torch-TensorRT programs on Windows in both Python and C++. Detailed information on how to use this build system can be found here: https://pytorch.org/TensorRT/getting_started/installation.html

    Bazel will continue to be the primary build system for the project, and all testing and distributed builds will be built and run with Bazel (including future official Windows support), so users should consider it the canonical version of Torch-TensorRT. However, we aim to ensure, as best we can, that the CMake system can build the project properly, including on Windows. Contributions to continue growing support for this build system and for Windows as a platform are definitely welcome.

    Known Limitations

    • Collections I/O
      • Not all collection types are supported (e.g. Dict, namedtuple)
      • Not being able to require_full_compilation while using this feature
      • Certain operators are required to run in PyTorch throughout the graph which may impact performance
      • The maximum depth of collections nesting is limited.
    • FX
      • Some FX operators have limited dynamic shape capability; please check the FX documentation for details.
      • Control flow in models cannot be handled.
    • The Python API is not supported when building via the CMake build system.

    Dependencies

    - Bazel 5.2.0
    - LibTorch 1.12.1
    - CUDA 11.6 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
    - cuDNN 8.4.1.50
    - TensorRT 8.4.3.1
    

    Operators Supported (TorchScript)

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
    • aten::adaptive_max_pool1d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::argmax(Tensor self, int dim, bool keepdim=False) -> (Tensor)
    • aten::argmin(Tensor self, int dim, bool keepdim=False) -> (Tensor)
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bitwise_not(Tensor self) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::scatter.src(Tensor self, int dim, Tensor index, Tensor src) -> (Tensor)
    • aten::scatter.value(Tensor self, int dim, Tensor index, Scalar value) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split.sizes(Tensor(a -> *) self, int[] split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::square(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::__and__(int a, int b) -> (bool)
    • aten::__and__.bool(bool a, bool b) -> (bool)
    • aten::__derive_index(int idx, int start, int step) -> int
    • aten::__getitem__.t(t list, int idx) -> (t(*))
    • aten::__is__(t1 self, t2 obj) -> bool
    • aten::__isnot__(t1 self, t2 obj) -> bool
    • aten::__not__(bool self) -> bool
    • aten::__or__(int a, int b) -> (bool)
    • aten::__range_length(int lo, int hi, int step) -> int
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::__xor__(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add.str(str a, str b) -> (str)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::eq.str(str a, str b) -> (bool)
    • aten::extend.t(t self, t[] other) -> ()
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::format(str self, ...) -> (str)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::pow.float(float a, float b) -> (float)
    • aten::pow.float_int(float a, int b) -> (float)
    • aten::pow.int(int a, int b) -> (float)
    • aten::pow.int_float(int a, float b) -> (float)
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::TupleIndex(Any tup, int i) -> (Any)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (float)
    • prim::max.float_int(float a, int b) -> (float)
    • prim::max.int(int a, int b) -> (int)
    • prim::max.int_float(int a, float b) -> (float)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (float)
    • prim::min.float_int(float a, int b) -> (float)
    • prim::min.int(int a, int b) -> (int)
    • prim::min.int_float(int a, float b) -> (float)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])

    What's Changed

    • chore: Bump version to 1.2.0a0 by @narendasan in https://github.com/pytorch/TensorRT/pull/1044
    • feat: Extending nox for cxx11 ABI version by @andi4191 in https://github.com/pytorch/TensorRT/pull/1013
    • docs: Update the documentation theme to PyTorch by @narendasan in https://github.com/pytorch/TensorRT/pull/1063
    • Adding Code of Conduct file by @facebook-github-bot in https://github.com/pytorch/TensorRT/pull/1061
    • Update CONTRIBUTING.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1064
    • feat: Optimize hub.py download by @andi4191 in https://github.com/pytorch/TensorRT/pull/1022
    • Adding an action to automatically assign reviewers and assignees by @narendasan in https://github.com/pytorch/TensorRT/pull/1078
    • Add PR assigner support by @narendasan in https://github.com/pytorch/TensorRT/pull/1080
    • (//core): Align with prim::Enter in module fallback by @andi4191 in https://github.com/pytorch/TensorRT/pull/991
    • (//core): Added a variant for aten::split by @andi4191 in https://github.com/pytorch/TensorRT/pull/992
    • feat(nox): Replacing session with environment variable by @andi4191 in https://github.com/pytorch/TensorRT/pull/1057
    • Refactor the internal codebase from fx2trt_oss to torch_tensorrt by @frank-wei in https://github.com/pytorch/TensorRT/pull/1104
    • format by buildifier by @frank-wei in https://github.com/pytorch/TensorRT/pull/1106
    • [fx2trt] Modify lower setting class by @frank-wei in https://github.com/pytorch/TensorRT/pull/1107
    • Modified the notebooks directory's README file by @svenchilton in https://github.com/pytorch/TensorRT/pull/1102
    • [FX] Sync to OSS by @frank-wei in https://github.com/pytorch/TensorRT/pull/1118
    • [fx_acc] Add acc_tracer support for torch.mm by @khabinov in https://github.com/pytorch/TensorRT/pull/1120
    • Added Triton deployment instructions to documentation by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1116
    • amending triton deployment docs by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1126
    • fix: Update broken repo hyperlink by @lamhoangtung in https://github.com/pytorch/TensorRT/pull/1131
    • fix: Fix keep_dims functionality for aten::max by @peri044 in https://github.com/pytorch/TensorRT/pull/1099
    • fix(tests/core/partitioning): Fix tests of refactoring segmentation in partitioning by @peri044 in https://github.com/pytorch/TensorRT/pull/1140
    • feat(//tests): Update rtol and atol based tolerance for test cases by @andi4191 in https://github.com/pytorch/TensorRT/pull/1055
    • doc: add the explanation for partition phases on docs by @bowang007 in https://github.com/pytorch/TensorRT/pull/1090
    • feat (//cpp): Using atol and rtol based tolerance threshold for torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1052
    • CI/CD setup by @frank-wei in https://github.com/pytorch/TensorRT/pull/1137
    • Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1142
    • [fx2trt] Engineholder feature improvement, test fixes by @frank-wei in https://github.com/pytorch/TensorRT/pull/1143
    • feat (//core/conversion) : Add converter for torch.bitwise_not by @blchu in https://github.com/pytorch/TensorRT/pull/1029
    • fixed typos by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1098
    • [FX] --fx-only does not need to check bazel by @frank-wei in https://github.com/pytorch/TensorRT/pull/1147
    • [FX] refactor the fx path in compile function by @frank-wei in https://github.com/pytorch/TensorRT/pull/1141
    • [FX] Create getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1145
    • [FX] move example folder by @frank-wei in https://github.com/pytorch/TensorRT/pull/1149
    • [FX] Sync enhancement done internally at Meta by @yinghai in https://github.com/pytorch/TensorRT/pull/1161
    • Update config.yml by @frank-wei in https://github.com/pytorch/TensorRT/pull/1163
    • Use py3 next() syntax by @ptrblck in https://github.com/pytorch/TensorRT/pull/1159
    • Add missing comma for proper torch versioning in setup.py by @dabauxi in https://github.com/pytorch/TensorRT/pull/1164
    • [docs] Update link to relative path by @zhiqwang in https://github.com/pytorch/TensorRT/pull/1171
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1172
    • fix: fix the model name typo error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1176
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1178
    • [feat]: support slice with dynamic shape by @inocsin in https://github.com/pytorch/TensorRT/pull/1110
    • [FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1184
    • [FX] Update README.md by @frank-wei in https://github.com/pytorch/TensorRT/pull/1183
    • fix: Fix PTQ calibration when there are multiple inputs by @peri044 in https://github.com/pytorch/TensorRT/pull/1191
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1194
    • [fix]: fix bug in aten::to, when network only have aten::to layer wil… by @inocsin in https://github.com/pytorch/TensorRT/pull/1108
    • Add .circleci/config.yml by @narendasan in https://github.com/pytorch/TensorRT/pull/1153
    • feat: Upgrade TRT to 8.4 by @peri044 in https://github.com/pytorch/TensorRT/pull/1152
    • feat: Update Pytorch version to 1.12 by @peri044 in https://github.com/pytorch/TensorRT/pull/1177
    • fix: converter renaming already named tensors by @bowang007 in https://github.com/pytorch/TensorRT/pull/1167
    • feat(//py): Use TensorRT to fill in .so libraries automatically if possible by @narendasan in https://github.com/pytorch/TensorRT/pull/1085
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1204
    • fix: fix the parsing related model loading bug by @bowang007 in https://github.com/pytorch/TensorRT/pull/1148
    • feat: support min_block_size != 1 caused fallback nodes re-segmentation by @bowang007 in https://github.com/pytorch/TensorRT/pull/1195
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1208
    • fix: fix the fallback related issue after merging collection by @bowang007 in https://github.com/pytorch/TensorRT/pull/1206
    • Add CMake support to build the libraries by @gcuendet in https://github.com/pytorch/TensorRT/pull/1058
    • Fix typo in EfficientNet-example by @davinnovation in https://github.com/pytorch/TensorRT/pull/1217
    • fix: fix bug that ListConstruct in TRT subgraph when it's entire graph's output by @bowang007 in https://github.com/pytorch/TensorRT/pull/1220
    • fix: fix the error that collection input segmented into trt subgraph by @bowang007 in https://github.com/pytorch/TensorRT/pull/1225
    • feat(//circleci): Adding release automation by @narendasan in https://github.com/pytorch/TensorRT/pull/1215
    • fix: support int tensor * int scalar in aten::mul by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1095
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1221
    • Fix errors in unbind and list slice by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1088
    • Adding a Resnet C++ example by @vinhngx in https://github.com/pytorch/TensorRT/pull/1175
    • [FX] disable 2 of conv3d and type_as tests by @frank-wei in https://github.com/pytorch/TensorRT/pull/1224
    • [feat] Add support for integers in aten::abs converter (#35) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1232
    • Update PTQ example to fix new compile_spec requirements by @ncomly-nvidia in https://github.com/pytorch/TensorRT/pull/1242
    • feat: support for grouped inputs by @narendasan in https://github.com/pytorch/TensorRT/pull/1201
    • feat: Added support for custom torch operators and converters in torchtrtc by @andi4191 in https://github.com/pytorch/TensorRT/pull/1219
    • Add outputPadding in deconv by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1234
    • chore: Apply linting and ignore new bazel dirs by @narendasan in https://github.com/pytorch/TensorRT/pull/1223
    • added qat-ptq workflow notebook by @tanayvarshney in https://github.com/pytorch/TensorRT/pull/1239
    • fix: Update cmake for the new collection files by @narendasan in https://github.com/pytorch/TensorRT/pull/1246
    • chore: ignore dist dir for pre-commit by @narendasan in https://github.com/pytorch/TensorRT/pull/1249
    • chore: Aligning bazel version for consistency across different docker… by @andi4191 in https://github.com/pytorch/TensorRT/pull/1250
    • refactor: Changed the hardcoded values to macros for DLA memory sizes by @andi4191 in https://github.com/pytorch/TensorRT/pull/1247
    • chore: update jetson pytorch base by @narendasan in https://github.com/pytorch/TensorRT/pull/1251
    • [feat] Add automatic type promotion to element-wise ops by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1240
    • Assorted small fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1259
    • [FX] remove op_lowering_disallow_list and format revert by @frank-wei in https://github.com/pytorch/TensorRT/pull/1261
    • fix: fix the "schema not found for node" error by @bowang007 in https://github.com/pytorch/TensorRT/pull/1236
    • chore: Fix contributing doc by @peri044 in https://github.com/pytorch/TensorRT/pull/1268
    • feat: support scatter.value and scatter.src by @inocsin in https://github.com/pytorch/TensorRT/pull/1252
    • Internal workspace workflow by @narendasan in https://github.com/pytorch/TensorRT/pull/1269
    • Fix typo in README by @davinnovation in https://github.com/pytorch/TensorRT/pull/1273
    • Support swin/bert with dynamic batch by @Njuapp in https://github.com/pytorch/TensorRT/pull/1270
    • Update release 1.2 by @narendasan in https://github.com/pytorch/TensorRT/pull/1275
    • correct sha256sum of cudnn by @Njuapp in https://github.com/pytorch/TensorRT/pull/1278
    • Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1279
    • Jetson workspace by @narendasan in https://github.com/pytorch/TensorRT/pull/1280
    • chore(deps): bump @actions/core from 1.8.2 to 1.9.1 in /.github/actions/assigner by @dependabot in https://github.com/pytorch/TensorRT/pull/1287
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1288
    • chore: Fix dataloader in finetune_qat script by @andi4191 in https://github.com/pytorch/TensorRT/pull/1292
    • chore: Truncate long and double for ptq CPP path by @andi4191 in https://github.com/pytorch/TensorRT/pull/1291
    • feat: Add support for aten::square by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1286
    • fix: fix misleading skipping partitioning msg by @bowang007 in https://github.com/pytorch/TensorRT/pull/1289
    • fix: Add int support to constant_pad_nd by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1283
    • fix: Resolve non-determinism in registerSegmentsOutputs by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1284
    • docs: Update docgen task by @narendasan in https://github.com/pytorch/TensorRT/pull/1294
    • update fx notebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1297
    • [FX] Changes done internally at Facebook by @frank-wei in https://github.com/pytorch/TensorRT/pull/1299
    • fix(tools): Fix linter to not depend on docker by @narendasan in https://github.com/pytorch/TensorRT/pull/1301
    • Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1300
    • Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1307
    • Support multiple indices for aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1309
    • chore: Adding CMake to the CI by @narendasan in https://github.com/pytorch/TensorRT/pull/1310
    • feat: Upgrade Pytorch to 1.12.1 and TensorRT to 8.4.3.1 by @peri044 in https://github.com/pytorch/TensorRT/pull/1315
    • Fix bug: correct the output shape of aten::index.Tensor by @ruoqianguo in https://github.com/pytorch/TensorRT/pull/1314
    • feat (//core/conversion) : Add converter for torch.repeat_interleave ( by @blchu in https://github.com/pytorch/TensorRT/pull/1313
    • chore: Adding NGC build path by @narendasan in https://github.com/pytorch/TensorRT/pull/1311
    • Update release by @narendasan in https://github.com/pytorch/TensorRT/pull/1320
    • Update lower.py by @frank-wei in https://github.com/pytorch/TensorRT/pull/1324
    • fix!: Fixed Windows compilation failures by @andi4191 in https://github.com/pytorch/TensorRT/pull/1330
    • [feat] Add support for argmax and argmin by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1312
    • chore: Adding a guideline to build on Windows platform by @andi4191 in https://github.com/pytorch/TensorRT/pull/1337
    • chore: Fix data loader issues and nox file paths by @peri044 in https://github.com/pytorch/TensorRT/pull/1281
    • feat(//tools/perf): Refactor perf_run.py, add fx2trt backend support, usage via CLI arguments by @peri044 in https://github.com/pytorch/TensorRT/pull/1254
    • refactor(//tests) : Refactor the test suite by @peri044 in https://github.com/pytorch/TensorRT/pull/1329
    • [feat] add support for aten::reciprocal(int) by @mfeliz-cruise in https://github.com/pytorch/TensorRT/pull/1308
    • Update release branch with latest test fixes by @narendasan in https://github.com/pytorch/TensorRT/pull/1339
    • [FX] Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1342
    • Update getting_started_with_fx_path.rst by @frank-wei in https://github.com/pytorch/TensorRT/pull/1343
    • enable direct call to fx.compile() by @frank-wei in https://github.com/pytorch/TensorRT/pull/1344
    • fix: add remove_exception pass from torch to fix uninitialized tensor… by @bowang007 in https://github.com/pytorch/TensorRT/pull/1345
    • chore: apply linting to docs by @narendasan in https://github.com/pytorch/TensorRT/pull/1347
    • Update release branch by @narendasan in https://github.com/pytorch/TensorRT/pull/1348

    New Contributors

    • @facebook-github-bot made their first contribution in https://github.com/pytorch/TensorRT/pull/1061
    • @frank-wei made their first contribution in https://github.com/pytorch/TensorRT/pull/1064
    • @khabinov made their first contribution in https://github.com/pytorch/TensorRT/pull/1120
    • @blchu made their first contribution in https://github.com/pytorch/TensorRT/pull/1029
    • @yinghai made their first contribution in https://github.com/pytorch/TensorRT/pull/1161
    • @ptrblck made their first contribution in https://github.com/pytorch/TensorRT/pull/1159
    • @dabauxi made their first contribution in https://github.com/pytorch/TensorRT/pull/1164
    • @zhiqwang made their first contribution in https://github.com/pytorch/TensorRT/pull/1171
    • @gcuendet made their first contribution in https://github.com/pytorch/TensorRT/pull/1058
    • @davinnovation made their first contribution in https://github.com/pytorch/TensorRT/pull/1217
    • @dependabot made their first contribution in https://github.com/pytorch/TensorRT/pull/1287

    Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.2.0

    Source code(tar.gz)
    Source code(zip)
    libtorchtrt-1.2.0-cudnn8.4-tensorrt8.4-cuda11.6-libtorch1.12.1-x86_64-linux.tar.gz(1.94 MB)
    libtorchtrt-1.2.0-pre-cxx11-abi-cudnn8.4-tensorrt8.4-cuda11.6-libtorch1.12.1-x86_64-linux.tar.gz(1.89 MB)
    torch_tensorrt-1.2.0-cp310-cp310-linux_x86_64.whl(12.54 MB)
    torch_tensorrt-1.2.0-cp37-cp37m-linux_x86_64.whl(12.60 MB)
    torch_tensorrt-1.2.0-cp38-cp38-linux_x86_64.whl(12.67 MB)
    torch_tensorrt-1.2.0-cp39-cp39-linux_x86_64.whl(12.53 MB)
  • v1.1.1(Jul 16, 2022)

    Adding support for Torch-TensorRT on JetPack 5.0 Developer Preview

    Torch-TensorRT 1.1.1 is a patch release for Torch-TensorRT 1.1 that targets PyTorch 1.11, CUDA 11.4/11.3, TensorRT 8.4 EA/8.2 and cuDNN 8.3/8.2, and is intended to add support for Torch-TensorRT on Jetson / JetPack 5.0 DP. Because this release is primarily targeted at adding JetPack 5.0 DP support for the 1.1 feature set, we will not be distributing pre-compiled binaries for this release, so as not to break compatibility with the current stack for existing users who install directly from GitHub. Please follow the installation instructions for Jetson in the documentation to install this release: https://pytorch.org/TensorRT/tutorials/installation.html#compiling-from-source

    Known Limitations

    • In testing, we have observed higher than normal numerical instability on JetPack 5.0 DP. These issues are not observed on x86_64-based platforms, and this numerical instability has not been found to decrease model accuracy in our test suite.

    What's Changed

    • feat: Upgrade TensorRT to 8.4 EA by @peri044 in https://github.com/pytorch/TensorRT/pull/1158

    Full Changelog: https://github.com/pytorch/TensorRT/compare/v1.1.0...v1.1.1

    Operators Supported

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
    • aten::adaptive_max_pool1d(Tensor self, int[1] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::__and__(int a, int b) -> (bool)
    • aten::__and__.bool(bool a, bool b) -> (bool)
    • aten::__getitem__.t(t list, int idx) -> (t(*))
    • aten::__is__(t1 self, t2 obj) -> bool
    • aten::__isnot__(t1 self, t2 obj) -> bool
    • aten::__not__(bool self) -> bool
    • aten::__or__(int a, int b) -> (bool)
    • aten::__range_length(int lo, int hi, int step) -> int
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::__xor__(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add.str(str a, str b) -> (str)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::eq.str(str a, str b) -> (bool)
    • aten::extend.t(t self, t[] other) -> ()
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::format(str self, ...) -> (str)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::pow.float(float a, float b) -> (float)
    • aten::pow.float_int(float a, int b) -> (float)
    • aten::pow.int(int a, int b) -> (float)
    • aten::pow.int_float(int a, float b) -> (float)
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (float)
    • prim::max.float_int(float a, int b) -> (float)
    • prim::max.int(int a, int b) -> (int)
    • prim::max.int_float(int a, float b) -> (float)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (float)
    • prim::min.float_int(float a, int b) -> (float)
    • prim::min.int(int a, int b) -> (int)
    • prim::min.int_float(int a, float b) -> (float)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    Source code(tar.gz)
    Source code(zip)
  • v1.1.0(May 10, 2022)

    Support for PyTorch 1.11, Various Bug Fixes, Partial aten::Int support, New Debugging Tools, Removing Max Batch Size

    Torch-TensorRT 1.1.0 targets PyTorch 1.11, CUDA 11.3, cuDNN 8.2 and TensorRT 8.2. Due to recent JetPack upgrades, this release does not support Jetson (JetPack 5.0 DP or otherwise); JetPack 5.0 DP support will arrive in a mid-cycle release (Torch-TensorRT 1.1.x) along with support for TensorRT 8.4. 1.1.0 also drops support for Python 3.6, as it has reached end of life. Following 1.0.0, this release is focused on stabilizing and improving the core of Torch-TensorRT. Many improvements have been made to the partitioning system, addressing limitations many users hit while trying to partially compile PyTorch modules. Torch-TensorRT 1.1.0 also partially addresses a long-standing issue with aten::Int operators: certain common patterns which use aten::Int can now be handled by the compiler without resorting to partial compilation. Most notably, this means that models like BERT can be run end to end with Torch-TensorRT, resulting in significant performance gains.
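
    As a rough illustration, a minimal sketch of such an end-to-end compilation follows (the module scripted_bert, the 128-token sequence length, and the int32 input pair are illustrative assumptions, not code shipped with this release):

    import torch
    import torch_tensorrt

    # Hypothetical setup: scripted_bert is a torch.jit.ScriptModule wrapping a
    # BERT encoder that takes (input_ids, attention_mask), both int32 tensors
    # of shape [batch, seq_len]. The vocab size below is illustrative.
    input_ids = torch.randint(0, 30522, (1, 128), dtype=torch.int32, device="cuda")
    attention_mask = torch.ones(1, 128, dtype=torch.int32, device="cuda")

    trt_bert = torch_tensorrt.compile(
        scripted_bert,
        inputs=[
            torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),  # input_ids
            torch_tensorrt.Input(shape=[1, 128], dtype=torch.int32),  # attention_mask
        ],
        enabled_precisions={torch.half},  # run in FP16
        truncate_long_and_double=True,    # BERT graphs often carry int64 constants
    )
    results = trt_bert(input_ids, attention_mask)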

    New Debugging Tools

    With this release we are introducing new syntactic sugar that can be used to more easily debug Torch-TensorRT compilation and execution through the use of context managers. For example, in Torch-TensorRT 1.0.0, a common pattern for turning debug info on and then off again looked like this:

    import torch_tensorrt
    ...
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Debug)
    trt_module = torch_tensorrt.compile(my_module, ...)
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Warning)
    results = trt_module(input_tensors)
    

    With Torch-TensorRT 1.1.0, this can now be done with the following code:

    import torch_tensorrt
    ...
    with torch_tensorrt.logging.debug():
        trt_module = torch_tensorrt.compile(my_module,...)
    results = trt_module(input_tensors)
    

    You can also use this API to debug the Torch-TensorRT runtime:

    import torch_tensorrt
    torch_tensorrt.logging.set_reportable_log_level(torch_tensorrt.logging.Level.Error)
    ...
    trt_module = torch_tensorrt.compile(my_module,...)
    with torch_tensorrt.logging.warnings():
        results = trt_module(input_tensors)
    

    The following levels are available:

    
    # Only internal TensorRT failures will be logged
    with torch_tensorrt.logging.internal_errors():
    
    # Internal TensorRT failures + Torch-TensorRT errors will be logged
    with torch_tensorrt.logging.errors():
    
    # All Errors plus warnings will be logged
    with torch_tensorrt.logging.warnings():
    
    # First verbosity level, information about major steps occurring during compilation and execution
    with torch_tensorrt.logging.info():
    
    # Second verbosity level, each step is logged + information about compiler state will be output
    with torch_tensorrt.logging.debug():
    
    # Third verbosity level, all above information + intermediate transformations of the graph during lowering
    with torch_tensorrt.logging.graphs():
    

    Removing Max Batch Size, Strict Types

    In this release we are removing the max_batch_size and strict_types settings. These settings corresponded directly to the underlying TensorRT settings; however, they were not always respected, which often led to confusion. We therefore thought it best to remove these features, since deterministic behavior could not be ensured.

    Porting forward from max_batch_size and strict_types:

    • max_batch_size: The first dim in shapes provided to Torch-TensorRT is considered the batch dimension, so instead of setting max_batch_size you can just use the Input objects directly, as sketched below
    • strict_types: A replacement with more deterministic behavior will come in an upcoming TensorRT release.
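
    For example, a module previously compiled with max_batch_size=32 can express the same bound directly on its Input. A minimal sketch (my_module and the shapes are illustrative assumptions):

    import torch
    import torch_tensorrt

    # Instead of setting max_batch_size, give the batch dimension a range on
    # the Input object itself; the first dimension is treated as the batch dim.
    trt_module = torch_tensorrt.compile(
        my_module,  # hypothetical TorchScript module
        inputs=[
            torch_tensorrt.Input(
                min_shape=[1, 3, 224, 224],   # smallest batch to support
                opt_shape=[8, 3, 224, 224],   # batch size to optimize for
                max_shape=[32, 3, 224, 224],  # largest batch to support
                dtype=torch.float,
            )
        ],
        enabled_precisions={torch.float},
    )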

    Dependencies

    - Bazel 5.1.1
    - LibTorch 1.11.0
    - CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
    - cuDNN 8.2.4.15
    - TensorRT 8.2.4.2
    

    1.1.0 (2022-05-10)

    Bug Fixes

    • add at::adaptive_avg_pool1d in interpolate plugin and fix #791 (deb9f74)
    • Added ipywidget dependency to notebook (0b2040a)
    • Added test case names (296e98a)
    • Added truncate_long_and_double (417c096)
    • Adding truncate_long_and_double to ptq tests (3a0640a)
    • Avoid resolving non-tensor inputs to torch segment_blocks unnecessarily (3e090ee)
    • Considering rtol and atol in threshold comparison for floating point numbers (0b0ba8d)
    • Disabled mobilenet_v2 test for DLFW CI (40c611f)
    • fix bug that python api doesn't pass truncate_long_and_double value to internal.partition_info (828336d)
    • fix bugs in aten::to (2ecd187)
    • Fix BUILD file for tests/accuracy (8b0170e)
    • Fix existing uninstallation of Torch-TRT (9ddd7a8)
    • Fix for torch scripted module failure with DLFW (88c02d9)
    • Fix fuse addmm pass (58e9ea0)
    • Fix pre_built name change in bazelrc (3ecee21)
    • fix the bug that introduces kLong Tensor in prim::NumToTensor (2c3e1d9)
    • Fix when TRT prunes away an output (9465e1d)
    • Fixed bugs and addressed review comments (588e1d1)
    • Fixed failures for host deps sessions (ec2232f)
    • Fixed typo in the path (43fab56)
    • Getting unsupported ops will now bypass non-schema ops avoiding redundant failures (d7d1511)
    • Guard test activation for CI testing (6d1a1fd)
    • Implement a patch for gelu schema change in older NGC containers (9ee3a04)
    • Missing log severity (6a4daef)
    • Preempt torch package override via timm in nox session (8964d1b)
    • refactor the resegmentation for TensorRT segments in ResolveNonTensorInput (3cc2dfb)
    • remove outdated member variables (0268da2)
    • Removed models directory dependencies (c4413e1)
    • Resolve issues in exception elimination pass (99cea1b)
    • Review comments incorporated (962660d)
    • Review comments incorporated (e9865c2)
    • support dict type for input in shape analysis (630f9c4)
    • truncate_long_and_double incur torchscript inference issues (c83aa15)
    • Typo fix for test case name (2a516b2)
    • Update "reduceAxes" variable in GlobalPoolingConverter function and add corresponding uTests (f6f5e3e)
    • //core/conversion/evaluators: Change how schemas are handled (20e5d41)
    • Update base container for dockerfile (1b3245a)
    • //core: Take user setting in the case we can't determine the (01c89d1), closes #814
    • Update test for new Exception syntax (2357099)
    • //core/conversion: Add special case for If and Loop (eacde8d)
    • //core/runtime: Support more delimiter variants (819c911)
    • //cpp/bin/torchtrtc: Fix mbs (aca175f)
    • //docsrc: Fix dependencies for docgen (806e663)
    • //notebooks: Render citrinet (12dbda1)
    • //py: Constrain the CUDA version in container builds (a21a045)
    • Use user provided dtype when we can't infer it from the graph (14650d1)

    Code Refactoring

    • removing the strict_types and max_batch_size apis (b30cbd9)
    • Rename enabled precisions argument to (10957eb)
    • Removing the max-batch-size argument (03bafc5)

    Features

    • //core/conversion: Better tooling for debugging (c5c5c47)
    • //core/conversion/evaluators: aten::pow support (c4fdfcb)
    • //docker: New base container to let master build in container (446bf18)
    • //py: Context managers to quickly switch logging level (12e470f)
    • Add converter files for reflection pad 1d and 2d (406d860)
    • Add converter files for torch::max (f628aca)
    • Add converter files for torch::max (569bcde)
    • Add converter files for torch::max (dd7a44e)
    • Add converter for reflection pad 1d and 2d operation (2484a43)
    • Added comprehensive perf benchmark script (a8016ff)
    • Added compute capability for Orin (af3d0ff)
    • Added env var for TOP_DIR (c26180e)
    • Added Python accuracy tests using Nox (6ae8652)
    • Enable prim::DictConstruct to fallback without conversion check error (01d98c7)
    • Handle empty schemas for unsupported ops (bf6c929)
    • Implement fast approximation of Gelu as lowering pass to improve performance (8024ea2)
    • Implement lowering for aten::to.dtype schema (4b3ae3a)
    • Implement test case for aten::to.dtype lowering (bde8ee0)
    • Perf benchmark initial draft (f2d1655)
    • replace view with reshape during lowering (d39b918)
    • Review comment incorporated (161ef3d)
    • support aten::adaptive_max_pool1d, aten::adaptive_avg_pool3d and aten::adaptive_max_pool3d operators (e554dbd)
    • support aten::div.Tensor_mode (bb3046a)
    • support aten::extend evaluator (33c523d)
    • support aten::format evaluator (3a33d33)
    • Update Pytorch version to 1.11 (c009a1f)
    • Upgrade TensorRT to 8.2.4.2 (f1f151b)
    • //tests: Adding BERT to the test suite (7996a10)
    • aten::__range_length: Adding range length evaluator (11c4608)
    • aten::add: adding string concat evaluator (65dbf90)
    • aten::Int: Adding a new pass to remove single use (46ac757)
    • aten::Int: Lowers out aten::Int (908340f)
    • core//conversion: Implement converter for torch unbind (268a49b)

    BREAKING CHANGES

    • This commit removes the strict_types and max_batch_size APIs. We are doing this because the functionality of these APIs in TensorRT is convoluted and likely to be ignored during building. A replacement for strict_types with actual guarantees will be added at a later date.

    Signed-off-by: Dheeraj Peri

    • This is a minor change but may cause scripts using torchtrtc to fail. We are renaming enabled-precisions to enable-precision, since it makes more sense as the argument can be repeated.

    Signed-off-by: Naren Dasan

    • This PR removes --max-batch-size from the CLI as it has no real functional effect

    Signed-off-by: Naren Dasan

    Operators Supported

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
    • aten::adaptive_max_pool1d(Tensor self, int[1] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::and(int a, int b) -> (bool)
    • aten::and.bool(bool a, bool b) -> (bool)
    • aten::getitem.t(t list, int idx) -> (t(*))
    • aten::is(t1 self, t2 obj) -> bool
    • aten::isnot(t1 self, t2 obj) -> bool
    • aten::not(bool self) -> bool
    • aten::or(int a, int b) -> (bool)
    • aten::__range_length(int lo, int hi, int step) -> int
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::xor(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add.str(str a, str b) -> (str)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::eq.str(str a, str b) -> (bool)
    • aten::extend.t(t self, t[] other) -> ()
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::format(str self, ...) -> (str)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::pow.float(float a, float b) -> (float)
    • aten::pow.float_int(float a, int b) -> (float)
    • aten::pow.int(int a, int b) -> (float)
    • aten::pow.int_float(int a, float b) -> (float)
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    libtorchtrt-v1.1.0-cudnn8.2-tensorrt8.2-cuda11.3-libtorch1.11.0.tar.gz(1.78 MB)
    libtorchtrt-v1.1.0-pre-cxx11-abi-cudnn8.2-tensorrt8.2-cuda11.3-libtorch1.11.0.tar.gz(1.79 MB)
    torch_tensorrt-1.1.0-cp310-cp310-linux_x86_64.whl(11.01 MB)
    torch_tensorrt-1.1.0-cp37-cp37m-linux_x86_64.whl(11.04 MB)
    torch_tensorrt-1.1.0-cp38-cp38-linux_x86_64.whl(11.16 MB)
    torch_tensorrt-1.1.0-cp39-cp39-linux_x86_64.whl(11.02 MB)
  • v1.0.0 (Nov 9, 2021)

    New Name! Support for PyTorch 1.10, CUDA 11.3, New Packaging and Distribution Options, Stabilized APIs, Stabilized Partial Compilation, Adjusted Default Behavior, Usability Improvements, New Converters, Bug Fixes

    This is the first stable release of Torch-TensorRT, targeting PyTorch 1.10, CUDA 11.3 (on x86_64; CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0, with backwards-compatible source for TensorRT 7.1. On aarch64, Torch-TensorRT primarily targets JetPack 4.6, with backwards-compatible source for JetPack 4.5. This version also removes deprecated APIs such as InputRange and op_precision.

    New Name

    TRTorch is now Torch-TensorRT! TRTorch started out as a small experimental project compiling TorchScript to TensorRT almost two years ago, and now, as we hit v1.0.0 with APIs and major features stabilizing, we felt that the name of the project should reflect the ecosystem of tools it is joining with this release, namely TF-TRT (https://blog.tensorflow.org/2021/01/leveraging-tensorflow-tensorrt-integration.html) and MXNet-TensorRT (https://mxnet.apache.org/versions/1.8.0/api/python/docs/tutorials/performance/backend/tensorrt/tensorrt). Since we were already significantly changing APIs with this release to reflect what we learned over the last two years of using TRTorch, we felt this was the right time to change the name as well.

    The overall process to port forward from TRTorch is as follows:

    • Python

      • The library has been renamed from trtorch to torch_tensorrt
      • Components that used to all live under the trtorch namespace have now been separated. IR-agnostic components (torch_tensorrt.Input, torch_tensorrt.Device, torch_tensorrt.ptq, torch_tensorrt.logging) will continue to live under the top-level namespace. IR-specific components like torch_tensorrt.ts.compile, torch_tensorrt.ts.convert_method_to_trt_engine and torch_tensorrt.ts.TensorRTCompileSpec will live in a TorchScript-specific namespace. This gives us space to explore the other IRs that might be relevant to the project in the future. In place of the old top-level compile and convert_method_to_trt_engine are new ones which will call the IR-specific versions based on what is provided to them. This also means that you can now provide a raw torch.nn.Module to torch_tensorrt.compile and Torch-TensorRT will handle the TorchScripting step for you. For the most part, the sole change needed to move over namespaces is to exchange trtorch for torch_tensorrt (see the sketch after this list).
    • C++

      • Similar to Python, the namespaces in C++ have changed from trtorch to torch_tensorrt, and components specific to the IR like compile, convert_method_to_trt_engine and CompileSpec are in a torchscript namespace, while IR-agnostic components are at the top level. Namespace aliases for torch_tensorrt -> torchtrt and torchscript -> ts are included. Again, the port-forward process for namespaces should be a find and replace. Finally, the libraries libtrtorch.so, libtrtorchrt.so and libtrtorch_plugins.so have been renamed to libtorchtrt.so, libtorchtrt_runtime.so and libtorchtrt_plugins.so respectively.
    • CLI:

      • trtorch has been renamed to torchtrtc
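
    As a concrete illustration, the Python port-forward is mostly a rename (a minimal sketch; the model and input shape here are placeholders):

    # Before (TRTorch v0.4.x):
    #   import trtorch
    #   trt_mod = trtorch.compile(scripted_model, compile_spec)
    #
    # After (Torch-TensorRT v1.0.0):
    import torch
    import torch_tensorrt

    trt_mod = torch_tensorrt.compile(
        model,  # may now be a raw torch.nn.Module; TorchScripting is handled for you
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float},
    )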

    New Distribution Options and Packaging

    Starting with nvcr.io/nvidia/pytorch:21.11, Torch-TensorRT will be distributed as part of the container (https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch). The version of Torch-TensorRT in the container will be the state of master at the time the container was built. Torch-TensorRT will be validated to run correctly with the version of PyTorch, CUDA, cuDNN and TensorRT in the container. This serves as the easiest way to get a fully validated PyTorch end-to-end training-to-inference stack and a great starting point for building DL applications.

    Also, as part of Torch-TensorRT we are now starting to distribute the full C++ package within the wheel files for the Python packages. By installing the wheel you now get the Python API, the C++ libraries + headers and the CLI binary. This is going to be the easiest way to install Torch-TensorRT on your stack. After installing with pip:

    pip3 install torch-tensorrt -f https://github.com/NVIDIA/Torch-TensorRT/releases
    

    You can add the following to your PATH to set up the CLI

    PATH=$PATH:<PATH TO TORCHTRT PYTHON PACKAGE>/bin
    

    Stabilized APIs

    Python

    Many of the APIs have changed slightly in this release to be more self-consistent and more usable. These changes begin with the Python API, where compile, convert_method_to_trt_engine and TensorRTCompileSpec now use kwargs instead of dictionaries. As many features came out of beta and experimental status, the need for multiple levels of nesting in settings has decreased, so kwargs make much more sense. You can port forward to the new APIs simply by unwrapping your existing compile_spec dict in the arguments to compile or similar functions.

    Example:
    compile_settings = {
        "inputs": [torch_tensorrt.Input(
            min_shape=[1, 3, 224, 224],
            opt_shape=[1, 3, 512, 512],
            max_shape=[1, 3, 1024, 1024],
            # For static size shape=[1, 3, 224, 224]
            dtype=torch.half, # Datatype of input tensor. Allowed options torch.(float|half|int8|int32|bool)
        )],
        "enabled_precisions": {torch.half}, # Run with FP16
    }
    
    trt_ts_module = torch_tensorrt.compile(torch_script_module, **compile_settings)
    

    This release also introduces support for providing tensors as examples to Torch-TensorRT. In place of a torch_tensorrt.Input in the list of inputs you can pass a Tensor. This can only be used to set a static input size. There are also some things to be aware of which will be discussed later in the release notes.
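
    For instance (a minimal sketch; scripted_model is a placeholder for your TorchScript module):

    # Passing a Tensor in place of a torch_tensorrt.Input fixes a static input size
    # (it also fixes the dtype and format, as discussed later in these notes).
    example = torch.randn(1, 3, 224, 224).cuda()
    trt_mod = torch_tensorrt.compile(scripted_model,
                                     inputs=[example],
                                     enabled_precisions={torch.float})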

    Now that Torch-TensorRT separates components specific to particular IRs into their own namespaces, there are replacements for the old compile and convert_method_to_trt_engine functions at the top level. These functions take any PyTorch-generated format, including torch.nn.Modules, and decide the best way to compile it down to TensorRT. In v1.0.0 this means going through TorchScript and returning a torch.jit.ScriptModule. You can specify the IR to try using the ir arg for these functions.
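
    For example (a sketch; it is assumed here that ir="ts" selects the TorchScript path):

    # A raw torch.nn.Module can be passed directly; Torch-TensorRT scripts it first.
    model = torch.nn.Linear(16, 4).eval().cuda()
    trt_mod = torch_tensorrt.compile(model,
                                     inputs=[torch_tensorrt.Input((1, 16))],
                                     ir="ts")  # explicitly request the TorchScript IR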

    Due to partial compilation becoming stable in v1.0.0, there are now four new fields which replace the old torch_fallback struct.

    • old:
    compile_spec = {
      "torch_fallback": {
          "enabled": True, # Turn on or turn off falling back to PyTorch if operations are not supported in TensorRT
          "force_fallback_ops": [
              "aten::max_pool2d" # List of specific ops to require running in PyTorch
          ],
          "force_fallback_modules": [
              "mypymod.mytorchmod" # List of specific torch modules to require running in PyTorch
          ],
          "min_block_size": 3 # Minimum number of ops an engine must encapsulate to be run in TensorRT
      }
    }
    
    • new:
    torch_tensorrt.compile(...,
        require_full_compilation=False, 
        min_block_size=3, 
        torch_executed_ops=[ "aten::max_pool2d" ], 
        torch_executed_modules=["mypymod.mytorchmod"])
    

    C++

    The changes to the C++ API, other than the reorganization and renaming of the namespaces, mostly serve to make Torch-TensorRT consistent between Python and C++, namely by renaming trtorch::CompileGraph to torch_tensorrt::ts::compile and trtorch::ConvertGraphToTRTEngine to torch_tensorrt::ts::convert_method_to_trt_engine. Beyond that, similar to Python, the partial compilation struct TorchFallback has been removed and replaced by four fields in torch_tensorrt::ts::CompileSpec.

    • old:
      /**
       * @brief A struct to hold fallback info
       */
      struct TRTORCH_API TorchFallback {
        /// enable the automatic fallback feature
        bool enabled = false;
    
        /// minimum consecutive operation number that needs to be satisfied to convert to TensorRT
        uint64_t min_block_size = 1;
    
        /// A list of names of operations that will explicitly run in PyTorch
        std::vector<std::string> forced_fallback_ops;
    
        /// A list of names of modules that will explicitly run in PyTorch
        std::vector<std::string> forced_fallback_modules;
    
        /**
         * @brief Construct a default Torch Fallback object, fallback will be off
         */
        TorchFallback() = default;
    
        /**
         * @brief Construct from a bool
         */
        TorchFallback(bool enabled) : enabled(enabled) {}
    
        /**
         * @brief Constructor for setting min_block_size
         */
        TorchFallback(bool enabled, uint64_t min_size) : enabled(enabled), min_block_size(min_size) {}
      };
    
    • new:
      /**
       * Require the full module be compiled to TensorRT instead of potentially running unsupported operations in PyTorch
       */
      bool require_full_compilation = false;
    
      /**
       * Minimum number of contiguous supported operators to compile a subgraph to TensorRT
       */
      uint64_t min_block_size = 3;
    
      /**
       * List of aten operators that must be run in PyTorch. An error will be thrown if this list is not empty but
       * ``require_full_compilation`` is True
       */
      std::vector<std::string> torch_executed_ops;
    
      /**
       * List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but
       * ``require_full_compilation`` is True
       */
      std::vector<std::string> torch_executed_modules;
    

    CLI

    Similarly these partial compilation fields have been renamed in torchtrtc:

        --require-full-compilation        Require that the model should be fully
                                          compiled to TensorRT or throw an error
        --teo=[torch-executed-ops...],
        --torch-executed-ops=[torch-executed-ops...]
                                          (Repeatable) Operator in the graph that
                                          should always be run in PyTorch for
                                          execution (partial compilation must be
                                          enabled)
        --tem=[torch-executed-mods...],
        --torch-executed-mods=[torch-executed-mods...]
                                          (Repeatable) Module that should always
                                          be run in Pytorch for execution (partial
                                          compilation must be enabled)
        --mbs=consecutive_ops,
        --min-block-size=consecutive_ops
                                          Minimum number of contiguous TensorRT
                                          supported ops to compile a subgraph to
                                          TensorRT
    
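
    A hypothetical invocation combining these flags might look like the following (a sketch; the file names and input spec are placeholders):

    torchtrtc model.ts trt_model.ts "(1,3,224,224)" \
        --min-block-size=3 \
        --torch-executed-ops="aten::max_pool2d" \
        --torch-executed-mods="mypymod.mytorchmod"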

    Going forward, breaking changes to the API of the magnitude seen in this release will be accompanied by a major version bump.

    Stabilized Partial Compilation

    Partial compilation should be considered stable for static input shapes and is now enabled by default. In the case of dynamic shapes, set require_full_compilation to True.
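
    For example (a sketch; scripted_model is a placeholder, combining a dynamic shape range with full compilation):

    # With dynamic shapes, request full compilation per the guidance above.
    trt_mod = torch_tensorrt.compile(
        scripted_model,
        inputs=[torch_tensorrt.Input(min_shape=(1, 3, 224, 224),
                                     opt_shape=(8, 3, 224, 224),
                                     max_shape=(16, 3, 224, 224))],
        require_full_compilation=True)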

    Adjusted Defaults

    Input Types

    The default behavior of Torch-TensorRT has shifted slightly. The most important of these changes is the change to the inferred input type. In prior versions, the expected input type for a Tensor, barring it being set explicitly, was based on op_precision. With that field removed in this release and replaced by enabled_precisions (introduced in v0.4.0), this sort of behavior no longer makes sense. Torch-TensorRT now follows these rules to determine the input type for a Tensor:

    1. If no dtype is specified for an Input, Torch-TensorRT will determine the input type by inspecting the uses of this Input. It will trace the lifetime of this tensor to the first tensor operation using weights stored in the provided module. The type of those weights is the inferred type of the Input, using the rule that PyTorch requires like types for tensor operations. The goal with this behavior is to maintain the concept that Torch-TensorRT modules should feel no different than normal PyTorch modules. Therefore you can expect:

       | Weight Type of Model   | Expected Input Type For Tensor |
       |------------------------|--------------------------------|
       | FP32                   | FP32                           |
       | FP16                   | FP16                           |
       | Quantization Workflows | FP32                           |
       | Unknown / Ambiguous    | FP32 w/ Warning                |

    2. Users can override this behavior and set the Input type to whatever they wish using the dtype field of torch_tensorrt.Input (see the sketch after this list). Torch-TensorRT will always respect the user setting but may throw a warning stating that the model provided expects a different input type. This is mainly to notify you that just dropping the compiled module in place of the raw torch.nn.Module might throw errors, and casting before inference might be necessary.

      • With Torch-TensorRT v1.0.0 you can provide example torch Tensors to set the input shape. However, this sets not only the input shape but also the input dtype and tensor format. So if you provide a half-precision 1x3x32x32 contiguous tensor, the expected input would be Input(shape=(1, 3, 32, 32), dtype=dtype.half, format=TensorFormat.contiguous). This is subject to the behavior in rule 2.
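
    A short sketch of rule 2 (assuming half_model is a placeholder for a module whose weights are FP16):

    # Explicitly setting dtype overrides the inferred input type; Torch-TensorRT
    # may warn that the model appears to expect a different type.
    spec_input = torch_tensorrt.Input(shape=(1, 3, 32, 32),
                                      dtype=torch.float)  # force FP32 input
    trt_mod = torch_tensorrt.compile(half_model,
                                     inputs=[spec_input],
                                     enabled_precisions={torch.half})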

    Workspace Size

    By default the workspace size is now set to 1GB for all GPUs Pascal-based and newer (SM capability 6 or above). Maxwell and older cards, including Jetson Nano, have a workspace of 256MB by default. This value is user-settable.
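
    For example (a sketch using the Python API's workspace_size setting; scripted_model is a placeholder):

    # Override the default workspace budget; the value is in bytes (1 << 28 = 256MB).
    trt_mod = torch_tensorrt.compile(scripted_model,
                                     inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
                                     workspace_size=1 << 28)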

    Dependencies

    - Bazel 4.2.1
    - LibTorch 1.10.0
    - CUDA 11.3 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
    - cuDNN 8.2.4.15
    - TensorRT 8.0.3.4
    

    1.0.0 (2021-11-09)

    Bug Fixes

    • aten::gelu call was wrong in test (40bc4e3)
    • Fix a core partitioning algo bug where non-tensor input segments are not updated correctly (cc10876)
    • Fix modules_as_engines test case to use trt_mod instead of pyt_mod (282e98a)
    • Fix plugin registration macro (8afab22)
    • Fix python API tests for mobilenet v2 (e5a38ff)
    • Partial compilation translation to internal settings was incorrect (648bad3)
    • //py: Don't crash harshly on import when CUDA is not available (07e16fd)
    • Re-enable backtrace and make it less repetitive (1435845)
    • //core/lowering: Fixes module level fallback recursion (f94ae8f)
    • //core/partitioning: Fixing support for partially compiling (748ecf3)
    • //docker: Update docker container build script to use release path (9982855)
    • //py: Add new dirs to remove during clean (d2cc1e9)
    • //py: Fix some api import issues (840ca89)
    • //py: Fix trtorch.Device alternate constructor options (fa08311)
    • //py: Fix trtorch.Device alternate constructor options (ac26841)
    • Update notebooks with new library name Torch-TensorRT (8274fd9)
    • aten::conv1d: Update namespace, fix typo in dest IR for conv1d (d53f136)
    • eval: Rollback 1.11a0 change + namespace issues (ba743f5)
    • Use scripting instead of tracing for module fallback tests (32e8b53)
    • Workspace defaults for other apis and centralize cuda api use (930321e)

    Features

    • Add functionality for tests to use precompiled libraries (b5c324a)

    • Add QAT patch which modifies scale factor dtype to INT32 (4a10673)

    • Add TF32 override flag in bazelrc for CI-Testing (7a0c9a5)

    • Add VGG QAT sample notebook which demonstrates end-end workflow for QAT models (8bf6dd6)

    • Augment python package to include bin, lib, include directories (ddc0685)

    • handle scalar type of size [] in shape_analysis (fca53ce)

    • support aten::and.bool evaluator (6d73e43)

    • support aten::conv1d and aten::conv_transpose1d (c8dc6e9)

    • support aten::eq.str evaluator (5643972)

    • support setting input types of subgraph in fallback, handle Tensor type in evaluated_value_map branch in MarkOutputs (4778b2b)

    • support truncate_long_and_double in fallback subgraph input type (0bc3c05)

    • Update documentation with new library name Torch-TensorRT (e5f96d9)

    • Updating the pre_built to prebuilt (51412c7)

    • //:libtrtorch: Ship a WORKSPACE file and BUILD file with the (7ac6f1c)

    • //core/partitioning: Improved logging and code org for the (8927e77)

    • //cpp: Adding example tensors as a way to set input spec (70a7bb3)

    • //py: Add the git revision to non release builds (4a0a918)

    • //py: Allow example tensors from torch to set shape (01d525d)

    • feat!: Changing the default behavior for selecting the input type (a234335)

    • refactor!: Removing deprecated InputRange, op_precision and input_shapes (621bc67)

    • feat(//py)!: Porting forward the API to use kwargs (17e0e8a)

    • refactor(//py)!: Kwargs updates and support for shifting internal apis (2a0d1c8)

    • refactor!(//cpp): Inlining partial compilation settings since the (19ecc64)

    • refactor! : Update default workspace size based on platforms. (391a4c0)

    • feat!: Turning on partial compilation by default (52e2f05)

    • refactor!: API level rename (483ef59)

    • refactor!: Changing the C++ api to be snake case (f34e230)

    • refactor! : Update Pytorch version to 1.10 (cc7d0b7)

    • refactor!: Updating bazel version for py build container (06533fe)

    BREAKING CHANGES

    • This removes the InputRange class and the op_precision and input_shapes fields, which were deprecated in TRTorch v0.4.0.

    • This change updates the Bazel version used to build Torch-TensorRT to 4.2.1.

    This was done since the only version of Bazel available in our build container for the Python APIs is 4.2.1.

    • This changes the API for compile settings from a dictionary of settings to a set of kwargs for the various compilation functions. This will break existing code; however, there is simple guidance to port your code forward:

    Given a dict of valid TRTorch CompileSpec settings

    spec = {
    	"inputs": ...
    	...
    }
    

    You can use this same dict with the new APIs by changing your code from:

    trtorch.compile(mod, spec)
    

    to:

    trtorch.compile(mod, **spec)
    

    which will unpack the dictionary as arguments to the function

    • This commit changes the APIs from a dictionary of arguments to a set of kwargs. You can port forward using:
    trtorch.compile(mod, **spec)
    

    Also, in preparation for partial compilation being enabled by default, settings related to torch fallback have been moved to the top level.

    instead of

    "torch_fallback": {
      "enabled": True,
      "min_block_size": 3,
      "forced_fallback_ops": ["aten::add"],
      "forced_fallback_mods": ["MySubModule"]
    }
    

    now there are new settings

    require_full_compilation=False,
    min_block_size=3,
    torch_executed_ops=["aten::add"],
    torch_executed_modules=["MySubModule"]
    

    • This commit changes the API for automatic fallback to inline settings regarding partial compilation in preparation for it to be turned on by default

    Now in the compile spec instead of a torch_fallback field with its associated struct, there are four new fields in the compile spec

    bool require_full_compilation = true;
    uint64_t min_block_size = 3;
    std::vector<std::string> torch_executed_ops = {};
    std::vector<std::string> torch_executed_modules = {};
    

    • This commit sets the default workspace size to 1GB for GPU platforms and 256MB for Jetson Nano/TX1 platforms whose compute capability is < 6.

    • This commit turns on partial compilation by default. Unsupported modules will attempt to be run partially in PyTorch and partially in TensorRT

    • This commit renames the namespaces of all TRTorch/Torch-TensorRT APIs. Now torchscript specific functions are segregated into their own torch_tensorrt::torchscript / torch_tensorrt.ts namespaces. Generic utils will remain in the torch_tensorrt namespace. Guidance on how to port forward will follow in the next commits
    • This changes the C++ API ::ts APIs to be snake case and for CompileModules to become just compile

    • This commit updates the PyTorch version to 1.10. To use the Python API of torch_tensorrt, please upgrade your local PyTorch to 1.10 to avoid ABI incompatibility errors. WORKSPACE and requirements files are updated accordingly.

    • This commit changes the default behavior of the compiler: if the user does not specify an input data type explicitly, instead of using the enabled precisions, the compiler will inspect the provided model to infer a data type for the input that would not cause an error if the model were run in Torch. In practice this means:
    • If the weights are in FP32 for the first tensor calculation then default input type is FP32
    • If the weights are in FP16 for the first tensor calculation then default input type is FP16
    • etc.

    If the data type cannot be determined the compiler will default to FP32.

    This calculation is done per input tensor, so if one input is inferred to use FP32 and another INT32, the expected types will be (FP32, INT32) respectively.

    As before, if the user defines the data type explicitly or provides an example tensor, the data type specified there will be respected.

    Operators Supported

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gelu(Tensor self) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::and(int a, int b) -> (bool)
    • aten::and.bool(bool a, bool b) -> (bool)
    • aten::getitem.t(t list, int idx) -> (t(*))
    • aten::is(t1 self, t2 obj) -> bool
    • aten::isnot(t1 self, t2 obj) -> bool
    • aten::not(bool self) -> bool
    • aten::or(int a, int b) -> (bool)
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::xor(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::eq.str(str a, str b) -> (bool)
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    libtorchtrt-v1.0.0-cudnn8.2-tensorrt8.0-cuda11.3-libtorch1.10.0.tar.gz(1.73 MB)
    libtorchtrt-v1.0.0-pre-cxx11-abi-cudnn8.2-tensorrt8.0-cuda11.3-libtorch1.10.0.tar.gz(1.74 MB)
    torch_tensorrt-1.0.0-cp36-cp36m-linux_x86_64.whl(8.26 MB)
    torch_tensorrt-1.0.0-cp37-cp37m-linux_x86_64.whl(8.26 MB)
    torch_tensorrt-1.0.0-cp38-cp38-linux_x86_64.whl(8.30 MB)
    torch_tensorrt-1.0.0-cp39-cp39-linux_x86_64.whl(8.21 MB)
  • v0.4.1 (Oct 6, 2021)

    TRTorch v0.4.1

    Bug Fixes for Module Ignorelist for Partial Compilation, trtorch.Device, Version updates for PyTorch, TensorRT, cuDNN

    Target Platform Changes

    This is the first patch release of TRTorch v0.4. It now targets, by default, PyTorch 1.9.1, TensorRT 8.0.3.4, cuDNN 8.2.4.15 and CUDA 11.1. Older versions of PyTorch, TensorRT and cuDNN are still supported in the same manner as in TRTorch v0.4.0.

    Module Ignorelist for Partial Compilation

    There was an issue with the pass marking modules to be ignored during compilation, where it unsafely assumed that methods are named forward all the way down the module tree. While this was fine for PyTorch 1.8.0, with PyTorch 1.9.0 the TorchScript codegen changed slightly to sometimes use methods of other names for modules which reduce trivially to a functional API. This fix now identifies method calls as the recursion point and then uses those method calls to select modules to recurse on. It also checks to verify the existence of these modules and methods before recursing. Finally, this pass used to run by default even if the ignore list was empty, causing issues for users not using the feature. Therefore this pass is now disabled unless explicitly enabled.
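
    In v0.4.1 the pass therefore only runs when you opt in; a sketch using the v0.4.x dictionary-style compile spec (the module names are placeholders and the key spelling follows the struct shown in the v1.0.0 notes, assumed here):

    # Listing modules under torch_fallback explicitly enables the ignorelist pass.
    compile_spec = {
        "inputs": [trtorch.Input((1, 3, 224, 224))],
        "torch_fallback": {
            "enabled": True,
            "forced_fallback_modules": ["mypymod.mytorchmod"],
        },
    }
    trt_mod = trtorch.compile(scripted_model, compile_spec)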

    trtorch.Device

    Some of the constructors for trtorch.Device would not work or would incorrectly configure the device. This patch fixes those issues.
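
    For reference, the alternate constructor forms affected look like the following (a sketch; the option names are as introduced in v0.4.0 and assumed here):

    # Construct a device spec from keyword arguments rather than a spec string.
    dev_gpu = trtorch.Device(gpu_id=0)
    dev_dla = trtorch.Device(dla_core=0, allow_gpu_fallback=True)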

    Dependencies

    - Bazel 4.0.0
    - LibTorch 1.9.1
    - CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
    - cuDNN 8.2.3.4
    - TensorRT 8.0.3.4
    

    0.4.1 (2021-10-06)

    Bug Fixes

    • //core/lowering: Fixes module level fallback recursion (2fc612d)
    • Move some lowering passes to graph level logging (0266f41)
    • //py: Fix trtorch.Device alternate constructor options (ac26841)

    Operators Supported

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gelu(Tensor self) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::and(int a, int b) -> (bool)
    • aten::getitem.t(t list, int idx) -> (t(*))
    • aten::is(t1 self, t2 obj) -> bool
    • aten::isnot(t1 self, t2 obj) -> bool
    • aten::not(bool self) -> bool
    • aten::or(int a, int b) -> (bool)
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::xor(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    Release assets:

    • libtrtorch-v0.4.1-cudnn8.2-tensorrt8.0-cuda11.1-libtorch1.9.1.tar.gz (1.68 MB)
    • libtrtorch-v0.4.1-pre-cxx11-abi-cudnn8.2-tensorrt8.0-cuda11.1-libtorch1.9.1.tar.gz (1.69 MB)
    • trtorch-0.4.1-cp36-cp36m-linux_x86_64.whl (7.21 MB)
    • trtorch-0.4.1-cp37-cp37m-linux_x86_64.whl (7.21 MB)
    • trtorch-0.4.1-cp38-cp38-linux_x86_64.whl (7.25 MB)
    • trtorch-0.4.1-cp39-cp39-linux_x86_64.whl (7.17 MB)
  • v0.4.0(Aug 24, 2021)

    TRTorch v0.4.0

    Support for PyTorch 1.9, TensorRT 8.0. Introducing INT8 Execution for QAT models, Module Based Partial Compilation, Auto Device Configuration, Input Class, Usability Improvements, New Converters, Bug Fixes

    Target Platform Changes

    This is the fourth beta release of TRTorch, targeting PyTorch 1.9, CUDA 11.1 (on x86_64, CUDA 10.2 on aarch64), cuDNN 8.2 and TensorRT 8.0, with backwards-compatible source for TensorRT 7.1. On aarch64, TRTorch targets JetPack 4.6 primarily, with backwards-compatible source for JetPack 4.5. When building on Jetson, the flag --platforms //toolchains:jetpack_4.x must now be provided for C++ compilation to select the correct dependency paths. For Python, the JetPack version is assumed to be 4.6 by default; to override this, add the --jetpack-version 4.5 flag when building.

    TensorRT 8.0

    This release adds support for compiling models trained with quantization-aware training (QAT), allowing users of the TensorRT PyTorch Quantization Toolkit (https://github.com/NVIDIA/TensorRT/tree/master/tools/pytorch-quantization) to compile their models with TRTorch. For more information and a tutorial, refer to https://www.github.com/NVIDIA/TRTorch/tree/v0.4.0/examples/int8/qat. It also adds support for sparsity via the sparse_weights flag in the compile spec, which allows TensorRT to use the specialized hardware in Ampere GPUs to skip unnecessary computation and thereby increase computational efficiency.
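    As a rough sketch of how these two features combine (a sketch only: qat_script_module is a placeholder for a module trained with the quantization toolkit and converted to TorchScript):

    import torch
    import trtorch

    # Minimal sketch: compile a QAT-trained TorchScript module with INT8 kernels
    # and sparse weights enabled (field names as introduced in this release).
    trt_mod = trtorch.compile(qat_script_module, {
        "inputs": [trtorch.Input((1, 3, 224, 224))],
        # QAT graphs carry fake-quantize nodes, so INT8 can be enabled
        # without providing a calibrator
        "enabled_precisions": {torch.float32, torch.float16, torch.int8},
        "sparse_weights": True,  # let TensorRT use Ampere sparsity hardware where weights qualify
    })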

    Partial Compilation

    In v0.4.0 the partial compilation feature of TRTorch can now be considered beta level stability. New in this release is the ability to specify entire PyTorch modules to run in PyTorch explicitly as part of partial compilation. This should let users isolate troublesome code easily when compiling. Again, feedback on this feature is greatly appreciated.
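    For illustration, a compile spec using module-level fallback might look like the following minimal sketch. The torch_fallback field names here are assumed from this release's fallback feature, and MyUnsupportedBlock is a placeholder module name:

    trt_mod = trtorch.compile(script_module, {
        "inputs": [trtorch.Input((1, 3, 224, 224))],
        "torch_fallback": {
            "enabled": True,
            # Run these entire submodules in PyTorch rather than TensorRT
            "forced_fallback_modules": ["MyUnsupportedBlock"],
            # Individual operators can also be pinned to PyTorch
            "forced_fallback_ops": ["aten::max_pool2d"],
        },
    })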

    Automatic Device Configuration at Runtime

    v0.4.0 also changes the "ABI" of TRTorch to include information about the target device for the program. Programs compiled with v0.4.0 will look for and select the most compatible available device. The rules used are: any valid device option must have the same SM capability as the device that built the engine; from there, TRTorch prefers the same device (e.g. built on an A100, so an A100 is preferred over an A30) and finally prefers the same device ID. Users will be warned during execution if the selected device is not the currently active device, as overhead may be incurred in transferring input tensors from the current device to the target device; users can then modify their code to avoid this. Due to this ABI change, existing compiled TRTorch programs are incompatible with the TRTorch v0.4.0 runtime. From v0.4.0 onwards, an internal ABI version will check program compatibility. This ABI version is only incremented with breaking changes to the ABI.

    API Changes (Input, enabled_precisions, Device)

    TRTorch v0.4.0 changes the API for specifying input shapes and data types to provide users more control over configuration. The new API makes use of the class trtorch.Input, which lets users set the shape (or shape range) as well as the memory layout and expected data type. These input specs are set in the inputs field of the CompileSpec.

    "inputs": [
            trtorch.Input((1, 3, 224, 224)), # Static input shape for input #1
            trtorch.Input(
                min_shape=(1, 224, 224, 3),
                opt_shape=(1, 512, 512, 3),
                max_shape=(1, 1024, 1024, 3),
                dtype=torch.int32,
                format=torch.channel_last,
            ) # Dynamic input shape for input #2, input type int and channel last format
        ],
    

    The legacy input_shapes field and its associated usage with lists of tuples/InputRanges should now be considered deprecated. They remain usable in v0.4.0 but will be removed in the next release. Similarly, the compile spec field op_precision is deprecated in favor of enabled_precisions, a set containing the data types that kernels are allowed to use. Whereas setting op_precision = torch.int8 would implicitly enable FP32 and FP16 kernels as well, enabled_precisions must now be set to {torch.float32, torch.float16, torch.int8} to get the same behavior. To maintain behavior similar to normal PyTorch, if FP16 is the lowest precision enabled but no explicit data type is set for the model's inputs, the inputs are expected to be in FP16; for other cases (FP32, INT8) the default is FP32, as in PyTorch and previous versions of TRTorch.

    Finally, the Python API gains a class trtorch.Device. While users can continue to use torch.Device or other torch APIs, trtorch.Device allows better control for the specific use cases of compiling with TRTorch (e.g. setting the DLA core and GPU fallback). This class is very similar to the C++ version, with a couple of syntactic-sugar additions that make it easier and more familiar to use:

    trtorch.Device("dla:0", allow_gpu_fallback=False) #Set device as DLA Core 0 (implicitly sets the GPU managing DLA cores as the GPU and sets fallback to false)
    

    trtorch.Device can be used instead of a dictionary in the compile spec if desired.
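    Put together, a v0.4.0-style compile spec using the new Input, enabled_precisions and Device classes might look like the following. This is a minimal sketch under the API described above: script_module is a placeholder, and the "gpu:0" device-string form is assumed by analogy with the "dla:0" example.

    import torch
    import trtorch

    trt_mod = trtorch.compile(script_module, {
        "inputs": [
            # Static shape, FP16 input (replaces the deprecated input_shapes field)
            trtorch.Input((1, 3, 224, 224), dtype=torch.half),
        ],
        # Replaces the deprecated op_precision field; FP32 kernels stay available
        "enabled_precisions": {torch.float32, torch.float16},
        # trtorch.Device used in place of a device dictionary
        "device": trtorch.Device("gpu:0"),
    })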

    trtorchc has been updated to reflect these API changes. Users can set the shape, dtype and format of inputs from the command line using the format "[(MIN_N,..,MIN_C,MIN_H,MIN_W);(OPT_N,..,OPT_C,OPT_H,OPT_W);(MAX_N,..,MAX_C,MAX_H,MAX_W)]@DTYPE%FORMAT", e.g. (3,3,32,32)@f16%NHWC. -p is now a repeatable flag so that multiple precisions can be enabled. Also added are the repeatable flags --ffm and --ffo, which mark specific modules and operators respectively for running in PyTorch; to use these two options, --allow-torch-fallback must be set. Options for embedding serialized engines (--embed-engine) and for sparsity (--sparse-weights) have been added as well.

    Usability

    Finally, TRTorch v0.4.0 now includes the ability to provide backtraces for locations in your model which TRTorch does not support. This can help identify locations in the model that might need to change for TRTorch support, or modules which should run fully in PyTorch via partial compilation.

    Dependencies

    - Bazel 4.0.0
    - LibTorch 1.9.0
    - CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
    - cuDNN 8.2.2.3
    - TensorRT 8.0.1.6
    

    0.4.0 (2021-08-24)

    • feat(serde)!: Refactor CudaDevice struct, implement ABI versioning, (9327cce)
    • feat(//py)!: Implementing top level python api changes to reflect new (482265f)
    • feat(//cpp)!: Changes to TRTorch C++ api reflecting Input and (08b4942)
    • feat!: Pytorch 1.9 version bump (a12d249)
    • feat(//core/runtime)!: Better and more portable names for engines (6eb3bb2)

    Bug Fixes

    • //core/conversion/conversionctx: Guard final engine building (dfa9ae8)
    • //core/lowering: use lower_info as parameter (370aeb9)
    • //cpp/ptq: fixing bad accuracy in just the example code (7efa11d)
    • //py: Fix python setup.py with new libtrtorch.so location (68ba63c)
    • //tests: fix optional jetson tests (4c32a83)
    • //tests: use right type for masked_fill test (4a5c28f)
    • aten::cat: support neg dim for cat (d8ca182)
    • aten::select and aten::var: Fix converters to handle negative axes (3a734a2)
    • aten::slice: Allow slicing of pytorch tensors (50f012e)
    • aten::tensor: Last dim doesnt always get written right (b68d4aa)
    • aten::tensor: Last dim doesnt always get written right (38744bc)
    • Address review comments, fix failing tests due to bool mishandling (13eef91)
    • Final working version of QAT in TRTorch (521a0cb)
    • fix aten::sub.scalar operator (9a09514)
    • Fix linear lowering pass, lift layer_norm scale layer restriction and matmul layer nbdims restriction (930d582)
    • Fix testcases using old InputRange API (ff87956)
    • Fix TRT8 engine capability flags (2b69742)
    • Fix warnings thrown by noexcept functions (c5f7eea)
    • Fix warnings thrown by noexcept functions (ddc8950)
    • Minor fixes to qat scripts (b244423)
    • Restrict TRTorch to compile only forward methods (9f006d5)
    • Transfer calibration data to gpu when it is not a batch (23739cb)
    • typo in aten::batch_norm (d47f48f)
    • qat: Rescale input data for C++ application (9dc6061)
    • Use len() to get size of dataset (ccc60d5)
    • device_conf: Devices never actually got swithed in multi device (f1d0a43)
    • exception_elimination: Exception branches are no longer consistent (d61b667)
    • to_backend: Clean up to_backend implementation (4e15605)
    • trtorchc: Allow for workspaces larger than 2G and better debugging (e1e7812)
    • Using TensorRT 8 new API calls (14691e7)
    • Using TensorRT 8 new API calls (fa969a5)

    Features

    • //core/conversion: Adding error prefix to python source traceback (4bf2a41)
    • //core/conversion: Handle adding and wrapping ITensors as (a22e99b)
    • //core/ir: Implementing new internal input spec type (316df28)
    • //core/lowering: Adding two passes, one to delimit and one to mark (2e04ce5)
    • //core/lowering: additional logging in module fallback (ad07645)
    • //core/plugins: Add adaptive_max_pool2d plugin, enable the plugins to run on GPU (6f4aa40)
    • //cpp/int8/qat: QAT application release (d8f5d29)
    • //examples/int8: Implement Makefile based execution for ptq and qat (b7f6d8a)
    • //examples/int8/qat: Install pytorch-quantization with (1ca1484)
    • //py: add user level device class in py for embed engine (d99169f)
    • aten::masked_fill: In progress implementation of masked_fill (fa7d6d9)
    • aten::ones: Adding support for aten::ones (2b45a3d)
    • aten::slice: Patching slice for new optional params (a11287f)
    • aten::sqrt: Adding support for sqrt evaluators (6aaba3b)
    • aten::std|aten::masked_fill: Implement masked_fill, aten::std (a086a5b)
    • aten::std|aten::masked_fill: Implement masked_fill, aten::std (2866627)
    • jetson: Support for Jetpack 4.6 (9760fe3)
    • to_backend: Updating backend integration preproc function (080b594)
    • Enable sparsity support in TRTorch (f9e1f2b)
    • trtorchc: Adding flag for sparse weights (bfdc6f5)
    • Add aten::full converter, quantization ops testcases (9f2ffd0)
    • Add aten::type_as lowering pass (b57a6dd)
    • Add functionality for QAT workflow (fc8eafb)
    • Add functionality for QAT workflow (f776e76)
    • Add support for providing input datatypes in TRTorch (a3f4a3c)
    • Adding automatic casting to compare layers (90af26e)
    • Enable sparsity support in TRTorch (decd0ed)
    • Enable TRT 8.0 QAT functionality in TRTorch (c76a28a)
    • Makefile for trtorchrt.so example (c60c521)
    • show pytorch code of unsupported operators (2ee2a84)
    • support aten::Int (5bc977d)
    • trtorchc: Adding more dtype aliases (652fb13)
    • trtorchc: Adding new support for dtypes and formats in (c39bf81)
    • Support fallback options in trtorchc (ad966b7)
    • Using shared_ptrs to manage TRT resources in runtime (e336630)
    • trtorchc: Embedding engines in modules from the CLI (2b4b9e3)

    BREAKING CHANGES

    • This commit cleans up the WIP CudaDevice class, simplifying the implementation and formalizing the serialized format for CUDA devices.

    It also implements ABI Versioning. The first entry in the serialized format of a TRTEngine now records the ABI that the engine was compiled with, defining expected compatibility with the TRTorch runtime. If the ABI version does not match, the runtime will error out asking to recompile the program.

    ABI version is a monotonically increasing integer and should be incremented every time the serialization format changes in some way.

    This commit cleans up the CudaDevice class, implementing a number of constructors to replace the various utility functions that populate the struct. Descriptive utility functions remain but solely call the relevant constructor.

    • This commit introduces the next iteration of the Python TRTorch API. Starting in TRTorch v0.5.0, support for the "input_shapes" and "op_precision" compile spec keys will be removed. Users should port forward to using the "inputs" key, which expects a list of trtorch.Input objects, and the "enabled_precisions" key, which expects a set of data type enums.

    • This change deprecates InputRange and the CompileSpec fields "input_shapes" and "op_precision", along with their associated constructors and functions. These are replaced with Input, "inputs" and "enabled_precisions" respectively. Deprecated components will be removed in TRTorch v0.5.0.

    • Updating PyTorch version to 1.9.0 which includes breaking changes to the to_backend api

    • This bumps the TRTorch ABI version to 3 due to a new field for the engine name included in the serialized form of TRTEngine. This lets deserialized engines keep the same name they were serialized with.

    Supported Operators in TRTorch v0.4.0

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
    • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
    • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
    • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gelu(Tensor self) -> (Tensor)
    • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
    • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
    • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::t(Tensor self) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
    • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::Int.Scalar(Scalar a) -> int
    • aten::Int.bool(bool a) -> int
    • aten::Int.float(float a) -> int
    • aten::Int.int(int a) -> int
    • aten::and(int a, int b) -> (bool)
    • aten::getitem.t(t list, int idx) -> (t(*))
    • aten::is(t1 self, t2 obj) -> bool
    • aten::isnot(t1 self, t2 obj) -> bool
    • aten::not(bool self) -> bool
    • aten::or(int a, int b) -> (bool)
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::xor(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::arange(Scalar end, *, int? dtype=None, int? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
    • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
    • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::floor.float(float a) -> (int)
    • aten::floor.int(int a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::is_floating_point(Tensor self) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sqrt.float(float a) -> (float)
    • aten::sqrt.int(int a) -> (float)
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
    • prim::dtype(Tensor a) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    Release assets:

    • libtrtorch-v0.4.0-cudnn8.2-tensorrt8.0-cuda11.1-libtorch1.9.0.tar.gz (1.68 MB)
    • libtrtorch-v0.4.0-pre-cxx11-abi-cudnn8.2-tensorrt8.0-cuda11.1-libtorch1.9.0.tar.gz (1.68 MB)
    • trtorch-0.4.0-cp36-cp36m-linux_x86_64.whl (7.21 MB)
    • trtorch-0.4.0-cp37-cp37m-linux_x86_64.whl (7.21 MB)
    • trtorch-0.4.0-cp38-cp38-linux_x86_64.whl (7.25 MB)
    • trtorch-0.4.0-cp39-cp39-linux_x86_64.whl (7.17 MB)
  • v0.3.0(May 14, 2021)

    TRTorch v0.3.0

    Support for PyTorch 1.8.x (by default 1.8.1), Introducing Plugin Library, PTQ from Python, Arbitrary TRT engine embedding, Preview Release of Partial Compilation, New Converters, Bug Fixes

    This is the third beta release of TRTorch, targeting PyTorch 1.8.x, CUDA 11.1 (on x86_64), TensorRT 7.2 and cuDNN 8. TRTorch 0.3.0 binary releases target PyTorch 1.8.1 specifically; these builds are not compatible with 1.8.0, though the source code remains compatible with any PyTorch 1.8.x version. On aarch64, TRTorch targets JetPack 4.5.x.

    This release introduces libtrtorch_plugins.so, a portable distribution of all TensorRT plugins used in TRTorch. The intended use case is to support TRTorch programs that utilize TensorRT plugins and are deployed on systems with only the runtime library available, or cases where TRTorch was used to create a TensorRT engine that makes use of TRTorch plugins and is run outside the TRTorch runtime. An example of how to use this library can be found here: https://www.github.com/NVIDIA/TRTorch/tree/v0.3.0/examples/sample_rt_app.

    TRTorch 0.3.0 also now allows users to repurpose PyTorch DataLoaders to do post-training quantization in Python, similar to the workflow currently supported in C++. It also introduces a new API to wrap arbitrary TensorRT engines in a PyTorch Module wrapper, making them serializable by torch.jit.save and completely compatible with other PyTorch modules.

    Finally, TRTorch 0.3.0 includes a preview of the new partial compilation capability of the TRTorch compiler. With this feature, users can now instruct TRTorch to keep operations that are not supported by TRTorch/TensorRT in PyTorch. Partial compilation should be considered alpha stability, and we are seeking feedback on bugs, pain points and feature requests surrounding this feature.
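    As a rough sketch of the two new Python workflows (a sketch only: DataLoaderCalibrator, CalibrationAlgo and embed_engine_in_new_module are the assumed names of this release's additions; calibration_dataloader, script_module and serialized_trt_engine are placeholders):

    import torch
    import trtorch

    # 1) PTQ from Python: repurpose a torch DataLoader as an INT8 calibrator
    calibrator = trtorch.ptq.DataLoaderCalibrator(
        calibration_dataloader,  # a torch.utils.data.DataLoader over calibration data
        cache_file="./calibration.cache",
        use_cache=False,
        algo_type=trtorch.ptq.CalibrationAlgo.ENTROPY_CALIBRATION_2,
        device=torch.device("cuda:0"),
    )
    trt_mod = trtorch.compile(script_module, {
        "input_shapes": [(1, 3, 32, 32)],  # v0.3.0 still uses the input_shapes field
        "op_precision": torch.int8,
        "calibrator": calibrator,
    })

    # 2) Wrap an arbitrary serialized TensorRT engine in a PyTorch Module
    #    so it can be saved with torch.jit.save like any other module
    wrapped = trtorch.embed_engine_in_new_module(serialized_trt_engine)
    torch.jit.save(wrapped, "engine_module.ts")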

    Dependencies:

    - Bazel 4.0.0
    - LibTorch 1.8.1 (on x86_64), 1.8.0 (on aarch64)
    - CUDA 11.1 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build), 10.2 (on aarch64)
    - cuDNN 8.1.1
    - TensorRT 7.2.3.4
    

    0.3.0 (2021-05-13)

    Bug Fixes

    • //plugins: Readding cuBLAS BUILD to allow linking of libnvinfer_plugin on Jetson (a8008f4)

    • //tests/../concat: Concat test fix (2432fb8)

    • //tests/core/partitioning: Fixing some issues with the partition (ff89059)

    • erase the repetitive nodes in dependency analysis (80b1038)

    • fix a typo for debug (c823ebd)

    • fix typo bug (e491bb5)

    • aten::linear: Fixes new issues in 1.8 that cause script based (c5057f8)

    • register the torch_fallback attribute in Python API (8b7919f)

    • support expand/repeat with IValue type input (a4882c6)

    • support shape inference for add_, support non-tensor arguments for segmented graphs (46950bb)

    • feat!: Updating versions of CUDA, cuDNN, TensorRT and PyTorch (71c4dcb)

    • feat(WORKSPACE)!: Updating PyTorch version to 1.8.1 (c9aa99a)

    Features

    • //.github: Linter throws 1 when there needs to be style changes to (a39dea7)
    • //core: New API to register arbitrary TRT engines in TorchScript (3ec836e)
    • //core/conversion/conversionctx: Adding logging for truncated (96245ee)
    • //core/partitioing: Adding ostream for Partition Info (b3589c5)
    • //core/partitioning: Add an ostream implementation for (ee536b6)
    • //core/partitioning: Refactor top level partitioning API, fix a bug with (abc63f6)
    • //core/plugins: Gating plugin logging based on global config (1d5a088)
    • added user level API for fallback (f4c29b4)
    • allow users to set fallback block size and ops (6d3064a)
    • insert nodes by dependencies for nonTensor inputs/outputs (4e32eff)
    • support aten::arange converter (014e381)
    • support aten::transpose with negative dim (4a1d2f3)
    • support Int/Bool and other constants' inputs/outputs for TensorRT segments (54e407e)
    • support prim::Param for fallback inputs (ec2bbf2)
    • support prim::Param for input type after refactor (3cebe97)
    • support Python APIs for Automatic Fallback (100b090)
    • support the case when the injected node is not supported in dependency analysis (c67d8f6)
    • support truncate long/double to int/float with option (740eb54)
    • Try to submit review before exit (9a9d7f0)
    • update truncate long/double python api (69e49e8)
    • //docker: Adding Docker 21.03 (9b326e8)
    • update truncate long/double warning message (60dba12)
    • //docker: Update CI container (df63467)
    • //py: Allowing people using the PyTorch backend to use TRTorch/TRT (6c3e0ad)
    • //py: Catch when bazel is not in path and error out when running (1da999d)
    • //py: Gate partial compilation from to_backend API (bf1b2d8)
    • //py: New API to embed engine in new module (88d07a9)
    • aten::floor: Adds floor.int evaluator (a6a46e5)

    BREAKING CHANGES

    • PyTorch version has been bumped to 1.8.0. Default CUDA version is CUDA 11.1. TensorRT version is TensorRT 7.2.3.4. cuDNN version is now cuDNN 8.1.

    • Due to issues with compatibility between PyTorch 1.8.0 and 1.8.1 in the Torch Python API, TRTorch 0.3.0 compiled for 1.8.0 does not work with PyTorch 1.8.1 and will show an error about use_input_stats. If you see this error, make sure the version of libtorch you are compiling with is PyTorch 1.8.1.

    TRTorch 0.3.0 will target PyTorch 1.8.1. There is no backwards compatibility with 1.8.0. If you need this specific version, compile from source with the dependencies in WORKSPACE changed.

    Supported Operators in TRTorch v0.3.0

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int start=0, int end=9223372036854775807, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::__and__(int a, int b) -> (bool)
    • aten::__getitem__.t(t[](a) list, int idx) -> (t(*))
    • aten::__is__(t1 self, t2 obj) -> bool
    • aten::__isnot__(t1 self, t2 obj) -> bool
    • aten::__not__(bool self) -> bool
    • aten::__or__(int a, int b) -> (bool)
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::__xor__(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::floor.float(float a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    Source code (tar.gz)
    Source code (zip)
    libtrtorch-v0.3.0-cudnn8.1-tensorrt7.2-cuda11.1-libtorch-1.8.1.tar.gz (1.44 MB)
    libtrtorch-v0.3.0-pre-cxx11-abi-cudnn8.1-tensorrt7.2-cuda11.1-libtorch-1.8.1.tar.gz (1.44 MB)
    trtorch-0.3.0-cp36-cp36m-linux_x86_64.whl (6.42 MB)
    trtorch-0.3.0-cp37-cp37m-linux_x86_64.whl (6.42 MB)
    trtorch-0.3.0-cp38-cp38-linux_x86_64.whl (6.45 MB)
    trtorch-0.3.0-cp39-cp39-linux_x86_64.whl (6.37 MB)
  • v0.2.0(Feb 26, 2021)

    TRTorch v0.2.0

    Support for PyTorch 1.7.x, Multi Device APIs, Runtime Library, New Converters, Bug Fixes

    This is the second beta release of TRTorch, targeting PyTorch 1.7.x, CUDA 11.0 (on x86_64), TensorRT 7.2 and cuDNN 8. TRTorch 0.2.0 for aarch64 targets JetPack 4.5.x. It updates the to_backend integration for PyTorch to reflect changes in the PyTorch API. A new API has been added to disable TF32, the data format newly introduced on Ampere, since TF32 is now the default FP32 compute format used in TRTorch. APIs have been solidified for runtime configuration of the active CUDA device, letting users choose which device a program is deserialized on. This API will continue to change as we further define the serialization format and work with the PyTorch team to make runtime device configuration more ergonomic; you can follow this work here: https://github.com/NVIDIA/TRTorch/discussions/311. This release also formalizes DLA support in TRTorch, adding APIs and capabilities to target DLA on Jetson and DRIVE platforms. v0.2.0 also includes a new shared library, libtrtorchrt.so, which contains only the runtime components of TRTorch and is suitable for use in situations where device footprint is extremely limited. libtrtorchrt.so can be linked into C++ applications or loaded into Python scripts, and will load all necessary TRTorch runtime components into the torch runtime, allowing users to run TRTorch applications without the full compiler. v0.2.0 also adds support for Python 3.9.
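
    As a hedged sketch of what runtime-only deployment could look like (assuming a module already compiled and saved with TRTorch, an application linked against libtrtorchrt.so rather than the full libtrtorch.so, and an illustrative file name):

    #include "torch/script.h"

    int main() {
      // Loading a TRTorch-compiled module registers its embedded TensorRT
      // engine with the torch runtime; no compiler headers are required here,
      // only a link against the runtime-only libtrtorchrt.so.
      auto trt_mod = torch::jit::load("trt_torchscript_module.ts");
      auto in = torch::randn({1, 3, 224, 224}, torch::kCUDA);
      auto out = trt_mod.forward({in}).toTensor();
      return 0;
    }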

    Dependencies:

    • Bazel 4.0.0
    • Libtorch 1.7.1 (on x86_64), 1.7.0 (on aarch64)
    • CUDA 11.0 (by default, newer CUDA 11 supported with compatible PyTorch build)
    • cuDNN 8.0.5
    • TensorRT 7.2.2
    

    v0.2.0 (2021-02-25)

    • refactor!: Update bazel and trt versions (0618b6b)

    Bug Fixes

    • //core/conversion/conversionctx: Fix memory leak in conversion (6f83b41)
    • //core/lowering: fix debug message for bn dim check removal pass (86bb5b7)
    • //py: Fix bounds for enum macros (6b942e5)
    • aten::expand: Fix compiler warning for unused out ITensor (5b0f584)
    • aten::expand: Fix compiler warnings in the expand converter (51b09d4)
    • aten::flatten: Fixing flatten converter to handle dynamic batch (00f2d78)
    • aten::max_pool2d: Suppressing error due to not filling in stride in (ed3c185)
    • aten::zeros: verify zeros produces a tensor correctly (00d2d0c)
    • remove_to: bug in remove_to.cpp, replace outputs()[0] with inputs()[0] (6c5118a)
    • setup.py: Broaden the supported pytorch versions to handle jetson (e94a040)
    • test_op_aliasing: Fix the renamed op (91c3c80)
    • tests: Fix broken elementwise tests (22ed944)

    Features

    • support true_divide, floor_divide, max, min, rsub (a35fbf1)
    • //.github: Moving to python directly (ece114c)
    • //core/conversion: Adding a check to detect programs that will (a3d4144)
    • //core/lowering: Adding a new pass to handle new dim checks for (3d14cda)
    • //cpp/api/lib: New runtime only library (6644a9e)
    • //notebooks: Update notebooks container for 0.1.0 (a5851ff)
    • //py: [to_backend] adding device specification support for (6eeba1c), closes #286
    • aten::leaky_relu_: Adding alias for inplace leaky relu (bc53411)
    • aten::softmax: Adding support for any neg index (abc29a2)
    • aten::squeeze|aten::unsqueeze: adding BUILD files for new squeeze (9e0a1d7)
    • aten::sum: Allow for negative indices less than -1 (769bbc9)
    • aten::topk: Add a debug message noting that sorted is always true (81f1e9d)
    • aten::topk: Adding BUILD files for topk op (22e6a6b)
    • disable_tf32: Add a new API to disable TF32 (536983b)
    • interpolate: Adding support for .vec variants and overhauling test (0cda1cc)
    • interpolate: Addressing the linear, scale factor, align corners edge case (92e3818)
    • supportedops: Application to dump a list of supported operators (872d9a3)

    BREAKING CHANGES

    • The version of Bazel has been bumped to 4.0.0 and the version of TensorRT has been bumped to 7.2.2.3.


    • The device API has changed: device settings are now configured via a device struct which encapsulates the selected device id and type, as sketched below.
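
    A minimal sketch of these settings through the C++ API, assuming the 0.2.x CompileSpec layout (the disable_tf32 field and the device struct member names below follow this release's notes but should be treated as illustrative):

    #include "trtorch/trtorch.h"

    ...
    auto spec = trtorch::CompileSpec({{1, 3, 224, 224}});
    // New in 0.2.0: opt out of TF32 on Ampere (TF32 is now the default FP32 mode)
    spec.disable_tf32 = true;
    // New in 0.2.0: device settings are grouped in a device struct
    spec.device.device_type = trtorch::CompileSpec::Device::DeviceType::kGPU;
    spec.device.gpu_id = 0;                 // device the program is deserialized on
    spec.device.allow_gpu_fallback = false; // relevant when targeting DLA
    auto trt_mod = trtorch::CompileGraph(mod, spec);
    ...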

    Supported Operators in TRTorch v0.2.0

    Operators Currently Supported Through Converters

    • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
    • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
    • aten::abs(Tensor self) -> (Tensor)
    • aten::acos(Tensor self) -> (Tensor)
    • aten::acosh(Tensor self) -> (Tensor)
    • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
    • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::asin(Tensor self) -> (Tensor)
    • aten::asinh(Tensor self) -> (Tensor)
    • aten::atan(Tensor self) -> (Tensor)
    • aten::atanh(Tensor self) -> (Tensor)
    • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
    • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
    • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
    • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::ceil(Tensor self) -> (Tensor)
    • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
    • aten::cos(Tensor self) -> (Tensor)
    • aten::cosh(Tensor self) -> (Tensor)
    • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
    • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
    • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
    • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::erf(Tensor self) -> (Tensor)
    • aten::exp(Tensor self) -> (Tensor)
    • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
    • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
    • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
    • aten::floor(Tensor self) -> (Tensor)
    • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
    • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
    • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
    • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
    • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
    • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
    • aten::log(Tensor self) -> (Tensor)
    • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
    • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::matmul(Tensor self, Tensor other) -> (Tensor)
    • aten::max(Tensor self) -> (Tensor)
    • aten::max.other(Tensor self, Tensor other) -> (Tensor)
    • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
    • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
    • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::min(Tensor self) -> (Tensor)
    • aten::min.other(Tensor self, Tensor other) -> (Tensor)
    • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
    • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
    • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
    • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
    • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
    • aten::neg(Tensor self) -> (Tensor)
    • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
    • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
    • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
    • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
    • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::reciprocal(Tensor self) -> (Tensor)
    • aten::relu(Tensor input) -> (Tensor)
    • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
    • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
    • aten::reshape(Tensor self, int[] shape) -> (Tensor)
    • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
    • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
    • aten::sigmoid(Tensor input) -> (Tensor)
    • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
    • aten::sin(Tensor self) -> (Tensor)
    • aten::sinh(Tensor self) -> (Tensor)
    • aten::slice.Tensor(Tensor(a) self, int dim=0, int start=0, int end=9223372036854775807, int step=1) -> (Tensor(a))
    • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
    • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
    • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
    • aten::sqrt(Tensor self) -> (Tensor)
    • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
    • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
    • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
    • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
    • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
    • aten::tan(Tensor self) -> (Tensor)
    • aten::tanh(Tensor input) -> (Tensor)
    • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
    • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
    • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
    • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
    • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
    • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
    • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
    • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
    • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
    • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
    • trt::const(Tensor self) -> (Tensor)

    Operators Currently Supported Through Evaluators

    • aten::Bool.float(float b) -> (bool)
    • aten::Bool.int(int a) -> (bool)
    • aten::Float.Scalar(Scalar a) -> float
    • aten::Float.bool(bool a) -> float
    • aten::Float.int(int a) -> float
    • aten::__and__(int a, int b) -> (bool)
    • aten::__getitem__.t(t[](a) list, int idx) -> (t(*))
    • aten::__is__(t1 self, t2 obj) -> bool
    • aten::__isnot__(t1 self, t2 obj) -> bool
    • aten::__not__(bool self) -> bool
    • aten::__or__(int a, int b) -> (bool)
    • aten::__round_to_zero_floordiv(int a, int b) -> (int)
    • aten::__xor__(int a, int b) -> (bool)
    • aten::add.float(float a, float b) -> (float)
    • aten::add.int(int a, int b) -> (int)
    • aten::add_.t(t self, t[] b) -> (t[])
    • aten::append.t(t self, t(c -> *) el) -> (t)
    • aten::dim(Tensor self) -> int
    • aten::div.float(float a, float b) -> (float)
    • aten::div.int(int a, int b) -> (float)
    • aten::eq.bool(bool a, bool b) -> (bool)
    • aten::eq.float(float a, float b) -> (bool)
    • aten::eq.float_int(float a, int b) -> (bool)
    • aten::eq.int(int a, int b) -> (bool)
    • aten::eq.int_float(int a, float b) -> (bool)
    • aten::floor.float(float a) -> (int)
    • aten::floordiv.float(float a, float b) -> (int)
    • aten::floordiv.int(int a, int b) -> (int)
    • aten::ge.bool(bool a, bool b) -> (bool)
    • aten::ge.float(float a, float b) -> (bool)
    • aten::ge.float_int(float a, int b) -> (bool)
    • aten::ge.int(int a, int b) -> (bool)
    • aten::ge.int_float(int a, float b) -> (bool)
    • aten::gt.bool(bool a, bool b) -> (bool)
    • aten::gt.float(float a, float b) -> (bool)
    • aten::gt.float_int(float a, int b) -> (bool)
    • aten::gt.int(int a, int b) -> (bool)
    • aten::gt.int_float(int a, float b) -> (bool)
    • aten::le.bool(bool a, bool b) -> (bool)
    • aten::le.float(float a, float b) -> (bool)
    • aten::le.float_int(float a, int b) -> (bool)
    • aten::le.int(int a, int b) -> (bool)
    • aten::le.int_float(int a, float b) -> (bool)
    • aten::len.t(t[] a) -> (int)
    • aten::lt.bool(bool a, bool b) -> (bool)
    • aten::lt.float(float a, float b) -> (bool)
    • aten::lt.float_int(float a, int b) -> (bool)
    • aten::lt.int(int a, int b) -> (bool)
    • aten::lt.int_float(int a, float b) -> (bool)
    • aten::mul.float(float a, float b) -> (float)
    • aten::mul.int(int a, int b) -> (int)
    • aten::ne.bool(bool a, bool b) -> (bool)
    • aten::ne.float(float a, float b) -> (bool)
    • aten::ne.float_int(float a, int b) -> (bool)
    • aten::ne.int(int a, int b) -> (bool)
    • aten::ne.int_float(int a, float b) -> (bool)
    • aten::neg.int(int a) -> (int)
    • aten::numel(Tensor self) -> int
    • aten::size(Tensor self) -> (int[])
    • aten::size.int(Tensor self, int dim) -> (int)
    • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
    • aten::sub.float(float a, float b) -> (float)
    • aten::sub.int(int a, int b) -> (int)
    • prim::max.bool(bool a, bool b) -> (bool)
    • prim::max.float(float a, float b) -> (bool)
    • prim::max.float_int(float a, int b) -> (bool)
    • prim::max.int(int a, int b) -> (bool)
    • prim::max.int_float(int a, float b) -> (bool)
    • prim::max.self_int(int[] self) -> (int)
    • prim::min.bool(bool a, bool b) -> (bool)
    • prim::min.float(float a, float b) -> (bool)
    • prim::min.float_int(float a, int b) -> (bool)
    • prim::min.int(int a, int b) -> (bool)
    • prim::min.int_float(int a, float b) -> (bool)
    • prim::min.self_int(int[] self) -> (int)
    • prim::shape(Tensor a) -> (int[])
    Source code (tar.gz)
    Source code (zip)
    libtrtorch-v0.2.0-cudnn8.0-tensorrt7.2-cuda11.0-libtorch-1.7.1.tar.gz (1.19 MB)
    libtrtorch-v0.2.0-pre-cxx11-abi-cudnn8.0-tensorrt7.2-cuda11.0-libtorch-1.7.1.tar.gz (1.19 MB)
    trtorch-0.2.0-cp36-cp36m-linux_x86_64.whl (5.69 MB)
    trtorch-0.2.0-cp37-cp37m-linux_x86_64.whl (5.69 MB)
    trtorch-0.2.0-cp38-cp38-linux_x86_64.whl (5.72 MB)
    trtorch-0.2.0-cp39-cp39-linux_x86_64.whl (5.64 MB)
  • v0.1.0(Oct 23, 2020)

    TRTorch v0.1.0

    Direct PyTorch integration via backend API, support for Ampere, support for simple branch and loop cases

    This is the first "beta" release of TRTorch, introducing direct integration into PyTorch via the new backend API. This release also contains an NGC-based Dockerfile for users looking to use TRTorch on Ampere, using NGC's patched version of PyTorch. Note that compiled programs from older versions of TRTorch are not compatible with the TRTorch 0.1.0 runtime due to an ABI change. The documentation now includes example Jupyter notebooks which demonstrate various features of the compiler.

    New Ops:

    • prelu
    • lstm_cell
    • power
    • conv3d
    • narrow

    Dependencies:

    • Bazel 3.4.1
    • Libtorch 1.6.0
    • CUDA 10.2 (by default, CUDA 11 supported with compatible PyTorch build)
    • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatible PyTorch build)
    • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatible PyTorch build)

    Changelog

    v0.1.0 (2020-10-23)

    Bug Fixes

    • added some fixes, trt/jit output still mismatches (723ac1d)

    • added test cases to explicitly check hidden/cell state outputs (d7c3164)

    • cleaned up logic, added case where bias doesn't exist for LSTM cell converter (a3e1093)

    • //core/conversion/evaluator: Custom to IValue that handles int[] (68c934a)

    • //docker: Workaround only shared libraries being available in (50c7eda)

    • //py: Fix long description section of setup.py (efd2099)

    • //tests: Add stride to complete tensors (af5d28e)

    • //tests/accuracy: Fix int8 accuracy test for new PTQ api (a53bea7)

    • //tests/core/converters/activations: Complete tensors in prelu test (0e90f78)

    • docsrc: Update docsrc container for bazel 3.4.1 (4eb53b5)

    • fix(Windows)!: Fix dependency resolution for local builds (858d8c3)

    • chore!: Update dependencies to PyTorch 1.6.0 (8eda27d)

    • chore!: Bumping version numbers to 0.1.0 (b84c90b)

    • refactor(//core)!: Introducing a binding convention that will address (5a105c6)

    • refactor!: Renaming extra info to compile spec to be more consistent (b8fa228)

    Features

    • //core/conversion/converters: LSTMCell converter (8c61248)
    • //core/conversion/var: created ITensorOrFreeze() method, to replace functionality of Var::ITensor() (2ccf8d0)
    • //core/converters: Add power layer conversion support and minor README edits (a801506)
    • //core/lowering: Add functionalization pass to replace inplace (90a9ed6), closes #30
    • //docker: Adding CUDA11 based container for Ampere support (970d775)
    • started working on lstm_cell converter (546d790)
    • //py: Initial compliant implementation of the to_backend api for (59113cf)
    • //third_party/tensorrt: Add back TensorRT static lib in a cross (d3c2e7e)
    • aten::prelu: Basic prelu support (8bc4369)
    • aten::prelu: Implement the multi-channel version of prelu and (c066581)
    • finished logic for LSTM cell, now to test (a88cfaf)

    BREAKING CHANGES

    • Users on Windows trying to use cuDNN 8 must manually configure third_party/cudnn/local/BUILD to use cuDNN 8.


    • Support for Python 3.5 is being dropped with this update


    • Version is being bumped to version 0.1.0a0 to target PyTorch 1.6.0


    • This changes the "ABI" of compiled TRTorch programs and the runtime, and breaks backwards compatibility between the runtime in 0.1.0+ and programs compiled pre-0.1.0.


    • This changes the top-level API for setting the compilation specification; a simple find and replace should allow users to port forward, as sketched below.

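    Per the rename noted in the changelog above (extra info -> compile spec), porting forward is mechanical; a sketch with illustrative shapes:

    // Before (0.0.x):
    //   auto info = trtorch::ExtraInfo({{1, 3, 224, 224}});
    //   auto trt_mod = trtorch::CompileGraph(mod, info);
    // After (0.1.0) -- the same call, with the renamed spec type:
    auto spec = trtorch::CompileSpec({{1, 3, 224, 224}});
    auto trt_mod = trtorch::CompileGraph(mod, spec);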

    Source code (tar.gz)
    Source code (zip)
    libtorch-v0.1.0-cudnn7.6-tensorrt7.0-cuda10.2-libtorch-1.6.0.tar.gz (996.40 KB)
    libtorch-v0.1.0-pre-cxx11-abi-cudnn7.6-tensorrt7.0-cuda10.2-libtorch-1.6.0.tar.gz (998.26 KB)
    trtorch-0.1.0-cp36-cp36m-linux_x86_64.whl (5.23 MB)
    trtorch-0.1.0-cp37-cp37m-linux_x86_64.whl (5.23 MB)
    trtorch-0.1.0-cp38-cp38-linux_x86_64.whl (5.24 MB)
  • v0.0.3(Jul 18, 2020)

    TRTorch v0.0.3

    aarch64 toolchain, Revised PTQ API, PyTorch 1.5.1, support for cuDNN 8.0, TensorRT 7.1 (with compatible PyTorch build)

    This is the third alpha release of TRTorch. It bumps the target PyTorch version to 1.5.1 and introduces support for cuDNN 8.0 and TensorRT 7.1; however, these are only supported in cases where PyTorch has been compiled with the same cuDNN version. This release also introduces formal support for aarch64, though pre-compiled binaries will not be available until we can deliver Python packages for aarch64 for all supported versions of Python. Note some idiosyncrasies when working with PyTorch on aarch64: if you are using PyTorch compiled by NVIDIA for aarch64, the ABI version is CXX11 instead of the pre-CXX11 ABI found in PyTorch on x86_64. When compiling the Python API for TRTorch, add the --use-cxx11-abi flag to the command, and do not use the --config=pre-cxx11-abi flag when building the C++ library (more instructions on native aarch64 compilation are in the documentation). This release also introduces a breaking change to the C++ API: in order to use the logging or PTQ APIs, a separate header file must now be included. Look at the implementation of trtorchc or ptq for example usage.
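
    A hedged sketch of the new header layout (the calibrator factory and logging calls below follow the ptq example app and may differ in detail between versions):

    #include "trtorch/trtorch.h"
    #include "trtorch/logging.h" // new in 0.0.3: required for the logging API
    #include "trtorch/ptq.h"     // new in 0.0.3: required for the PTQ API

    ...
    trtorch::logging::set_reportable_log_level(trtorch::logging::Level::kINFO);
    // Build an INT8 calibrator from a calibration dataloader, caching results
    auto calibrator = trtorch::ptq::make_int8_calibrator(
        std::move(calibration_dataloader), "/tmp/calibration.cache", /*use_cache=*/true);
    ...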

    Dependencies:

    • Bazel 3.3.1
    • Libtorch 1.5.1
    • CUDA 10.2
    • cuDNN 7.6.5 (by default, cuDNN 8 supported with compatible PyTorch build)
    • TensorRT 7.0.0 (by default, TensorRT 7.1 supported with compatible PyTorch build)

    Changelog

    • feat!: Lock bazel version (25f4371)
    • refactor(//cpp/api)!: Refactoring ptq to use includes but separate from (d2f8a59)

    Bug Fixes

    • //core: Do not compile hidden methods (6bd1a3f)
    • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
    • //core/conversion: Suppress unnecessary debug messages (2b23874)
    • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
    • //core/conversion/conversionctx: In the case of strict types and (3611778)
    • //core/conversion/converters: Fix plugin implementation for TRT 7 (94d6a0f)
    • //core/conversion/converters/impl: 1d case not working (f42562b)
    • //core/conversion/converters/impl: code works for interpolate2d/3d, doesn't work for 1d yet (e4cb117)
    • //core/conversion/converters/impl: Fix interpolate.cpp (b6942a2)
    • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
    • //core/conversion/evaluators: A couple fixes for evaluators (07ba980)
    • //core/lowering: Conv2D -> _convolution pass was triggering conv (ca2b5f9)
    • //cpp: Remove deprecated script namespace (d70760f)
    • //cpp/api: Better initial condition for the dataloader iterator to (8d22bdd)
    • //cpp/api: Remove unnecessary destructor in ptq class (fc70267)
    • //cpp/api: set a default for calibrator (825be69)
    • //cpp/benchmark: reorder benchmark so FP16 bn issue in JIT doesn't (98527d2)
    • //cpp/ptq: Default version of the app should not resize images (de3cbc4)
    • //cpp/ptq: Enable FP16 kernels for INT8 applications (26709cc)
    • //cpp/ptq: Enable FP16 kernels for INT8 applications (e1c5416)
    • //cpp/ptq: remove some logging from ptq app (b989c7f)
    • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
    • //cpp/trtorchc: Refactor trtorchc to use new C++ API (789e1be), closes #132
    • //cpp/trtorchc: Support building trtorchc with the pre_cxx11_abi (172d4d5)
    • //docs: add nojekyll file (2a02cd5)
    • //docs: fix version links (11555f7)
    • //notebooks: Fix WORKSPACE template file to reflect new build system layout (c8ea9b7)
    • //py: Build system issues (c1de126)
    • //py: Ignore generated version file (9e37dc1)
    • //py: Lib path incorrect (ff2b13c)
    • //tests: Duplicated tensorrt dep (5cd697e)
    • //third_party/tensorrt: Fix include dir for library headers (22ed5cf)
    • //third_party/tensorrt: Fix TensorRT paths for local x86 builds (73d804b)
    • aarch64: fixes and issues for aarch64 toolchain (9a6cccd)
    • aten::_convolution: out channels was passed in incorrectly for (ee727f8)
    • aten::_convolution: Pass dummy bias when there is no bias (b20671c)
    • aten::batch_norm: A new batch norm implementation that hopefully (6461872)
    • aten::batchnorm|aten::view: Fix converter implementation for (bf651dd)
    • aten::contiguous: Blacklist aten::contiguous from conversion (b718121)
    • aten::flatten: Fixes dynamic shape for flatten (4eb20bb)
    • fixed FP16 bug, fixed README, addressed some other PR comments (d9c0e84)
    • aten::neg: Fix a index bug in neg (1b2cde4)
    • aten::size, other aten evaluators: Removes aten::size converter in (c83447e)
    • BUILD: modified BUILD (a0d8586)
    • trying to resolve interpolate plugin problems (f0fefaa)
    • core/conversion/converters/impl: fix error message in interpolate (5ddab8b)
    • Address issues in PR (cd24f26)
    • bypass jekyll, also add PR template (a41c400)
    • first commit (4f1a9df)
    • Fix pre CXX11 ABI python builds and regen docs (42013ab)
    • fixed interpolate_plugin to handle dynamically sized inputs for adaptive_pool2d (7794c78)
    • need to fix gather converter (024a6b2)
    • plugin: trying to fix bug in plugin (cafcced)
    • pooling: fix the tests and the 1D pooling cases (a90e6db)
    • RunGraphEngineDynamic fixed to work with dynamically sized input tensors (6308190)

    Features

    • //:libtrtorch: Ship trtorchc with the tarball (d647447)
    • //core/compiler: Multiple outputs supported now via tuple (f9af574)
    • //core/conversion: Adds the ability to evaluate loops (dcb1474)
    • //core/conversion: Compiler can now create graphs (9d1946e)
    • //core/conversion: Evaluation of static conditionals works now (6421f3d)
    • //core/conversion/conversionctx: Make op precision available at (78a1c61)
    • //core/conversion/converters: Throw a warning if a converter is (6cce381)
    • //core/conversion/converters/impl: added support for aten::stack (415378e)
    • //core/conversion/converters/impl: added support for linear1d and bilinear2d ops (4416d1f)
    • //core/conversion/converters/impl: added support for trilinear3d op (bb46e70)
    • //core/conversion/converters/impl: all function schemas for upsample_nearest (1b50484)
    • //core/conversion/converters/impl: logic implemented (7f12160)
    • //core/conversion/converters/impl: Round out pooling (7dc4af4)
    • //core/conversion/converters/impl: select converter, which adds support for aten::select.int (5151c34)
    • //core/conversion/converters/impl/plugins: Created interpolate plugin, works for mode='linear' (205ab99)
    • //core/conversion/converters/impl/plugins: interpolate plugin compiles now. time to test it. (58dbaef)
    • //core/conversion/converters/impl/plugins: template for interpolate plugin (7c91dec)
    • //core/conversion/converters/impl/shuffle: Implement aten::resize (353f2d2)
    • //core/conversion/evaluators: A whole bunch of new evaluators (7466b8a)
    • //core/conversion/evaluators: adding support for common evaluation (d351717)
    • //core/conversion/evaluators: Adds new applicability filters for (2cc3226)
    • //core/conversion/evaluators: Allow ITensors to be wrapped in (619e345)
    • //core/execution: Type checking for the executor, now is the (2dd1ba3)
    • //core/lowering: Add tuple lowering pass to remove tuples if (ce6cf75)
    • //core/lowering: Adds peephole optimization pass (0014b84)
    • //core/lowering: Fuse aten::addmm branches into a single (68f0317)
    • //core/lowering: New freeze model pass and new exception (4acc3fd)
    • //core/lowering: Remove aten::contiguous (630b615)
    • //core/quantization: skeleton of INT8 PTQ calibrator (dd443a6)
    • //core/util: New logging level for Graph Dumping (90c44b9)
    • //cpp/api: Adding max batch size setting (1b25542)
    • //cpp/api: Functional Dataloader based PTQ (f022dfe)
    • //cpp/api: Remove the extra includes in the API header (2f86f84)
    • //cpp/benchmark: Increased workspace size for benchmark, may help (8171f79)
    • //cpp/ptq: Add a feature to the dataset to use less than the full (5f36f47)
    • //cpp/ptq: do real benchmarking in the PTQ app instead of rough (65e71c7)
    • //cpp/ptq/training: Training recipe for VGG16 Classifier on (676bf56)
    • //cpp/trtorchc: Adding a new CLI application for TRTorch which (4f349a1)
    • //cpp/trtorchexec: TRTorch exec now supports checking correctness (80808b7)
    • //lowering: centralize lowering and try to use PyTorch Conv2DBN folding (fad4a10)
    • //py: add the option to build python package with CXX11 abi (fdbd7d2)
    • //py: API now produces valid engines that are consumable by (72bc1f7)
    • //py: Initial introduction of the Python API (7088245)
    • //py: Manylinux container and build system for multiple python (639c2a3)
    • //py: register trtorch with torch op library to support (736e914)
    • //py: setup.py now searches for bazel executable (737fe5c)
    • //py: Working portable package (482ef2c)
    • added adaptive_avg_pool2d plugin, and added test for it (fa227b0)
    • //tests: New optional accuracy tests to check INT8 and FP16 (df74136)
    • //toolchains: Adding platform targets for supported platforms (7889ebd)
    • /cpp/api: Working INT8 Calibrator, also resolves #41 (5c0d737)
    • aten::add_t: aten::add_.t evaluator that adds lists together (c4c3ce1)
    • aten::avg_pool2d: Implement Average Pooling 2D (0c39519)
    • aten::cat: Implements aten::cat and completes support for SSD (c2d3a6e)
    • aten::conv_transpose: Add support for dilated and group (48b950a)
    • aten::dropout_: Remove inplace dropout (7aa57c3)
    • aten::flatten: Adds a converter for aten flatten since MM is the (d945eb9)
    • addressed some PR comments, refactored code (141763f)
    • aten::matmul|aten::addmm: Adds support for aten::matmul and (c5b6202)
    • aten::permute: Implement permute support (c7d6b49)
    • aten::size [static]: Implement a aten::size converter for static input size (0548540)
    • started to work on add_.t evaluator, doesn't work yet (f216d3f)
    • aten::to: Remove remaining typecast operators (should be a very (0f63ffa)
    • aten::view: Adds support for ATen view also fixes some tests (24b422e)
    • aten::zeros: Implement aten::zeros evaluator (670817c)
    • conv2d_to_convolution: A pass to map aten::conv2d to _convolution (2c5c0d5)
    • prim::NumToTensor: Implement evaluator for NumToTensor (60df888)
    • tests/util: added RunGraphEngineDynamic to handle dynamic input sized tensors (9458f21)
    • trt_util: from Naren, added unpadDims tool (164a1a6)
    • support for adaptive_avg_pool2d plugin (52be580)
    • Support non cxx11-abi builds for use in python api (83e0ed6)

    BREAKING CHANGES

    • The Bazel version is now locked to 3.3.1 and will be bumped manually from now on. Builds will fail on all other versions, since Bazel will now check the version before compiling.

    Documentation on how to install Bazel has also been added, to support aarch64 until Bazel releases binaries for the platform (which should be soon).


    • To use PTQ you now need to include trtorch/ptq.h in addition to trtorch/trtorch.h; similarly, for logging commands you need to include trtorch/logging.h.


    Source code (tar.gz)
    Source code (zip)
    libtrtorch-v0.0.3-cudnn7.6-tensorrt7.0-cuda10.2-libtorch1.5.1.tar.gz (1018.51 KB)
    libtrtorch-v0.0.3-pre-cxx11-abi-cudnn7.6-tensorrt7.0-cuda10.2-libtorch1.5.1.tar.gz (1015.86 KB)
    trtorch-0.0.3-cp35-cp35m-linux_x86_64.whl (2.78 MB)
    trtorch-0.0.3-cp36-cp36m-linux_x86_64.whl (2.78 MB)
    trtorch-0.0.3-cp37-cp37m-linux_x86_64.whl (2.78 MB)
    trtorch-0.0.3-cp38-cp38-linux_x86_64.whl (2.79 MB)
  • v0.0.2(May 17, 2020)

    TRTorch v0.0.2

    Python API & PyTorch 1.5.0 Support

    • This is the second alpha release of TRTorch. It bumps support for PyTorch to 1.5.0 and introduces a Python distribution for TRTorch.
    • Also now includes full documentation https://nvidia.github.io/TRTorch
    • Adds support for Post Training Quantization in C++

    Dependencies

    • Libtorch 1.5.0
    • CUDA 10.2
    • cuDNN 7.6.5
    • TensorRT 7.0.0

    Changelog

    Bug Fixes

    • //core/conversion: Check for calibrator before setting int8 mode (3afd209)
    • //core/conversion/conversionctx: Check both tensor and eval maps (2d65ece)
    • //core/conversion/converters/impl/element_wise: Fix broadcast (a9f33e4)
    • //cpp: Remove deprecated script namespace (d70760f)
    • //cpp/api: Better initial condition for the dataloader iterator to (8d22bdd)
    • //cpp/api: Remove unnecessary destructor in ptq class (fc70267)
    • //cpp/api: set a default for calibrator (825be69)
    • //cpp/ptq: remove some logging from ptq app (b989c7f)
    • Address issues in PR (cd24f26)
    • //cpp/ptq: Tracing model in eval mode wrecks accuracy in Libtorch (54a24b3)
    • //docs: add nojekyll file (2a02cd5)
    • //docs: fix version links (11555f7)
    • //py: Build system issues (c1de126)
    • //py: Ignore generated version file (9e37dc1)
    • bypass jekyll, also add PR template (a41c400)

    Features

    • //core/conversion/conversionctx: Make op precision available at (78a1c61)
    • //core/conversion/converters/impl/shuffle: Implement aten::resize (353f2d2)
    • //core/execution: Type checking for the executor, now is the (2dd1ba3)
    • //core/lowering: New freeze model pass and new exception (4acc3fd)
    • //core/quantization: skeleton of INT8 PTQ calibrator (dd443a6)
    • //core/util: New logging level for Graph Dumping (90c44b9)
    • //cpp/api: Adding max batch size setting (1b25542)
    • //cpp/api: Functional Dataloader based PTQ (f022dfe)
    • //cpp/api: Remove the extra includes in the API header (2f86f84)
    • //cpp/ptq: Add a feature to the dataset to use less than the full (5f36f47)
    • //cpp/ptq/training: Training recipe for VGG16 Classifier on (676bf56)
    • //lowering: centralize lowering and try to use PyTorch Conv2DBN folding (fad4a10)
    • //py: API now produces valid engines that are consumable by (72bc1f7)
    • //py: Initial introduction of the Python API (7088245)
    • //py: Manylinux container and build system for multiple python (639c2a3)
    • //py: Working portable package (482ef2c)
    • //tests: New optional accuracy tests to check INT8 and FP16 (df74136)
    • //cpp/api: Working INT8 Calibrator, also resolves #41 (5c0d737)
    • aten::flatten: Adds a converter for aten flatten since MM is the (d945eb9)
    • aten::matmul|aten::addmm: Adds support for aten::matmul and (c5b6202)
    • Support non cxx11-abi builds for use in python api (83e0ed6)
    • aten::size [static]: Implement a aten::size converter for static input size (0548540)
    • conv2d_to_convolution: A pass to map aten::conv2d to _convolution (2c5c0d5)
    Source code (tar.gz)
    Source code (zip)
    libtrtorch-pre-cxx11-abi.tar.gz (344.94 KB)
    libtrtorch.tar.gz (347.85 KB)
    trtorch-0.0.2-cp35-cp35m-linux_x86_64.whl (2.64 MB)
    trtorch-0.0.2-cp36-cp36m-linux_x86_64.whl (2.64 MB)
    trtorch-0.0.2-cp37-cp37m-linux_x86_64.whl (2.64 MB)
    trtorch-0.0.2-cp38-cp38-linux_x86_64.whl (2.65 MB)
  • v0.0.1(Apr 8, 2020)

    TRTorch v0.0.1

    Initial Release

    • This is the initial alpha release of TRTorch. It supports basic compilation of TorchScript modules: networks similar to ResNet50 and MobileNet, and simple feed-forward networks.
    • C++ Based API
      • Can save converted models to a PLAN file for use in TensorRT applications (see the sketch after this list)
      • Compile module and continue running with JIT interpreter accelerated by TensorRT
    • Supports FP32 and FP16 execution
    • Sample application to show how to use the compiler
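
    A hedged sketch of the PLAN-file path mentioned above (assuming the ConvertGraphToTRTEngine entry point; "info" stands in for whatever spec object the 0.0.1 API takes):

    #include <fstream>
    #include "trtorch/trtorch.h"

    ...
    // Serialize the "forward" method to a TensorRT engine, then write it out
    // as a PLAN file consumable by standalone TensorRT applications.
    auto engine = trtorch::ConvertGraphToTRTEngine(mod, "forward", info);
    std::ofstream plan("model.plan", std::ios::binary);
    plan << engine;
    ...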

    Dependencies

    • Libtorch 1.4.0
    • CUDA 10.1
    • cuDNN 7.6
    • TensorRT 6.0.1
    Source code (tar.gz)
    Source code (zip)
    libtrtorch.tar.gz (354.60 KB)