
Torch-TensorRT v1.2.0

Released by @ncomly-nvidia on 14 Sep 03:48

PyTorch 1.12, Collections based I/O, FX Frontend, torchtrtc custom op support, CMake build system and Community Windows Support

Torch-TensorRT 1.2.0 targets PyTorch 1.12, CUDA 11.6, cuDNN 8.4 and TensorRT 8.4. This release focuses on a couple of key new APIs for handling function I/O that uses collection types, which should enable whole new classes of models to be compiled by Torch-TensorRT without source code modification. It also introduces the "FX Frontend", a new frontend for Torch-TensorRT that leverages FX, a high-level IR built into PyTorch with extensive Python APIs. For use cases that do not need to run outside of Python, this may be a strong option to try as it is easily extensible in a familiar development environment. In Torch-TensorRT 1.2.0, the FX frontend should be considered beta level in stability. torchtrtc has received improvements targeting the ability to handle operators outside of the core PyTorch op set, including custom operators from libraries such as torchvision and torchtext. Similarly, users can provide custom converters to torchtrtc to extend the compiler's support from the command line instead of having to write an application to do so. Finally, Torch-TensorRT introduces community-supported Windows and CMake support.

New Dependencies

nvidia-tensorrt

For previous versions of Torch-TensorRT, users had to install TensorRT via a system package manager and modify their LD_LIBRARY_PATH in order to set up Torch-TensorRT. Now users should install the TensorRT Python API as part of the installation procedure. This can be done via the following steps:

pip install nvidia-pyindex
pip install nvidia-tensorrt==8.4.3.1
pip install torch-tensorrt==1.2.0 -f https://github.com/pytorch/tensorrt/releases

Installing the TensorRT pip package allows Torch-TensorRT to automatically load the TensorRT libraries without any modification to environment variables. It is also a necessary dependency for the FX frontend.
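
A quick way to confirm the setup is to import both packages from Python (a minimal check; the versions shown in the comments are the values expected for this release):

# Confirm the TensorRT and Torch-TensorRT Python packages load without any
# LD_LIBRARY_PATH changes.
import tensorrt
import torch_tensorrt

print(tensorrt.__version__)        # expected: 8.4.3.1
print(torch_tensorrt.__version__)  # expected: 1.2.0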

torchvision

Some FX frontend converters are designed to target operators from 3rd party libraries like torchvision. As such, you must have torchvision installed in order to use them. However, this dependency is optional for cases where you do not need this support.

Jetson

Starting from this release we will be distributing precompiled binaries of our NGC release branches for aarch64 (as well as x86_64), starting with ngc/22.11. These releases are designed to be paired with NVIDIA-distributed builds of PyTorch, including the NGC containers and Jetson builds, and are equivalent to the prepackaged distribution of Torch-TensorRT that comes in the containers. They represent the state of the master branch at the time of branch cutting, so they may lag in features by a month or so. These releases will come separately from minor version releases like this one. Going forward, these NGC releases should be the primary release channel used on Jetson (including for building from source).

NOTE: NGC PyTorch builds are not identical to builds you might install through normal channels like pytorch.org. In the past this has caused portability issues between pytorch.org builds and NGC builds. Therefore, for workflows such as exporting a TorchScript module on an x86 machine and then compiling on Jetson, we strongly recommend using the NGC container release on x86 for your host machine operations. More information about Jetson support can be found alongside the 22.07 release (https://github.com/pytorch/TensorRT/releases/tag/v1.2.0a0.nv22.07)

Collections based I/O [Experimental]

Torch-TensorRT has previously operated under the assumption that nn.Module forward functions can trivially be reduced to the form forward([Tensor]) -> [Tensor]. Typically this implies functions of the form forward(Tensor, Tensor, ... Tensor) -> (Tensor, Tensor, ..., Tensor). However, as model complexity increases, grouping inputs may make it easier to manage many inputs. Therefore, function signatures similar to forward([Tensor], (Tensor, Tensor)) -> [Tensor] or forward((Tensor, Tensor)) -> (Tensor, (Tensor, Tensor)) may be more common. In Torch-TensorRT 1.2.0, more of these use cases are supported using the new experimental input_signature compile spec API. This API allows users to group Input specs similar to how they might group the input Tensors they would use to call the original module's forward function. This informs Torch-TensorRT how to map a Tensor input from its location in a group to the engine, and from the engine back into its grouping returned to the user.

To make this concrete consider the following standard case:

# Imports assumed for this and the following examples
import torch
import torch.nn as nn
from typing import List, Tuple

import torch_tensorrt

class StandardTensorInput(nn.Module):
    def __init__(self):
        super(StandardTensorInput, self).__init__()

    def forward(self, x, y):
        r = x + y
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = StandardTensorInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    inputs=[
        torch_tensorrt.Input(x.shape),
        torch_tensorrt.Input(y.shape)
    ],
    min_block_size=1
)

out = trt_module(x,y)
print(out)

Here a user has defined two explicit tensor inputs and used the existing list based API to define the input specs.

With Torch-TensorRT the following use cases are now possible using the new input_signature API:

  • Tuple based input collection
class TupleInput(nn.Module):
    def __init__(self):
        super(TupleInput, self).__init__()

    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r = z[0] + z[1]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=((x, y),), # Note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module((x,y))
print(out)
  • List based input collection
class ListInput(nn.Module):
    def __init__(self):
        super(ListInput, self).__init__()

    def forward(self, z: List[torch.Tensor]):
        r = z[0] + z[1]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module([x,y])
print(out)

Note how the input specs (in this case just example tensors) are provided to the compiler. The input_signature argument expects a Tuple[Union[torch.Tensor, torch_tensorrt.Input, List, Tuple]] grouped in a format representative of how the function would be called. In these cases it is just a list or tuple of specs.
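
Input specs can be grouped in exactly the same way as example tensors. For instance, the TupleInput example above could equivalently be compiled with explicit torch_tensorrt.Input objects (a minimal sketch; the 1-D shapes simply mirror the example tensors):

# The TupleInput example, described with Input specs rather than example tensors
module = TupleInput().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=(
        (torch_tensorrt.Input(shape=(3,)), torch_tensorrt.Input(shape=(3,))),
    ),
    min_block_size=1
)

out = trt_module((x, y))
print(out)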

More advanced cases are supported as well:

  • Tuple I/O
class TupleInputOutput(nn.Module):
    def __init__(self):
        super(TupleInputOutput, self).__init__()

    def forward(self, z: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r1 = r1 * 10
        r = (r1, r2)
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = TupleInputOutput()

trt_module = torch_tensorrt.compile(
    module,
    input_signature=((x,y),), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module((x,y))
print(out)
  • List I/O
class ListInputOutput(nn.Module):
    def __init__(self):
        super(ListInputOutput, self).__init__()

    def forward(self, z: List[torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = z[0] - z[1]
        r = [r1, r2]
        return r

x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = ListInputOutput()

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module([x,y])
print(out)
  • Multiple Groups of Mixed Types
class MultiGroupIO(nn.Module):
    def __init__(self):
        super(MultiGroupIO, self).__init__()

    def forward(self, z: List[torch.Tensor], a: Tuple[torch.Tensor, torch.Tensor]):
        r1 = z[0] + z[1]
        r2 = a[0] + a[1]
        r3 = r1 - r2
        r4 = [r1, r2]
        return (r3, r4)
    
x = torch.Tensor([1,2,3]).to("cuda")
y = torch.Tensor([4,5,6]).to("cuda")
module = MultiGroupIO().eval().to("cuda")

trt_module = torch_tensorrt.compile(
    module,
    input_signature=([x,y],(x,y)), # Again, note how inputs are grouped with the new API
    min_block_size=1
)

out = trt_module([x,y],(x,y))
print(out)   

These features are also supported in C++:

// 'path' is assumed to point to a serialized TorchScript module; 'inputs' is an
// existing std::vector<at::Tensor> of example CUDA tensors, with 'in0' one such tensor.
torch::jit::Module mod;
try {
  // Deserialize the ScriptModule from a file using torch::jit::load().
  mod = torch::jit::load(path);
} catch (const c10::Error& e) {
  std::cerr << "error loading the model\n";
}
mod.eval();
mod.to(torch::kCUDA);

std::vector<torch::jit::IValue> inputs_;

for (auto in : inputs) {
  inputs_.push_back(torch::jit::IValue(in.clone()));
}

std::vector<torch::jit::IValue> complex_inputs;
auto input_list = c10::impl::GenericList(c10::TensorType::get());
input_list.push_back(inputs_[0]);
input_list.push_back(inputs_[0]);

torch::jit::IValue input_list_ivalue = torch::jit::IValue(input_list);

complex_inputs.push_back(input_list_ivalue);

auto input_shape = torch_tensorrt::Input(in0.sizes(), torch_tensorrt::DataType::kHalf);
auto input_shape_ivalue = torch::jit::IValue(std::move(c10::make_intrusive<torch_tensorrt::Input>(input_shape)));

c10::TypePtr elementType = input_shape_ivalue.type();
auto list = c10::impl::GenericList(elementType);
list.push_back(input_shape_ivalue);
list.push_back(input_shape_ivalue);

torch::jit::IValue complex_input_shape(list);
std::tuple<torch::jit::IValue> input_tuple2(complex_input_shape);
torch::jit::IValue complex_input_shape2(input_tuple2);

auto compile_settings = torch_tensorrt::ts::CompileSpec(complex_input_shape2);
compile_settings.min_block_size = 1;
compile_settings.enabled_precisions = {torch::kHalf};

// Compile module
auto trt_mod = torch_tensorrt::ts::compile(mod, compile_settings);
auto trt_out = trt_mod.forward(complex_inputs);

Currently this feature should be considered experimental; APIs may change or be folded into existing APIs. Using this feature also introduces limitations, including the following:

  • Not all collection types are supported (e.g. Dict, namedtuple)
  • require_full_compilation is not supported while using this feature
  • Certain operators are required to run in PyTorch throughout the graph which may impact performance
  • The maximum depth of collections nesting is limited.

These limitations will be addressed in subsequent versions.

Adding FX frontend to Torch-TensorRT [Beta]

This release includes FX as one of the supported IRs for converting torch models to TensorRT, through the new FX frontend. At a high level, this path transforms the model into (or consumes) an FX graph and, similar to the TorchScript frontend, converts the graph to TensorRT through a library of converters. The key difference is that it is implemented purely in Python. The role of the FX frontend is to supplement the TS lowering path and to provide better ease of use and easier extensibility in use cases where removing Python as a dependency is not strictly necessary. Detailed user instructions can be found in the documentation.
The FX path examples are located under //examples/fx
The FX path unit tests are located under //py/torch_tensorrt/fx/tests
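
As a quick illustration, a torchvision model can be routed through the FX frontend via the ir argument of torch_tensorrt.compile (a minimal sketch, assuming ir="fx" selects the FX frontend; the model and input shape are illustrative):

# Compile a torchvision model through the FX frontend
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet18(pretrained=True).eval().to("cuda")
inputs = [torch.randn(1, 3, 224, 224).to("cuda")]

trt_model = torch_tensorrt.compile(model, ir="fx", inputs=inputs)

out = trt_model(*inputs)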

Custom operators and converters in Torch-TensorRT

While both the C++ API and Python API provide systems to include and convert custom operators in your model (for instance those implemented in torchvision), torchtrtc has been limited to the core opset. In Torch-TensorRT 1.2.0 two new flags have been added to torchtrtc.

    --custom-torch-ops                (repeatable) Shared object/DLL containing custom torch operators
    --custom-converters               (repeatable) Shared object/DLL containing custom converters

These arguments accept paths to .so or DLL files that define custom operators for PyTorch or custom converters for Torch-TensorRT. These files are dlopen'd at runtime to extend the op and converter libraries.

For example:

torchtrtc tests/modules/ssd_traced.jit.pt ssd_trt.ts --custom-torch-ops=<path to custom library .so file> --custom-converters=<path to custom library .so file> "[(1,3,300,300); (1,3,512,512); (1, 3, 1024, 1024)]@fp16%contiguous" -p f16
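
The equivalent workflow in the Python API loads the custom operator library with torch.ops.load_library before compiling (a sketch mirroring the torchtrtc command above; the library path is a placeholder and the Input spec mirrors the shape range given on the command line):

# Load custom torch operators (and any converters registered in the same
# library), then compile as usual.
import torch
import torch_tensorrt

torch.ops.load_library("<path to custom library .so file>")

model = torch.jit.load("tests/modules/ssd_traced.jit.pt").eval().to("cuda")

trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input(
        min_shape=(1, 3, 300, 300),
        opt_shape=(1, 3, 512, 512),
        max_shape=(1, 3, 1024, 1024),
        dtype=torch.half,
    )],
    enabled_precisions={torch.half},
)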

Community CMake and Windows support

Thanks to the great work of @gcuendet and others, CMake and, consequently, Windows support has been added to the project! Users on Linux and Windows can now build the C++ API using this system and, using torch_tensorrt_runtime.dll, execute Torch-TensorRT programs on Windows in both Python and C++. Detailed information on how to use this build system can be found here: https://pytorch.org/TensorRT/getting_started/installation.html
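
For example, a previously compiled Torch-TensorRT TorchScript program can be executed from Python on Windows with only the runtime library loaded (a sketch, assuming torch.ops.load_library is used to pull in torch_tensorrt_runtime.dll; file paths and the input shape are illustrative):

# Runtime-only execution of a serialized Torch-TensorRT program on Windows
import torch

torch.ops.load_library("torch_tensorrt_runtime.dll")  # registers the TRT execution ops

trt_module = torch.jit.load("trt_module.ts")
out = trt_module(torch.randn(1, 3, 224, 224).to("cuda"))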

Bazel will continue to be the primary build system for the project, and all testing and distributed builds will be built and run with Bazel (including future official Windows support), so users should still consider Bazel builds the canonical version of Torch-TensorRT. However, we aim to ensure, as best we can, that the CMake system can build the project properly, including on Windows. Contributions that continue to grow support for this build system and for Windows as a platform are definitely welcome.

Known Limitations

  • Collections I/O
    • Not all collection types are supported (e.g. Dict, namedtuple)
    • require_full_compilation is not supported while using this feature
    • Certain operators are required to run in PyTorch throughout the graph which may impact performance
    • The maximum depth of collections nesting is limited.
  • FX
    • Some FX operators have limited dynamic shape capability; see the FX documentation for details.
    • Control flow in models cannot be handled
  • The Python API is not supported via the CMake build system.

Dependencies

- Bazel 5.2.0
- LibTorch 1.12.1
- CUDA 11.6 (on x86_64, by default, newer CUDA 11 supported with compatible PyTorch Build)
- cuDNN 8.4.1.50
- TensorRT 8.4.3.1

Operators Supported (TorchScript)

Operators Currently Supported Through Converters

  • aten::_convolution(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled, bool allow_tf32) -> (Tensor)
  • aten::_convolution.deprecated(Tensor input, Tensor weight, Tensor? bias, int[] stride, int[] padding, int[] dilation, bool transposed, int[] output_padding, int groups, bool benchmark, bool deterministic, bool cudnn_enabled) -> (Tensor)
  • aten::abs(Tensor self) -> (Tensor)
  • aten::acos(Tensor self) -> (Tensor)
  • aten::acosh(Tensor self) -> (Tensor)
  • aten::adaptive_avg_pool1d(Tensor self, int[1] output_size) -> (Tensor)
  • aten::adaptive_avg_pool2d(Tensor self, int[2] output_size) -> (Tensor)
  • aten::adaptive_avg_pool3d(Tensor self, int[3] output_size) -> (Tensor)
  • aten::adaptive_max_pool1d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
  • aten::adaptive_max_pool2d(Tensor self, int[2] output_size) -> (Tensor, Tensor)
  • aten::adaptive_max_pool3d(Tensor self, int[3] output_size) -> (Tensor, Tensor)
  • aten::add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::add.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::add_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::argmax(Tensor self, int dim, bool keepdim=False) -> (Tensor)
  • aten::argmin(Tensor self, int dim, bool keepdim=False) -> (Tensor)
  • aten::asin(Tensor self) -> (Tensor)
  • aten::asinh(Tensor self) -> (Tensor)
  • aten::atan(Tensor self) -> (Tensor)
  • aten::atanh(Tensor self) -> (Tensor)
  • aten::avg_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[0], bool ceil_mode=False, bool count_include_pad=True) -> (Tensor)
  • aten::avg_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::avg_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], bool ceil_mode=False, bool count_include_pad=True, int? divisor_override=None) -> (Tensor)
  • aten::batch_norm(Tensor input, Tensor? gamma, Tensor? beta, Tensor? mean, Tensor? var, bool training, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::bitwise_not(Tensor self) -> (Tensor)
  • aten::bmm(Tensor self, Tensor mat2) -> (Tensor)
  • aten::cat(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::ceil(Tensor self) -> (Tensor)
  • aten::clamp(Tensor self, Scalar? min=None, Scalar? max=None) -> (Tensor)
  • aten::clamp_max(Tensor self, Scalar max) -> (Tensor)
  • aten::clamp_min(Tensor self, Scalar min) -> (Tensor)
  • aten::constant_pad_nd(Tensor self, int[] pad, Scalar value=0) -> (Tensor)
  • aten::cos(Tensor self) -> (Tensor)
  • aten::cosh(Tensor self) -> (Tensor)
  • aten::cumsum(Tensor self, int dim, *, int? dtype=None) -> (Tensor)
  • aten::div.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::div.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::div.Tensor_mode(Tensor self, Tensor other, *, str? rounding_mode) -> (Tensor)
  • aten::div_.Scalar(Tensor(a!) self, Scalar other) -> (Tensor(a!))
  • aten::div_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::elu(Tensor self, Scalar alpha=1, Scalar scale=1, Scalar input_scale=1) -> (Tensor)
  • aten::embedding(Tensor weight, Tensor indices, int padding_idx=-1, bool scale_grad_by_freq=False, bool sparse=False) -> (Tensor)
  • aten::eq.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::eq.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::erf(Tensor self) -> (Tensor)
  • aten::exp(Tensor self) -> (Tensor)
  • aten::expand(Tensor(a) self, int[] size, *, bool implicit=False) -> (Tensor(a))
  • aten::expand_as(Tensor(a) self, Tensor other) -> (Tensor(a))
  • aten::fake_quantize_per_channel_affine(Tensor self, Tensor scale, Tensor zero_point, int axis, int quant_min, int quant_max) -> (Tensor)
  • aten::fake_quantize_per_tensor_affine(Tensor self, float scale, int zero_point, int quant_min, int quant_max) -> (Tensor)
  • aten::flatten.using_ints(Tensor self, int start_dim=0, int end_dim=-1) -> (Tensor)
  • aten::floor(Tensor self) -> (Tensor)
  • aten::floor_divide(Tensor self, Tensor other) -> (Tensor)
  • aten::floor_divide.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ge.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::gru_cell(Tensor input, Tensor hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor)
  • aten::gt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::gt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::hardtanh(Tensor self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor)
  • aten::hardtanh_(Tensor(a!) self, Scalar min_val=-1, Scalar max_val=1) -> (Tensor(a!))
  • aten::index.Tensor(Tensor self, Tensor?[] indices) -> (Tensor)
  • aten::instance_norm(Tensor input, Tensor? weight, Tensor? bias, Tensor? running_mean, Tensor? running_var, bool use_input_stats, float momentum, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::layer_norm(Tensor input, int[] normalized_shape, Tensor? gamma, Tensor? beta, float eps, bool cudnn_enabled) -> (Tensor)
  • aten::le.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::le.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::leaky_relu(Tensor self, Scalar negative_slope=0.01) -> (Tensor)
  • aten::leaky_relu_(Tensor(a!) self, Scalar negative_slope=0.01) -> (Tensor(a!))
  • aten::linear(Tensor input, Tensor weight, Tensor? bias=None) -> (Tensor)
  • aten::log(Tensor self) -> (Tensor)
  • aten::lstm_cell(Tensor input, Tensor[] hx, Tensor w_ih, Tensor w_hh, Tensor? b_ih=None, Tensor? b_hh=None) -> (Tensor, Tensor)
  • aten::lt.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::lt.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::masked_fill.Scalar(Tensor self, Tensor mask, Scalar value) -> (Tensor)
  • aten::matmul(Tensor self, Tensor other) -> (Tensor)
  • aten::max(Tensor self) -> (Tensor)
  • aten::max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
  • aten::max.other(Tensor self, Tensor other) -> (Tensor)
  • aten::max_pool1d(Tensor self, int[1] kernel_size, int[1] stride=[], int[1] padding=[], int[1] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool2d(Tensor self, int[2] kernel_size, int[2] stride=[], int[2] padding=[0, 0], int[2] dilation=[1, 1], bool ceil_mode=False) -> (Tensor)
  • aten::max_pool3d(Tensor self, int[3] kernel_size, int[3] stride=[], int[3] padding=[], int[3] dilation=[], bool ceil_mode=False) -> (Tensor)
  • aten::mean(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::mean.dim(Tensor self, int[] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::min(Tensor self) -> (Tensor)
  • aten::min.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
  • aten::min.other(Tensor self, Tensor other) -> (Tensor)
  • aten::mul.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::mul.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::mul_.Tensor(Tensor(a!) self, Tensor other) -> (Tensor(a!))
  • aten::narrow(Tensor(a) self, int dim, int start, int length) -> (Tensor(a))
  • aten::narrow.Tensor(Tensor(a) self, int dim, Tensor start, int length) -> (Tensor(a))
  • aten::ne.Scalar(Tensor self, Scalar other) -> (Tensor)
  • aten::ne.Tensor(Tensor self, Tensor other) -> (Tensor)
  • aten::neg(Tensor self) -> (Tensor)
  • aten::norm.ScalarOpt_dim(Tensor self, Scalar? p, int[1] dim, bool keepdim=False) -> (Tensor)
  • aten::permute(Tensor(a) self, int[] dims) -> (Tensor(a))
  • aten::pixel_shuffle(Tensor self, int upscale_factor) -> (Tensor)
  • aten::pow.Tensor_Scalar(Tensor self, Scalar exponent) -> (Tensor)
  • aten::pow.Tensor_Tensor(Tensor self, Tensor exponent) -> (Tensor)
  • aten::prelu(Tensor self, Tensor weight) -> (Tensor)
  • aten::prod(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::prod.dim_int(Tensor self, int dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::reciprocal(Tensor self) -> (Tensor)
  • aten::reflection_pad1d(Tensor self, int[2] padding) -> (Tensor)
  • aten::reflection_pad2d(Tensor self, int[4] padding) -> (Tensor)
  • aten::relu(Tensor input) -> (Tensor)
  • aten::relu_(Tensor(a!) self) -> (Tensor(a!))
  • aten::repeat(Tensor self, int[] repeats) -> (Tensor)
  • aten::repeat_interleave.self_int(Tensor self, int repeats, int? dim=None, *, int? output_size=None) -> (Tensor)
  • aten::replication_pad1d(Tensor self, int[2] padding) -> (Tensor)
  • aten::replication_pad2d(Tensor self, int[4] padding) -> (Tensor)
  • aten::replication_pad3d(Tensor self, int[6] padding) -> (Tensor)
  • aten::reshape(Tensor self, int[] shape) -> (Tensor)
  • aten::roll(Tensor self, int[1] shifts, int[1] dims=[]) -> (Tensor)
  • aten::rsub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::rsub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::scatter.src(Tensor self, int dim, Tensor index, Tensor src) -> (Tensor)
  • aten::scatter.value(Tensor self, int dim, Tensor index, Scalar value) -> (Tensor)
  • aten::select.int(Tensor(a) self, int dim, int index) -> (Tensor(a))
  • aten::sigmoid(Tensor input) -> (Tensor)
  • aten::sigmoid_(Tensor(a!) self) -> (Tensor(a!))
  • aten::sin(Tensor self) -> (Tensor)
  • aten::sinh(Tensor self) -> (Tensor)
  • aten::slice.Tensor(Tensor(a) self, int dim=0, int? start=None, int? end=None, int step=1) -> (Tensor(a))
  • aten::softmax.int(Tensor self, int dim, int? dtype=None) -> (Tensor)
  • aten::split(Tensor self, int[] split_sizes, int dim=0) -> (Tensor[])
  • aten::split.Tensor(Tensor(a) self, int split_size, int dim=0) -> (Tensor[])
  • aten::split.sizes(Tensor(a -> *) self, int[] split_size, int dim=0) -> (Tensor[])
  • aten::split_with_sizes(Tensor(a) self, int[] split_sizes, int dim=0) -> (Tensor[])
  • aten::sqrt(Tensor self) -> (Tensor)
  • aten::square(Tensor self) -> (Tensor)
  • aten::squeeze.dim(Tensor(a) self, int dim) -> (Tensor(a))
  • aten::stack(Tensor[] tensors, int dim=0) -> (Tensor)
  • aten::sub.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> (Tensor)
  • aten::sub.Tensor(Tensor self, Tensor other, Scalar alpha=1) -> (Tensor)
  • aten::sub_.Tensor(Tensor(a!) self, Tensor other, *, Scalar alpha=1) -> (Tensor(a!))
  • aten::sum(Tensor self, *, int? dtype=None) -> (Tensor)
  • aten::sum.dim_IntList(Tensor self, int[1] dim, bool keepdim=False, *, int? dtype=None) -> (Tensor)
  • aten::t(Tensor self) -> (Tensor)
  • aten::tan(Tensor self) -> (Tensor)
  • aten::tanh(Tensor input) -> (Tensor)
  • aten::tanh_(Tensor(a!) self) -> (Tensor(a!))
  • aten::to.device(Tensor(a) self, Device device, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor(a))
  • aten::to.dtype(Tensor self, int dtype, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
  • aten::to.other(Tensor self, Tensor other, bool non_blocking=False, bool copy=False, int? memory_format=None) -> (Tensor)
  • aten::to.prim_Device(Tensor(a) self, Device? device, int? dtype=None, bool non_blocking=False, bool copy=False) -> (Tensor(a|b))
  • aten::topk(Tensor self, int k, int dim=-1, bool largest=True, bool sorted=True) -> (Tensor values, Tensor indices)
  • aten::transpose.int(Tensor(a) self, int dim0, int dim1) -> (Tensor(a))
  • aten::unbind.int(Tensor(a -> *) self, int dim=0) -> (Tensor[])
  • aten::unsqueeze(Tensor(a) self, int dim) -> (Tensor(a))
  • aten::upsample_bilinear2d(Tensor self, int[2] output_size, bool align_corners, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_bilinear2d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::upsample_linear1d(Tensor self, int[1] output_size, bool align_corners, float? scales=None) -> (Tensor)
  • aten::upsample_linear1d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest1d(Tensor self, int[1] output_size, float? scales=None) -> (Tensor)
  • aten::upsample_nearest1d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest2d(Tensor self, int[2] output_size, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_nearest2d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_nearest3d(Tensor self, int[3] output_size, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_nearest3d.vec(Tensor input, int[]? output_size, float[]? scale_factors) -> (Tensor)
  • aten::upsample_trilinear3d(Tensor self, int[3] output_size, bool align_corners, float? scales_d=None, float? scales_h=None, float? scales_w=None) -> (Tensor)
  • aten::upsample_trilinear3d.vec(Tensor input, int[]? output_size, bool align_corners, float[]? scale_factors) -> (Tensor)
  • aten::view(Tensor(a) self, int[] size) -> (Tensor(a))
  • trt::const(Tensor self) -> (Tensor)

Operators Currently Supported Through Evaluators

  • aten::Bool.float(float b) -> (bool)
  • aten::Bool.int(int a) -> (bool)
  • aten::Float.Scalar(Scalar a) -> float
  • aten::Float.bool(bool a) -> float
  • aten::Float.int(int a) -> float
  • aten::Int.Scalar(Scalar a) -> int
  • aten::Int.bool(bool a) -> int
  • aten::Int.float(float a) -> int
  • aten::Int.int(int a) -> int
  • aten::and(int a, int b) -> (bool)
  • aten::and.bool(bool a, bool b) -> (bool)
  • aten::__derive_index(int idx, int start, int step) -> int
  • aten::getitem.t(t list, int idx) -> (t(*))
  • aten::is(t1 self, t2 obj) -> bool
  • aten::isnot(t1 self, t2 obj) -> bool
  • aten::not(bool self) -> bool
  • aten::or(int a, int b) -> (bool)
  • aten::__range_length(int lo, int hi, int step) -> int
  • aten::__round_to_zero_floordiv(int a, int b) -> (int)
  • aten::xor(int a, int b) -> (bool)
  • aten::add.float(float a, float b) -> (float)
  • aten::add.int(int a, int b) -> (int)
  • aten::add.str(str a, str b) -> (str)
  • aten::add_.t(t self, t[] b) -> (t[])
  • aten::append.t(t self, t(c -> *) el) -> (t)
  • aten::arange(Scalar end, *, int? dtype=None, int? layout=None,
    Device? device=None, bool? pin_memory=None) -> (Tensor)
  • aten::arange.start(Scalar start, Scalar end, *, ScalarType? dtype=None,
    Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
  • aten::arange.start_step(Scalar start, Scalar end, Scalar step, *, ScalarType? dtype=None,
    Layout? layout=None, Device? device=None, bool? pin_memory=None) -> (Tensor)
  • aten::clone(Tensor self, *, int? memory_format=None) -> (Tensor)
  • aten::copy_(Tensor(a!) self, Tensor src, bool non_blocking=False) -> (Tensor(a!))
  • aten::dim(Tensor self) -> int
  • aten::div.float(float a, float b) -> (float)
  • aten::div.int(int a, int b) -> (float)
  • aten::eq.bool(bool a, bool b) -> (bool)
  • aten::eq.float(float a, float b) -> (bool)
  • aten::eq.float_int(float a, int b) -> (bool)
  • aten::eq.int(int a, int b) -> (bool)
  • aten::eq.int_float(int a, float b) -> (bool)
  • aten::eq.str(str a, str b) -> (bool)
  • aten::extend.t(t self, t[] other) -> ()
  • aten::floor.float(float a) -> (int)
  • aten::floor.int(int a) -> (int)
  • aten::floordiv.float(float a, float b) -> (int)
  • aten::floordiv.int(int a, int b) -> (int)
  • aten::format(str self, ...) -> (str)
  • aten::ge.bool(bool a, bool b) -> (bool)
  • aten::ge.float(float a, float b) -> (bool)
  • aten::ge.float_int(float a, int b) -> (bool)
  • aten::ge.int(int a, int b) -> (bool)
  • aten::ge.int_float(int a, float b) -> (bool)
  • aten::gt.bool(bool a, bool b) -> (bool)
  • aten::gt.float(float a, float b) -> (bool)
  • aten::gt.float_int(float a, int b) -> (bool)
  • aten::gt.int(int a, int b) -> (bool)
  • aten::gt.int_float(int a, float b) -> (bool)
  • aten::is_floating_point(Tensor self) -> (bool)
  • aten::le.bool(bool a, bool b) -> (bool)
  • aten::le.float(float a, float b) -> (bool)
  • aten::le.float_int(float a, int b) -> (bool)
  • aten::le.int(int a, int b) -> (bool)
  • aten::le.int_float(int a, float b) -> (bool)
  • aten::len.t(t[] a) -> (int)
  • aten::lt.bool(bool a, bool b) -> (bool)
  • aten::lt.float(float a, float b) -> (bool)
  • aten::lt.float_int(float a, int b) -> (bool)
  • aten::lt.int(int a, int b) -> (bool)
  • aten::lt.int_float(int a, float b) -> (bool)
  • aten::mul.float(float a, float b) -> (float)
  • aten::mul.int(int a, int b) -> (int)
  • aten::ne.bool(bool a, bool b) -> (bool)
  • aten::ne.float(float a, float b) -> (bool)
  • aten::ne.float_int(float a, int b) -> (bool)
  • aten::ne.int(int a, int b) -> (bool)
  • aten::ne.int_float(int a, float b) -> (bool)
  • aten::neg.int(int a) -> (int)
  • aten::numel(Tensor self) -> int
  • aten::pow.float(float a, float b) -> (float)
  • aten::pow.float_int(float a, int b) -> (float)
  • aten::pow.int(int a, int b) -> (float)
  • aten::pow.int_float(int a, float b) -> (float)
  • aten::size(Tensor self) -> (int[])
  • aten::size.int(Tensor self, int dim) -> (int)
  • aten::slice.t(t[] l, int start, int end=9223372036854775807, int step=1) -> (t[])
  • aten::sqrt.float(float a) -> (float)
  • aten::sqrt.int(int a) -> (float)
  • aten::sub.float(float a, float b) -> (float)
  • aten::sub.int(int a, int b) -> (int)
  • aten::tensor(t[] data, *, int? dtype=None, Device? device=None, bool requires_grad=False) -> (Tensor)
  • prim::TupleIndex(Any tup, int i) -> (Any)
  • prim::dtype(Tensor a) -> (int)
  • prim::max.bool(bool a, bool b) -> (bool)
  • prim::max.float(float a, float b) -> (bool)
  • prim::max.float_int(float a, int b) -> (bool)
  • prim::max.int(int a, int b) -> (bool)
  • prim::max.int_float(int a, float b) -> (bool)
  • prim::max.self_int(int[] self) -> (int)
  • prim::min.bool(bool a, bool b) -> (bool)
  • prim::min.float(float a, float b) -> (bool)
  • prim::min.float_int(float a, int b) -> (bool)
  • prim::min.int(int a, int b) -> (bool)
  • prim::min.int_float(int a, float b) -> (bool)
  • prim::min.self_int(int[] self) -> (int)
  • prim::shape(Tensor a) -> (int[])

What's Changed

New Contributors

Full Changelog: v1.1.0...v1.2.0