Update README and versions for 20.03.1 release (#1559)
* Fix links in documentation

* Update README and versions for 20.03.1 release

* Doc updates for V2 API

Co-authored-by: David Goodwin <[email protected]>
dzier and David Goodwin authored May 27, 2020
1 parent 9699da8 commit 221ee61
Showing 13 changed files with 131 additions and 137 deletions.
8 changes: 4 additions & 4 deletions Dockerfile
@@ -134,8 +134,8 @@ FROM ${TENSORFLOW_IMAGE} AS trtserver_tf
############################################################################
FROM ${BASE_IMAGE} AS trtserver_build

ARG TRTIS_VERSION=1.13.0dev
ARG TRTIS_CONTAINER_VERSION=20.05dev
ARG TRTIS_VERSION=1.13.0
ARG TRTIS_CONTAINER_VERSION=20.03.1

# libgoogle-glog0v5 is needed by caffe2 libraries.
# libcurl4-openSSL-dev is needed for GCS
@@ -319,8 +319,8 @@ ENTRYPOINT ["/opt/tritonserver/nvidia_entrypoint.sh"]
############################################################################
FROM ${BASE_IMAGE}

ARG TRTIS_VERSION=1.13.0dev
ARG TRTIS_CONTAINER_VERSION=20.05dev
ARG TRTIS_VERSION=1.13.0
ARG TRTIS_CONTAINER_VERSION=20.03.1

ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}
128 changes: 60 additions & 68 deletions README.rst
@@ -35,12 +35,6 @@ NVIDIA Triton Inference Server
the inference server in** `Roadmap
<https://github.com/NVIDIA/triton-inference-server#roadmap>`_.

**LATEST RELEASE: You are currently on the master branch which
tracks under-development progress towards the next release. The
latest release of the Triton Inference Server is 1.12.0 and
is available on branch** `r20.03
<https://github.com/NVIDIA/triton-inference-server/tree/r20.03>`_.

.. overview-begin-marker-do-not-remove
NVIDIA Triton Inference Server provides a cloud inferencing solution
@@ -49,44 +43,62 @@ via an HTTP or GRPC endpoint, allowing remote clients to request
inferencing for any model being managed by the server. For edge
deployments, Triton Server is also available as a shared library with
an API that allows the full functionality of the server to be included
directly in an application. Triton Server provides the following
features:
directly in an application.

What's New In 1.13.0
--------------------

* Updates for KFServing HTTP/REST and GRPC protocols and corresponding Python
and C++ client libraries. See the Roadmap section for more information.

* Update GRPC version to 1.24.0.

* Several issues with S3 storage were resolved.

* Fix last_inference_timestamp value to correctly show the time when inference
last occurred for each model.

* The Caffe2 backend is deprecated. Support for Caffe2 models will be removed in
a future release.

Features
--------

* `Multiple framework support
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_repository.html#framework-model-definition>`_. The
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_repository.html#framework-model-definition>`_. The
server can manage any number and mix of models (limited by system
disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
formats. Also supports TensorFlow-TensorRT and ONNX-TensorRT
integrated models. Variable-size input and output tensors are
allowed if supported by the framework. See `Capabilities
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/capabilities.html#capabilities>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/capabilities.html#capabilities>`_
for detailed support information for each framework.

* `Concurrent model execution support
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_configuration.html#instance-groups>`_. Multiple
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_configuration.html#instance-groups>`_. Multiple
models (or multiple instances of the same model) can run
simultaneously on the same GPU.

* Batching support. For models that support batching, Triton Server
can accept requests for a batch of inputs and respond with the
corresponding batch of outputs. Triton Server also supports multiple
`scheduling and batching
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_configuration.html#scheduling-and-batching>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_configuration.html#scheduling-and-batching>`_
algorithms that combine individual inference requests together to
improve inference throughput. These scheduling and batching
decisions are transparent to the client requesting inference.

* `Custom backend support
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_repository.html#custom-backends>`_. Triton
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_repository.html#custom-backends>`_. Triton
Server allows individual models to be implemented with custom
backends instead of by a deep-learning framework. With a custom
backend a model can implement any logic desired, while still
benefiting from the GPU support, concurrent execution, dynamic
batching and other features provided by the server.

* `Ensemble support
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/models_and_schedulers.html#ensemble-models>`_. An
ensemble represents a pipeline of one or more models and the
connection of input and output tensors between those models. A
single inference request to an ensemble will trigger the execution
@@ -96,37 +108,31 @@ features:
all system GPUs.

* Triton Server provides `multiple modes for model management
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_management.html>`_. These
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_management.html>`_. These
model management modes allow for both implicit and explicit loading
and unloading of models without requiring a server restart.

* `Model repositories
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/model_repository.html#>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/model_repository.html#>`_
may reside on a locally accessible file system (e.g. NFS), in Google
Cloud Storage or in Amazon S3.

* Readiness and liveness `health endpoints
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/http_grpc_api.html#health>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/http_grpc_api.html#health>`_
suitable for any orchestration or deployment framework, such as
Kubernetes (a minimal health-check sketch follows this feature list).

* `Metrics
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/metrics.html>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/metrics.html>`_
indicating GPU utilization, server throughput, and server latency.

* `C library interface
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/library_api.html>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/library_api.html>`_
allows the full functionality of Triton Server to be included
directly in an application.
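
The health and metrics endpoints above can be exercised with nothing
more than HTTP GET requests. The following is a minimal sketch, not an
official utility, assuming the default ports (8000 for the V1 HTTP/REST
API, 8002 for Prometheus metrics) and the V1 health paths
``/api/health/live`` and ``/api/health/ready``; adjust the URLs for
your deployment::

  # health_and_metrics.py: illustrative sketch only.
  import urllib.request

  def http_ok(url):
      # True when the endpoint answers with HTTP 200.
      try:
          with urllib.request.urlopen(url, timeout=5) as resp:
              return resp.getcode() == 200
      except Exception:
          return False

  # Liveness/readiness checks suitable for Kubernetes-style probes.
  print("live: ", http_ok("http://localhost:8000/api/health/live"))
  print("ready:", http_ok("http://localhost:8000/api/health/ready"))

  # The metrics endpoint returns Prometheus text format (GPU utilization,
  # throughput, latency); print the first lines as a sanity check.
  with urllib.request.urlopen("http://localhost:8002/metrics", timeout=5) as resp:
      print(resp.read().decode("utf-8", errors="replace")[:400])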

.. overview-end-marker-do-not-remove
The current release of the Triton Inference Server is 1.12.0 and
corresponds to the 20.02 release of the tensorrtserver container on
`NVIDIA GPU Cloud (NGC) <https://ngc.nvidia.com>`_. The branch for
this release is `r20.03
<https://github.com/NVIDIA/triton-inference-server/tree/r20.03>`_.

Backwards Compatibility
-----------------------

@@ -182,36 +188,28 @@ already understood. The primary reasons for the name change are to :
frameworks and formats.

* Highlight that the server is aligning HTTP/REST and GRPC protocols
with a set of `KFServing community standard inference protocols
with a set of `KFServing standard inference protocols
<https://github.com/kubeflow/kfserving/tree/master/docs/predict-api/v2>`_
that have been proposed by the `KFServing project
<https://github.com/kubeflow/kfserving>`_.

Transitioning from the current protocols (version 1) to the new
protocols (version 2) will take place over several releases.

* **Current master**
* 20.03.1

* Alpha release of server support for KFServing community standard
GRPC and HTTP/REST inference protocol.
* Alpha release of Python client library that uses KFServing
community standard GRPC and HTTP/REST inference protocol.
* See `client documentation
<https://github.com/NVIDIA/triton-inference-server/tree/master/docs/client_experimental.rst>`_
for description and examples showing how to enable and use the new
GRPC and HTTP/REST inference protocol and Python client library.
* Existing HTTP/REST and GRPC protocols, and existing client APIs
continue to be supported and remain the default protocols.

* 20.05

* Beta release of KFServing community standard HTTP/REST and GRPC
inference protocol support in server, Python client, and C++
client.
* The Triton updates originally planned for 20.05 are now included
in the 20.03.1 release (Triton version 1.13.0).
* Beta release of KFServing HTTP/REST and GRPC inference protocol
support in server, Python client, and C++ client.
* Beta release of the `HTTP/REST and GRPC extensions
<https://github.com/NVIDIA/triton-inference-server/tree/master/docs/protocol>`_
to the KFServing inference protocol.
* Existing HTTP/REST and GRPC protocols are deprecated but remain
* See `client documentation
<https://github.com/NVIDIA/triton-inference-server/blob/r20.03.1/docs/client_experimental.rst>`_
for description and examples showing how to enable and use the new
client libraries (a minimal usage sketch also follows this list).
* Existing V1 HTTP/REST and GRPC protocols are deprecated but remain
the default.
* Existing shared library interface defined in trtserver.h continues
to be supported but is deprecated.
@@ -220,36 +218,30 @@

* 20.06

* Triton Server version 2.0.0.
* KFserving community standard HTTP/REST and GRPC inference
protocols plus all Triton `extensions
<https://github.com/NVIDIA/triton-inference-server/tree/master/docs/protocol>`_
become the default and only supported protocols for the server.
* C++ and Python client libraries based on the KFServing standard
inference protocols become the default and only supported client
libraries.
* The new shared library interface defined in tritonserver.h becomes
the default and only supported shared library interface.
* Original C++ and Python client libraries are removed. Release
20.05 is the last release to support these libraries.
* Original shared library interface defined in trtserver.h is
removed. Release 20.05 is the last release to support the
trtserver.h shared library interface.
* Triton Server will release two containers, one for version 1.14.0
and one for version 2.0.0.
* The Triton 2.0.0 version will contain only the KFServing HTTP/REST
and GRPC inference protocols and the corresponding V2 Python and
C++ client libraries and examples.
* The Triton 2.0.0 version will support the shared library interface
defined in tritonserver.h.
* The 1.14.0 release will likely be the last release for Triton V1.
* The Triton 1.14.0 version will contain only the V1 HTTP/REST
and GRPC inference protocols and the corresponding V1 Python and
C++ client libraries and examples.
* The Triton 1.14.0 version will support the shared library interface
defined in tensorrtserver.h.
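
To make the new client libraries concrete, the sketch below shows what
a simple inference call through the experimental KFServing-protocol
Python client might look like. It is an illustration only: the module
name ``tritonhttpclient``, the model name ``my_model``, and the tensor
names ``INPUT0``/``OUTPUT0`` are assumptions made for this example; see
the client documentation linked above for the authoritative usage::

  # Hypothetical example of the experimental V2 (KFServing-protocol)
  # Python client; module, model, and tensor names are placeholders.
  import numpy as np
  import tritonhttpclient  # assumed module name for the experimental client

  client = tritonhttpclient.InferenceServerClient(url="localhost:8000")

  # Describe one input tensor and attach data from a numpy array.
  input0 = tritonhttpclient.InferInput("INPUT0", [1, 16], "INT32")
  input0.set_data_from_numpy(np.arange(16, dtype=np.int32).reshape(1, 16))

  # Request one output tensor by name.
  output0 = tritonhttpclient.InferRequestedOutput("OUTPUT0")

  # Run inference against the placeholder model and read the result back.
  response = client.infer("my_model", inputs=[input0], outputs=[output0])
  print(response.as_numpy("OUTPUT0"))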

Throughout the transition the model repository structure and custom
backend APIs will remain unchanged so that any existing model
repository and custom backends will continue to work with Triton
Server.

In the 20.06 release there will be some minor changes to the
In the Triton 2.0.0 release there will be some minor changes to the
tritonserver command-line executable arguments. It will be necessary
to revisit and possibly adjust invocations of the tritonserver executable.

In the 20.06 release there will be some minor changes to the model
configuration schema. It is expected that these changes will not
impact the vast majority of model configurations. For impacted models
the model configuration will need minor edits to become compatible
with Triton Server version 2.0.0.
to revisit and possibly adjust invocations of the tritonserver
executable. The Triton 1.14.0 command line will remain unchanged from
earlier version 1 releases.

Documentation
-------------
@@ -266,7 +258,7 @@ and for `earlier releases
<https://docs.nvidia.com/deeplearning/sdk/inference-server-archived/index.html>`_.

An `FAQ
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-master-branch-guide/docs/faq.html>`_
<https://docs.nvidia.com/deeplearning/sdk/triton-inference-server-guide/docs/faq.html>`_
provides answers for frequently asked questions.

READMEs for deployment examples can be found in subdirectories of
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
1.13.0dev
1.13.0
6 changes: 3 additions & 3 deletions docs/build.rst
@@ -368,7 +368,7 @@ Building A Custom Backend

The source repository contains several example custom backends in the
`src/custom directory
<https://github.com/NVIDIA/triton-inference-server/blob/master/src/custom>`_.
<https://github.com/NVIDIA/triton-inference-server/blob/master-v1/src/custom>`_.
These custom backends are built using CMake::

$ mkdir builddir
@@ -426,11 +426,11 @@ Using the Custom Instance Wrapper Class
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The custom backend SDK provides a `CustomInstance Class
<https://github.com/NVIDIA/triton-inference-server/blob/master/src/custom/sdk/custom_instance.h>`_.
<https://github.com/NVIDIA/triton-inference-server/blob/master-v1/src/custom/sdk/custom_instance.h>`_.
The CustomInstance class is a C++ wrapper class that abstracts away the
backend C-API for ease of use. All of the example custom backends in
`src/custom directory
<https://github.com/NVIDIA/triton-inference-server/blob/master/src/custom>`_
<https://github.com/NVIDIA/triton-inference-server/blob/master-v1/src/custom>`_
derive from the CustomInstance class and can be referenced for usage.

Building the Client Libraries and Examples