diff --git a/Dockerfile b/Dockerfile
index 6646a138ac..9322b8961e 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -146,8 +146,8 @@ FROM ${TENSORFLOW2_IMAGE} AS tritonserver_tf2
 ############################################################################
 FROM ${BASE_IMAGE} AS tritonserver_build
 
-ARG TRITON_VERSION=2.2.0dev
-ARG TRITON_CONTAINER_VERSION=20.08dev
+ARG TRITON_VERSION=2.2.0
+ARG TRITON_CONTAINER_VERSION=20.08
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 # libcurl4-openSSL-dev is needed for GCS
@@ -366,8 +366,8 @@ ENTRYPOINT ["/opt/tritonserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRITON_VERSION=2.2.0dev
-ARG TRITON_CONTAINER_VERSION=20.08dev
+ARG TRITON_VERSION=2.2.0
+ARG TRITON_CONTAINER_VERSION=20.08
 
 ENV TRITON_SERVER_VERSION ${TRITON_VERSION}
 ENV NVIDIA_TRITON_SERVER_VERSION ${TRITON_CONTAINER_VERSION}
@@ -377,7 +377,7 @@ LABEL com.nvidia.tritonserver.version="${TRITON_SERVER_VERSION}"
 
 ENV PATH /opt/tritonserver/bin:${PATH}
 
-# Need to include pytorch in LD_LIBRARY_PATH since Torchvision loads custom 
+# Need to include pytorch in LD_LIBRARY_PATH since Torchvision loads custom
 # ops from that path
 ENV LD_LIBRARY_PATH /opt/tritonserver/lib/pytorch/:$LD_LIBRARY_PATH
diff --git a/README.rst b/README.rst
index bbfb851c0b..3473fd7b67 100644
--- a/README.rst
+++ b/README.rst
@@ -30,13 +30,251 @@
 NVIDIA Triton Inference Server
 ==============================
 
-  **NOTE: You are currently on the r20.08 branch which tracks
-  stabilization towards the next release. This branch is not usable
-  during stabilization.**
-
 .. overview-begin-marker-do-not-remove
 
+NVIDIA Triton Inference Server provides a cloud inferencing solution
+optimized for NVIDIA GPUs. The server provides an inference service
+via an HTTP/REST or GRPC endpoint, allowing remote clients to request
+inferencing for any model being managed by the server. For edge
+deployments, Triton Server is also available as a shared library with
+an API that allows the full functionality of the server to be included
+directly in an application.
+
+What's New in 2.2.0
+-------------------
+
+* TensorFlow 2.x is now supported in addition to TensorFlow 1.x. See
+  the Frameworks Support Matrix for the supported TensorFlow
+  versions. The version of TensorFlow used can be selected when
+  launching Triton with the
+  --backend-config=tensorflow,version=<version> flag. Set <version> to
+  1 or 2 to select TensorFlow 1 or TensorFlow 2, respectively. By
+  default TensorFlow 1 is used.
+
+* Added an inference request timeout option to the Python and C++
+  client libraries.
+
+* GRPC inference protocol updated to fix a performance regression.
+
+* Explicit major/minor versioning added to the TRITONSERVER and
+  TRITONBACKEND APIs.
+
+* New CMake option TRITON_CLIENT_SKIP_EXAMPLES to disable building the
+  client examples.
+
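+As a quick check that a running server matches this release, the
+version reported by the server can be queried over the HTTP/REST
+endpoint, along with the liveness and readiness probes. The following
+is a minimal sketch, not an official utility, using the Python
+``requests`` package; it assumes Triton is running locally on the
+default HTTP port 8000 and that the response fields follow the
+KFServing-style server metadata format.
+
+.. code:: python
+
+    # Minimal sketch: confirm the server is live, ready, and reports the
+    # expected version. Adjust the base URL for non-default deployments.
+    import requests
+
+    base = "http://localhost:8000"  # default HTTP/REST port (assumption)
+
+    # Liveness and readiness health endpoints.
+    assert requests.get(f"{base}/v2/health/live").status_code == 200
+    assert requests.get(f"{base}/v2/health/ready").status_code == 200
+
+    # Server metadata: name, version, and supported protocol extensions.
+    metadata = requests.get(f"{base}/v2").json()
+    print(metadata.get("name"), metadata.get("version"))
+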
+Features
+--------
+
+* `Multiple framework support
+  `_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Both TensorFlow 1.x and TensorFlow 2.x are supported. Also
+  supports TensorFlow-TensorRT and ONNX-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  `_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  `_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, Triton Server
+  can accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. Triton Server also supports multiple
+  `scheduling and batching
+  `_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  `_. Triton
+  Server allows individual models to be implemented with custom
+  backends instead of by a deep-learning framework. With a custom
+  backend, a model can implement any logic desired, while still
+  benefiting from the GPU support, concurrent execution, dynamic
+  batching, and other features provided by the server.
+
+* `Ensemble support
+  `_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. Triton Server can distribute inferencing across
+  all system GPUs.
+
+* Triton Server provides `multiple modes for model management
+  `_. These
+  model management modes allow for both implicit and explicit loading
+  and unloading of models without requiring a server restart.
+
+* `Model repositories
+  `_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage, or in Amazon S3.
+
+* HTTP/REST and GRPC `inference protocols
+  `_
+  based on the community-developed `KFServing protocol
+  `_. See the example request below.
+
+* Readiness and liveness `health endpoints
+  `_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  `_
+  indicating GPU utilization, server throughput, and server latency.
+
+* `C library interface
+  `_
+  allows the full functionality of Triton Server to be included
+  directly in an application.
+
 .. overview-end-marker-do-not-remove
 
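+The example below is a minimal sketch of an inference request made
+directly against the KFServing-based HTTP/REST protocol using the
+Python ``requests`` package. The model name and tensor names are
+hypothetical placeholders; substitute the names defined in your model's
+configuration, and adjust the shape, datatype, and URL as needed (the
+default HTTP port 8000 is assumed).
+
+.. code:: python
+
+    import requests
+
+    # "my_model", "INPUT0", and "OUTPUT0" are placeholders for your
+    # model's actual name and tensor names.
+    url = "http://localhost:8000/v2/models/my_model/infer"
+
+    request_body = {
+        "inputs": [
+            {
+                "name": "INPUT0",
+                "shape": [1, 4],
+                "datatype": "FP32",
+                "data": [0.1, 0.2, 0.3, 0.4],
+            }
+        ],
+        # Listing outputs is optional; omitting it returns all outputs.
+        "outputs": [{"name": "OUTPUT0"}],
+    }
+
+    response = requests.post(url, json=request_body)
+    response.raise_for_status()
+    result = response.json()
+    print(result["outputs"][0]["name"], result["outputs"][0]["data"])
+
+The Python and C++ client libraries implement this same protocol and
+avoid constructing the JSON request by hand.
+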
+The current release of the Triton Inference Server is 2.2.0 and
+corresponds to the 20.08 release of the tritonserver container on
+`NVIDIA GPU Cloud (NGC) `_. The branch for
+this release is `r20.08
+`_.
+
+Backwards Compatibility
+-----------------------
+
+Version 2 of Triton is beta quality, so you should expect some changes
+to the server and client protocols and APIs. Version 2 of Triton does
+not generally maintain backwards compatibility with version 1.
+Specifically, you should take the following items into account when
+transitioning from version 1 to version 2:
+
+* The Triton executables and libraries are in /opt/tritonserver. The
+  Triton executable is /opt/tritonserver/bin/tritonserver.
+
+* Some *tritonserver* command-line arguments are removed, changed, or
+  have different default behavior in version 2.
+
+  * --api-version, --http-health-port, --grpc-infer-thread-count,
+    --grpc-stream-infer-thread-count, --allow-poll-model-repository,
+    --allow-model-control and --tf-add-vgpu are removed.
+
+  * The default for --model-control-mode is changed to *none*.
+
+  * --tf-allow-soft-placement and --tf-gpu-memory-fraction are renamed
+    to --backend-config="tensorflow,allow-soft-placement=<true,false>"
+    and --backend-config="tensorflow,gpu-memory-fraction=<float>".
+
+* The HTTP/REST and GRPC protocols, while conceptually similar to
+  version 1, are completely changed in version 2. See the `inference
+  protocols
+  `_
+  section of the documentation for more information.
+
+* Python and C++ client libraries are re-implemented to match the new
+  HTTP/REST and GRPC protocols. The Python client no longer depends on
+  a C++ shared library and so should be usable on any platform that
+  supports Python. See the `client libraries
+  `_
+  section of the documentation for more information.
+
+* The version 2 CMake build requires these changes:
+
+  * The CMake flag names have changed from having a TRTIS prefix to
+    having a TRITON prefix. For example, TRITON_ENABLE_TENSORRT.
+
+  * The build targets are *server*, *client* and *custom-backend* to
+    build the server, client libraries and examples, and custom
+    backend SDK, respectively.
+
+* In the Docker containers the environment variables indicating the
+  Triton version have changed to have a TRITON prefix, for example,
+  TRITON_SERVER_VERSION.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation for
+the current release
+`_
+provide guidance on installing, building, and running Triton Inference
+Server.
+
+You can also view the `documentation for the master branch
+`_
+and for `earlier releases
+`_.
+
+NVIDIA publishes a number of `deep learning examples that use Triton
+`_.
+
+An `FAQ
+`_
+provides answers for frequently asked questions.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+`_.
+
+The `Release Notes
+`_
+and `Support Matrix
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by Triton Server.
+
+Presentations and Papers
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+* `High-Performance Inferencing at Scale Using the TensorRT Inference Server `_.
+
+* `Accelerate and Autoscale Deep Learning Inference on GPUs with KFServing `_.
+
+* `Deep into Triton Inference Server: BERT Practical Deployment on NVIDIA GPU `_.
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  `_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  `_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  `_.
+
+Contributing
+------------
+
+Contributions to Triton Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing `_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve) document.
+Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  whether you can strip external dependencies and still show the
+  problem. The less time we spend on reproducing problems, the more
+  time we have to fix them.
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause
diff --git a/VERSION b/VERSION
index 1650a54a93..ccbccc3dc6 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-2.2.0dev
+2.2.0