From a1f3860ba65c0fd8f2be3adfcab2673efd039348 Mon Sep 17 00:00:00 2001
From: dzier
Date: Mon, 16 Dec 2019 10:19:10 -0800
Subject: [PATCH] Update README and versions for 19.12 release

---
 Dockerfile |   8 +-
 README.rst | 238 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 VERSION    |   2 +-
 3 files changed, 239 insertions(+), 9 deletions(-)

diff --git a/Dockerfile b/Dockerfile
index 384019dc89..a834064df0 100644
--- a/Dockerfile
+++ b/Dockerfile
@@ -192,8 +192,8 @@ RUN python3 /workspace/onnxruntime/tools/ci_build/build.py --build_dir /workspac
 ############################################################################
 FROM ${BASE_IMAGE} AS trtserver_build
 
-ARG TRTIS_VERSION=1.9.0dev
-ARG TRTIS_CONTAINER_VERSION=19.12dev
+ARG TRTIS_VERSION=1.9.0
+ARG TRTIS_CONTAINER_VERSION=19.12
 
 # libgoogle-glog0v5 is needed by caffe2 libraries.
 # libcurl4-openSSL-dev is needed for GCS
@@ -348,8 +348,8 @@ ENTRYPOINT ["/opt/tensorrtserver/nvidia_entrypoint.sh"]
 ############################################################################
 FROM ${BASE_IMAGE}
 
-ARG TRTIS_VERSION=1.9.0dev
-ARG TRTIS_CONTAINER_VERSION=19.12dev
+ARG TRTIS_VERSION=1.9.0
+ARG TRTIS_CONTAINER_VERSION=19.12
 
 ENV TENSORRT_SERVER_VERSION ${TRTIS_VERSION}
 ENV NVIDIA_TENSORRT_SERVER_VERSION ${TRTIS_CONTAINER_VERSION}
diff --git a/README.rst b/README.rst
index 301a65237d..9825b7bf2d 100644
--- a/README.rst
+++ b/README.rst
@@ -30,13 +30,243 @@
 NVIDIA TensorRT Inference Server
 ================================
 
-  **NOTE: You are currently on the r19.12 branch which tracks
-  stabilization towards the next release. This branch is not usable
-  during stabilization.**
-
 .. overview-begin-marker-do-not-remove
+
+The NVIDIA TensorRT Inference Server provides a cloud inferencing
+solution optimized for NVIDIA GPUs. The server provides an inference
+service via an HTTP or GRPC endpoint, allowing remote clients to
+request inferencing for any model being managed by the server.
+
+What's New in 1.9.0
+-------------------
+
+* The model configuration now includes a model warmup option. This option
+  provides the ability to tune and optimize the model before inference requests
+  are received, avoiding initial inference delays. This option is especially
+  useful for frameworks like TensorFlow that perform network optimization in
+  response to the initial inference requests. Models can be warmed up with one
+  or more synthetic or realistic workloads before they become ready in the
+  server.
+
+* An enhanced sequence batcher now has multiple scheduling strategies. A new
+  Oldest strategy integrates with the dynamic batcher to enable improved
+  inference performance for models that don’t require all inference requests
+  in a sequence to be routed to the same batch slot.
+
+* The perf_client now has an option to generate requests using a realistic
+  Poisson distribution or a user-provided distribution.
+
+* A new repository API (available in the shared library API, HTTP, and GRPC)
+  returns an index of all models available in the model repositories visible
+  to the server. This index can be used to see what models are available for
+  loading onto the server.
+
+* The server status returned by the server status API now includes the
+  timestamp of the last inference request received for each model.
+
+* Inference server tracing capabilities are now documented in the `Optimization
+  `_
+  section of the User Guide. Tracing support is enhanced to provide trace for
+  ensembles and the contained models.
+
+* A community-contributed Dockerfile is now available to build the TensorRT
+  Inference Server clients on CentOS.
+
+Features
+--------
+
+* `Multiple framework support
+  `_. The
+  server can manage any number and mix of models (limited by system
+  disk and memory resources). Supports TensorRT, TensorFlow GraphDef,
+  TensorFlow SavedModel, ONNX, PyTorch, and Caffe2 NetDef model
+  formats. Also supports TensorFlow-TensorRT integrated
+  models. Variable-size input and output tensors are allowed if
+  supported by the framework. See `Capabilities
+  `_
+  for detailed support information for each framework.
+
+* `Concurrent model execution support
+  `_. Multiple
+  models (or multiple instances of the same model) can run
+  simultaneously on the same GPU.
+
+* Batching support. For models that support batching, the server can
+  accept requests for a batch of inputs and respond with the
+  corresponding batch of outputs. The inference server also supports
+  multiple `scheduling and batching
+  `_
+  algorithms that combine individual inference requests together to
+  improve inference throughput. These scheduling and batching
+  decisions are transparent to the client requesting inference.
+
+* `Custom backend support
+  `_. The inference server
+  allows individual models to be implemented with custom backends
+  instead of by a deep-learning framework. With a custom backend, a
+  model can implement any logic desired, while still benefiting from
+  the GPU support, concurrent execution, dynamic batching and other
+  features provided by the server.
+
+* `Ensemble support
+  `_. An
+  ensemble represents a pipeline of one or more models and the
+  connection of input and output tensors between those models. A
+  single inference request to an ensemble will trigger the execution
+  of the entire pipeline.
+
+* Multi-GPU support. The server can distribute inferencing across all
+  system GPUs.
+
+* The inference server provides `multiple modes for model management
+  `_. These
+  model management modes allow for both implicit and explicit loading
+  and unloading of models without requiring a server restart.
+
+* `Model repositories
+  `_
+  may reside on a locally accessible file system (e.g. NFS), in Google
+  Cloud Storage, or in Amazon S3.
+
+* Readiness and liveness `health endpoints
+  `_
+  suitable for any orchestration or deployment framework, such as
+  Kubernetes.
+
+* `Metrics
+  `_
+  indicating GPU utilization, server throughput, and server latency.
+
+* `C library interface
+  `_
+  allows the full functionality of the inference server to be included
+  directly in an application.
+
 .. overview-end-marker-do-not-remove
 
+The current release of the TensorRT Inference Server is 1.9.0 and
+corresponds to the 19.12 release of the tensorrtserver container on
+`NVIDIA GPU Cloud (NGC) `_. The branch for
+this release is `r19.12
+`_.
+
+Backwards Compatibility
+-----------------------
+
+Continuing in the latest version, the following interfaces maintain
+backwards compatibility with the 1.0.0 release. If you have model
+configuration files, custom backends, or clients that use the
+inference server HTTP or GRPC APIs (either directly or through the
+client libraries) from releases prior to 1.0.0, you should edit
+and rebuild those as necessary to match the version 1.0.0 APIs.
+
+The following interfaces will maintain backwards compatibility for all
+future 1.x.y releases (see below for exceptions):
+
+* Model configuration as defined in `model_config.proto
+  `_.
+
+* The inference server HTTP and GRPC APIs as defined in `api.proto
+  `_
+  and `grpc_service.proto
+  `_,
+  except as noted below.
+
+* The V1 custom backend interface as defined in `custom.h
+  `_.
+
+As new features are introduced, they may temporarily have beta status
+where they are subject to change in non-backwards-compatible
+ways. When they exit beta they will conform to the
+backwards-compatibility guarantees described above. Currently the
+following features are in beta:
+
+* The inference server library API as defined in `trtserver.h
+  `_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The inference server HTTP and GRPC APIs related to system and CUDA
+  shared memory are currently in beta and may undergo
+  non-backwards-compatible changes.
+
+* The V2 custom backend interface as defined in `custom.h
+  `_
+  is currently in beta and may undergo non-backwards-compatible
+  changes.
+
+* The C++ and Python client libraries are not strictly included in the
+  inference server compatibility guarantees and so should be
+  considered beta.
+
+Documentation
+-------------
+
+The User Guide, Developer Guide, and API Reference `documentation for
+the current release
+`_
+provide guidance on installing, building, and running the TensorRT
+Inference Server.
+
+You can also view the `documentation for the master branch
+`_
+and for `earlier releases
+`_.
+
+An `FAQ
+`_
+provides answers to frequently asked questions.
+
+READMEs for deployment examples can be found in subdirectories of
+deploy/, for example, `deploy/single_server/README.rst
+`_.
+
+The `Release Notes
+`_
+and `Support Matrix
+`_
+indicate the required versions of the NVIDIA Driver and CUDA, and also
+describe which GPUs are supported by the inference server.
+
+Other Documentation
+^^^^^^^^^^^^^^^^^^^
+
+* `Maximizing Utilization for Data Center Inference with TensorRT
+  Inference Server
+  `_.
+
+* `NVIDIA TensorRT Inference Server Boosts Deep Learning Inference
+  `_.
+
+* `GPU-Accelerated Inference for Kubernetes with the NVIDIA TensorRT
+  Inference Server and Kubeflow
+  `_.
+
+Contributing
+------------
+
+Contributions to TensorRT Inference Server are more than welcome. To
+contribute, make a pull request and follow the guidelines outlined in
+the `Contributing `_ document.
+
+Reporting problems, asking questions
+------------------------------------
+
+We appreciate any feedback, questions, or bug reports regarding this
+project. When help with code is needed, follow the process outlined in
+the Stack Overflow (https://stackoverflow.com/help/mcve)
+document. Ensure posted examples are:
+
+* minimal – use as little code as possible that still produces the
+  same problem
+
+* complete – provide all parts needed to reproduce the problem. Check
+  if you can strip external dependencies and still show the problem. The
+  less time we spend on reproducing problems the more time we have to
+  fix them
+
+* verifiable – test the code you're about to provide to make sure it
+  reproduces the problem. Remove all other problems that are not
+  related to your request/question.
+
 .. |License| image:: https://img.shields.io/badge/License-BSD3-lightgrey.svg
    :target: https://opensource.org/licenses/BSD-3-Clause
diff --git a/VERSION b/VERSION
index 31662c9819..f8e233b273 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-1.9.0dev
+1.9.0
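As a quick illustration of the readiness/liveness health endpoints and the Prometheus
metrics endpoint described in the README content above, the following is a minimal
sketch that polls a locally running server from Python using the ``requests`` package.
It assumes the v1 HTTP API paths ``/api/health/live`` and ``/api/health/ready`` on port
8000, a metrics endpoint on port 8002, and metric names carrying the ``nv_`` prefix;
adjust these if your deployment maps ports or paths differently.

.. code-block:: python

   # Hedged sketch: poll TensorRT Inference Server v1 HTTP health and metrics
   # endpoints. Endpoint paths, ports, and the nv_ metric prefix are assumptions
   # based on the v1 HTTP API; adjust for your deployment.
   import sys

   import requests

   SERVER_URL = "http://localhost:8000"           # HTTP endpoint of the server
   METRICS_URL = "http://localhost:8002/metrics"  # Prometheus metrics endpoint


   def endpoint_ok(url):
       """Return True if the endpoint answers with HTTP 200."""
       try:
           return requests.get(url, timeout=5.0).status_code == 200
       except requests.RequestException:
           return False


   def main():
       live = endpoint_ok(SERVER_URL + "/api/health/live")
       ready = endpoint_ok(SERVER_URL + "/api/health/ready")
       print("live={} ready={}".format(live, ready))

       if ready:
           # Metrics are exposed in Prometheus text format; print the
           # server-specific lines (assumed to use the nv_ prefix).
           try:
               for line in requests.get(METRICS_URL, timeout=5.0).text.splitlines():
                   if line.startswith("nv_"):
                       print(line)
           except requests.RequestException as exc:
               print("could not read metrics: {}".format(exc))

       # Non-zero exit when the server is not ready.
       return 0 if ready else 1


   if __name__ == "__main__":
       sys.exit(main())

Because the script exits non-zero when the ready check fails, it can also serve as a
simple probe command in an orchestration framework such as Kubernetes, in the spirit of
the health-endpoint feature listed above.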