feat: Metrics Support in tritonfrontend #7703

Open · wants to merge 48 commits into base: main
Changes from 44 commits (48 commits total)
59d9aa8
conditional http/grpc endpoints
KrishnanPrash Sep 6, 2024
0b66a27
Adding tritonfrontend.whl to build.py instructions
KrishnanPrash Sep 6, 2024
173216e
Setting enable_tracing flag to 0
KrishnanPrash Sep 27, 2024
c9fa783
testing conditional appending of tracing lib
KrishnanPrash Sep 27, 2024
384da14
Merge remote-tracking branch 'origin/main' into kprashanth-conditiona…
KrishnanPrash Oct 2, 2024
40da2be
linker error
KrishnanPrash Oct 3, 2024
f08de59
Working conditional builds w Tracing=ON
KrishnanPrash Oct 4, 2024
f3fc425
Adding comments
KrishnanPrash Oct 4, 2024
a657cae
Making top-level imports more specific
KrishnanPrash Oct 4, 2024
be36c42
Fixing imports
KrishnanPrash Oct 4, 2024
80ee6e5
Catching specfic error
KrishnanPrash Oct 4, 2024
723df84
Merge remote-tracking branch 'origin/main' into kprashanth-conditiona…
KrishnanPrash Oct 9, 2024
2e2108c
Metrics Support
KrishnanPrash Oct 9, 2024
d7971b7
Working test_metrics_custom_port()
KrishnanPrash Oct 9, 2024
23b4beb
update docs with metrics support
KrishnanPrash Oct 9, 2024
6ebcdc9
Smoke tests for Metrics Bindings
KrishnanPrash Oct 9, 2024
1289f34
casting float to int for same type comparison
KrishnanPrash Oct 9, 2024
ac8e23d
removing comment
KrishnanPrash Oct 9, 2024
48acc3e
remove TODO comment
KrishnanPrash Oct 9, 2024
c6efa9e
Cleaning up build.py and CMake
KrishnanPrash Oct 9, 2024
2932600
Minimal working CMake configuration
KrishnanPrash Oct 9, 2024
73e1782
updating identity model to use CPU only
KrishnanPrash Oct 9, 2024
1f09417
removing debug statements
KrishnanPrash Oct 9, 2024
5e0df4e
Merge remote-tracking branch 'origin/kprashanth-conditional-endpoints…
KrishnanPrash Oct 9, 2024
43bd2ab
Updated documentation and removed TODO comments
KrishnanPrash Oct 9, 2024
88a710d
fixing spacing and removing unused imports
KrishnanPrash Oct 11, 2024
e0abc3b
removed todo comment
KrishnanPrash Oct 11, 2024
85d7676
moving to support library
KrishnanPrash Oct 15, 2024
8f0b4e1
making tracing lib links public
KrishnanPrash Oct 15, 2024
7607884
Making comments consistent
KrishnanPrash Oct 15, 2024
a88be2d
cleaning up includes
KrishnanPrash Oct 15, 2024
c7503b3
spacing
KrishnanPrash Oct 15, 2024
569c68d
fixing order
KrishnanPrash Oct 15, 2024
bd5c0b5
formatting
KrishnanPrash Oct 15, 2024
378fd2d
Merge branch 'kprashanth-conditional-endpoints' into kprashanth-trito…
KrishnanPrash Oct 15, 2024
82897e1
resolved merge conflicts
KrishnanPrash Oct 18, 2024
246e380
CMake changes
KrishnanPrash Oct 18, 2024
80e88b5
pre-commit changes
KrishnanPrash Oct 18, 2024
acc9e9e
removing redundant parameters
KrishnanPrash Oct 22, 2024
4402e3d
Spacing and comments
KrishnanPrash Oct 22, 2024
2712558
refactor: moving `tritonfrontend` to `@handle_triton_error` decorator…
KrishnanPrash Oct 22, 2024
8e7bba1
removed unused import
KrishnanPrash Oct 22, 2024
ba14f56
spacing
KrishnanPrash Oct 22, 2024
f17dc17
change default metrics thread count
KrishnanPrash Oct 23, 2024
717ee47
fixing type info
KrishnanPrash Oct 24, 2024
d53db9b
fixing default and lower bound on thread count
KrishnanPrash Oct 24, 2024
d2939cb
making error throwing consistent
KrishnanPrash Oct 24, 2024
46121ff
Adding guards around frontend-specific code
KrishnanPrash Oct 28, 2024
8 changes: 6 additions & 2 deletions docs/customization_guide/tritonfrontend.md
@@ -57,14 +57,18 @@ Note: `model_path` may need to be edited depending on your setup.

2. Now, to start up the respective services with `tritonfrontend`
```python
from tritonfrontend import KServeHttp, KServeGrpc
from tritonfrontend import KServeHttp, KServeGrpc, Metrics
Collaborator comment:

I don't love that the Metrics object is a web server, so it makes me wonder if we should rename these down the line, e.g., KServeHttpService, MetricsService, etc.

But I don't have a strong opinion on an alternative right now, so I think it's fine; just mentioning it for later. We will probably be restructuring some packaging and naming in the near to mid term.

http_options = KServeHttp.Options(thread_count=5)
http_service = KServeHttp(server, http_options)
http_service.start()

# Default options (if none provided)
grpc_service = KServeGrpc(server)
grpc_service.start()

# Can start metrics service as well
metrics_service = Metrics(server)
metrics_service.start()
```

3. Finally, with the services running, we can use `tritonclient` or simple `curl` commands to send requests and receive responses from the frontends.
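For example, a minimal sketch of checking the new metrics service from Python (this assumes the default metrics address of `localhost:8002` used elsewhere in this PR; adjust it if you passed custom `Metrics.Options`):

```python
import requests

# Query the Prometheus-format metrics exposed by the Metrics service.
response = requests.get("http://localhost:8002/metrics")
print(response.status_code)  # 200 while the service is running
print(response.text[:300])   # e.g. nv_inference_count{...} counters
```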
@@ -97,6 +101,7 @@ print("[INFERENCE RESULTS]")
print("Output data:", output_data)

# Stop respective services and server.
metrics_service.stop()
http_service.stop()
grpc_service.stop()
server.stop()
@@ -139,7 +144,6 @@ With this workflow, you can avoid having to stop each service after client requests
- The following features are not currently supported when launching the Triton frontend services through the python bindings:
- [Tracing](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/trace.md)
- [Shared Memory](https://github.com/triton-inference-server/server/blob/main/docs/protocol/extension_shared_memory.md)
- [Metrics](https://github.com/triton-inference-server/server/blob/main/docs/user_guide/metrics.md)
- [Restricted Protocols](https://github.com/triton-inference-server/server/blob/main/docs/customization_guide/inference_protocols.md#limit-endpoint-access-beta)
- VertexAI
- Sagemaker
75 changes: 74 additions & 1 deletion qa/L0_python_api/test_kserve.py
@@ -34,7 +34,7 @@
import tritonclient.http as httpclient
import tritonserver
from tritonclient.utils import InferenceServerException
from tritonfrontend import KServeGrpc, KServeHttp
from tritonfrontend import KServeGrpc, KServeHttp, Metrics


class TestHttpOptions:
@@ -78,8 +78,25 @@ def test_wrong_grpc_parameters(self):
KServeGrpc.Options(server_key=10)


class TestMetricsOptions:
def test_correct_http_parameters(self):
Metrics.Options(address="0.0.0.1", port=8080, thread_count=16)

def test_wrong_http_parameters(self):
# Out of range
with pytest.raises(Exception):
Metrics.Options(port=-15)
with pytest.raises(Exception):
Metrics.Options(thread_count=-5)

# Wrong data type
with pytest.raises(Exception):
Metrics.Options(thread_count="ten")


HTTP_ARGS = (KServeHttp, httpclient, "localhost:8000") # Default HTTP args
GRPC_ARGS = (KServeGrpc, grpcclient, "localhost:8001") # Default GRPC args
METRICS_ARGS = (Metrics, "localhost:8002") # Default Metrics args


class TestKServe:
@@ -271,6 +288,62 @@ def callback(user_data, result, error):
utils.teardown_client(grpc_client)
utils.teardown_server(server)

@pytest.mark.parametrize("frontend, url", [METRICS_ARGS])
def test_metrics_default_port(self, frontend, url):
server = utils.setup_server()
service = utils.setup_service(server, frontend)

metrics_url = f"http://{url}/metrics"
status_code, _ = utils.get_metrics(metrics_url)

assert status_code == 200

utils.teardown_service(service)
utils.teardown_server(server)

@pytest.mark.parametrize("frontend", [Metrics])
def test_metrics_custom_port(self, frontend, port=8005):
server = utils.setup_server()
service = utils.setup_service(server, frontend, Metrics.Options(port=port))

metrics_url = f"http://localhost:{port}/metrics"
status_code, _ = utils.get_metrics(metrics_url)

assert status_code == 200

utils.teardown_service(service)
utils.teardown_server(server)

@pytest.mark.parametrize("frontend, url", [METRICS_ARGS])
def test_metrics_update(self, frontend, url):
# For this test, set up the Server, a KServeGrpc frontend, and a Metrics frontend
server = utils.setup_server()
grpc_service = utils.setup_service(
server, KServeGrpc
) # Needed to send inference request
metrics_service = utils.setup_service(server, frontend)

# Get Metrics and verify inference count == 0 before inference
before_status_code, before_inference_count = utils.get_metrics(
f"http://{url}/metrics"
)
assert before_status_code == 200 and before_inference_count == 0

# Send 1 Inference Request with send_and_test_inference()
assert utils.send_and_test_inference_identity(GRPC_ARGS[1], GRPC_ARGS[2])

# Get Metrics and verify inference count == 1 after inference
after_status_code, after_inference_count = utils.get_metrics(
f"http://{url}/metrics"
)
assert after_status_code == 200 and after_inference_count == 1

# Teardown Metrics, GrpcService, Server
utils.teardown_service(grpc_service)
utils.teardown_service(metrics_service)
utils.teardown_server(server)

# KNOWN ISSUE: CAUSES SEGFAULT
# Created [DLIS-7231] to address at future date
# Once the server has been stopped, the underlying TRITONSERVER_Server instance
8 changes: 7 additions & 1 deletion qa/L0_python_api/test_model_repository/identity/config.pbtxt
@@ -41,4 +41,10 @@ output [
data_type: TYPE_STRING
dims: [ 1 ]
}
]
]
instance_group [
{
count: 1
Collaborator comment:

Did you ever investigate the GPU label issue?

Contributor Author reply:

Investigated a bit but did not find the root cause. I will create a ticket in my backlog, hopefully with a more consistent reproducer.

kind : KIND_CPU
}
]
88 changes: 71 additions & 17 deletions qa/L0_python_api/testing_utils.py
@@ -25,21 +25,20 @@
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import os
import queue
import re
from functools import partial
from typing import Union
from typing import Tuple, Union

import numpy as np
import requests
import tritonserver
from tritonclient.utils import InferenceServerException
from tritonfrontend import KServeGrpc, KServeHttp

# TODO: Re-Format documentation to fit:
# https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings
from tritonfrontend import KServeGrpc, KServeHttp, Metrics


def setup_server(model_repository="test_model_repository") -> tritonserver.Server:
"""
Using tritonserver, starts a server with the models: identity and delayed_identity
"""
module_directory = os.path.split(os.path.abspath(__file__))[0]
model_path = os.path.abspath(os.path.join(module_directory, model_repository))

@@ -61,9 +60,12 @@ def teardown_server(server: tritonserver.Server) -> None:

def setup_service(
server: tritonserver.Server,
frontend: Union[KServeHttp, KServeGrpc],
frontend: Union[KServeHttp, KServeGrpc, Metrics],
options=None,
) -> Union[KServeHttp, KServeGrpc]:
) -> Union[KServeHttp, KServeGrpc, Metrics]:
"""
Used to create and start any of the frontends supported by tritonfrontend.
"""
service = frontend(server=server, options=options)
service.start()
return service
@@ -73,16 +75,31 @@ def teardown_service(service: Union[KServeHttp, KServeGrpc]) -> None:
service.stop()


def setup_client(frontend_client, url: str):
def setup_client(
frontend_client: Union["tritonclient.http", "tritonclient.grpc"], url: str
):
"""
Sets up a client to communicate with the Server through the respective protocol.
"""
return frontend_client.InferenceServerClient(url=url)


def teardown_client(client) -> None:
def teardown_client(
client: Union[
"tritonclient.http.InferenceServerClient",
"tritonclient.grpc.InferenceServerClient",
Collaborator comment on lines +89 to +90:

I think this type hint is correct, and the other places where you use Union are missing InferenceServerClient. You could probably also use InferenceServerClientBase, though it'd be a bit less strict.

]
) -> None:
client.close()
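As a side note on the reviewer's suggestion above, here is a minimal sketch of a shared type alias built from the concrete client classes (whether `InferenceServerClientBase` exists and where it lives in `tritonclient` is not assumed here):

```python
from typing import Union

import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient

# One alias that the setup/teardown/inference helpers could all reuse.
TritonClient = Union[
    httpclient.InferenceServerClient,
    grpcclient.InferenceServerClient,
]


def teardown_client(client: TritonClient) -> None:
    # Both client flavors expose close().
    client.close()
```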


# Sends an inference to test_model_repository/identity model and verifies input == output.
def send_and_test_inference_identity(frontend_client, url: str) -> bool:
def send_and_test_inference_identity(
frontend_client: Union["tritonclient.http", "tritonclient.grpc"], url: str
Collaborator comment:

See the other comment on type hints; apply it throughout.

) -> bool:
"""
Sends an inference request to the model at test_model_repository/identity
and verifies input == output
"""
model_name = "identity"
client = setup_client(frontend_client, url)
input_data = np.array(["testing"], dtype=object)
@@ -102,9 +119,13 @@ def send_and_test_inference_identity(frontend_client, url: str) -> bool:
return input_data[0] == output_data[0].decode()


# Sends multiple streaming requests to "delayed_identity" model with negligible delays,
# and verifies the inputs matches outputs and the ordering is preserved.
def send_and_test_stream_inference(frontend_client, url: str) -> bool:
def send_and_test_stream_inference(
frontend_client: Union["tritonclient.http", "tritonclient.grpc"], url: str
) -> bool:
"""
Sends multiple streaming requests to "delayed_identity" model with negligible delays
and verifies the inputs match the outputs and the ordering is preserved.
"""
num_requests = 100
requests = []
for i in range(num_requests):
@@ -135,14 +156,18 @@ def callback(responses, result, error):


def send_and_test_generate_inference() -> bool:
"""
Sends an inference request to the identity model through the
HTTP generate endpoint and verifies input == output
"""
model_name = "identity"
url = f"http://localhost:8000/v2/models/{model_name}/generate"
input_text = "testing"
data = {
"INPUT0": input_text,
}

response = requests.post(url, json=data, stream=True)
response = requests.post(url, json=data)
if response.status_code == 200:
result = response.json()
output_text = result.get("OUTPUT0", "")
@@ -151,3 +176,32 @@
return True

return False


def get_metrics(metrics_url: str, model_name: str = "identity") -> Tuple[int, int]:
"""
Sends a request to the metrics endpoint and returns the following information:
1. Status Code = indicates whether the interaction with the metrics endpoint was successful
2. Inference Count = used to verify that the metrics data being returned is accurate
"""
response = requests.get(metrics_url)
inference_count = None

if response.status_code == 200:
inference_count = _extract_inference_count(response.text, model_name)
return response.status_code, inference_count


def _extract_inference_count(metrics_data: str, model_name: str):
"""
Helper function for get_metrics that parses metrics_data (Prometheus-friendly
format) with a regex to extract the inference count of model_name.
"""
pattern = (
rf'nv_inference_count\{{.*?model="{re.escape(model_name)}".*?\}}\s+([0-9.]+)'
)
match = re.search(pattern, metrics_data)
if match:
return int(float(match.group(1)))

return None
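As an illustration of what this helper parses, here is a hand-written sample line in the `nv_inference_count` format (the values below are made up for illustration only):

```python
# Run from qa/L0_python_api so that testing_utils is importable.
from testing_utils import _extract_inference_count

sample = 'nv_inference_count{model="identity",version="1"} 3'
print(_extract_inference_count(sample, "identity"))  # 3
print(_extract_inference_count(sample, "resnet"))    # None (model not in the data)
```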
16 changes: 16 additions & 0 deletions src/http_server.cc
@@ -364,6 +364,22 @@ HTTPMetricsServer::Create(
return nullptr;
}

TRITONSERVER_Error*
HTTPMetricsServer::Create(
std::shared_ptr<TRITONSERVER_Server>& server,
const UnorderedMapType& options, std::unique_ptr<HTTPServer>* service)
{
int port;
std::string address;
int thread_count;

RETURN_IF_ERR(GetValue(options, "port", &port));
RETURN_IF_ERR(GetValue(options, "address", &address));
RETURN_IF_ERR(GetValue(options, "thread_count", &thread_count));

return Create(server, port, address, thread_count, service);
}

#endif // TRITON_ENABLE_METRICS
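For reference, the option keys this new overload reads via `GetValue()` can be pictured from the Python side as a plain map (a sketch for illustration; the defaults shown and how `Metrics.Options` is marshalled into this map by the bindings are assumptions here):

```python
# Keys consumed by HTTPMetricsServer::Create(server, options, service) above.
metrics_options = {
    "port": 8002,          # int, metrics port (8002 is the usual default)
    "address": "0.0.0.0",  # str, bind address (assumed default)
    "thread_count": 1,     # int, worker threads (assumed default)
}
```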

namespace {
4 changes: 4 additions & 0 deletions src/http_server.h
@@ -134,6 +134,10 @@ class HTTPMetricsServer : public HTTPServer {
std::string address, int thread_cnt,
std::unique_ptr<HTTPServer>* metrics_server);

static TRITONSERVER_Error* Create(
std::shared_ptr<TRITONSERVER_Server>& server,
const UnorderedMapType& options, std::unique_ptr<HTTPServer>* service);

~HTTPMetricsServer() = default;

private:
10 changes: 8 additions & 2 deletions src/python/tritonfrontend/__init__.py
@@ -30,13 +30,19 @@
from importlib.metadata import PackageNotFoundError, version

try:
from tritonfrontend._api._kservehttp import KServeHttp
from tritonfrontend._api import KServeHttp
except ImportError:
# TRITON_ENABLE_HTTP=OFF
pass

try:
from tritonfrontend._api._kservegrpc import KServeGrpc
from tritonfrontend._api import KServeGrpc
except ImportError:
# TRITON_ENABLE_GRPC=OFF
pass

try:
from tritonfrontend._api import Metrics
except ImportError:
# TRITON_ENABLE_METRICS=OFF
pass
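Because every frontend import above is wrapped in try/except, a given build may expose only a subset of these names. A small sketch of how a caller could check what a particular `tritonfrontend` build provides:

```python
import tritonfrontend

# Names missing from the package correspond to endpoints compiled out,
# e.g. TRITON_ENABLE_METRICS=OFF removes Metrics.
available = [
    name
    for name in ("KServeHttp", "KServeGrpc", "Metrics")
    if hasattr(tritonfrontend, name)
]
print("Available frontends:", available)
```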
7 changes: 7 additions & 0 deletions src/python/tritonfrontend/_api/__init__.py
@@ -37,3 +37,10 @@
# TRITON_ENABLE_GRPC=OFF
# TritonFrontendGrpc Package was not present
pass

try:
from ._metrics import Metrics
except ImportError:
# TRITON_ENABLE_METRICS=OFF
Collaborator comment:

Make sure L0_build_variants passes

# TritonFrontendMetrics Package was not present
pass
14 changes: 14 additions & 0 deletions src/python/tritonfrontend/_api/_error_mapping.py
@@ -24,6 +24,8 @@
# (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

import sys

import tritonserver
from tritonfrontend._c.tritonfrontend_bindings import (
AlreadyExistsError,
@@ -47,3 +49,15 @@
AlreadyExistsError: tritonserver.AlreadyExistsError,
UnsupportedError: tritonserver.UnsupportedError,
}


def handle_triton_error(func):
def error_handling_wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except TritonError:
exc_type, exc_value, _ = sys.exc_info()
# `raise ... from None` keeps the original tritonfrontend error out of the traceback
raise ERROR_MAPPING[exc_type](exc_value) from None

return error_handling_wrapper
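A short usage sketch of the decorator (the class below is hypothetical; the real frontends wrap their binding calls the same way):

```python
class ExampleFrontend:
    @handle_triton_error
    def start(self):
        # Calls into the tritonfrontend._c bindings here. If a TritonError
        # subclass escapes, the decorator re-raises the mapped
        # tritonserver.* exception instead.
        ...
```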