feat: Metrics Support in tritonfrontend
#7703
base: main
Conversation
````diff
@@ -57,14 +57,18 @@ Note: `model_path` may need to be edited depending on your setup.

 2. Now, to start up the respective services with `tritonfrontend`
 ```python
-from tritonfrontend import KServeHttp, KServeGrpc
+from tritonfrontend import KServeHttp, KServeGrpc, Metrics
````
I don't love that the `Metrics` object is a web server, so it makes me wonder if we should rename these down the line, e.g. `KServeHttpService`, `MetricsService`, etc.

But I don't have a strong opinion on an alternative right now, so I think it's fine; just mentioning it for later. We will probably be restructuring some packaging and naming in the near-to-mid future.
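For context, a hedged sketch of how these service objects are used together, mirroring the `KServeHttp`/`KServeGrpc` pattern the PR description refers to. The exact constructor signatures and option names here are assumptions, not verified against the PR:

```python
def serve_with_metrics(model_repository: str) -> None:
    # Requires a Triton build with tritonfrontend; imports are deferred
    # so this sketch stays importable without one installed.
    import tritonserver
    from tritonfrontend import KServeHttp, Metrics

    server = tritonserver.Server(model_repository=model_repository).start()
    http_options = KServeHttp.Options(port=8000)
    metrics_options = Metrics.Options(port=8002, thread_count=1)
    # Each frontend wraps `server` in a small web server; used as context
    # managers they are stopped cleanly on exit (see __exit__ below).
    with KServeHttp(server, http_options), Metrics(server, metrics_options):
        input("Serving... press Enter to stop\n")
```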
```
{
count: 1
kind : KIND_CPU
```
Did you ever investigate the GPU label thing?
Investigated a bit, but did not find the root cause. Will create a ticket in my backlog with hopefully a more consistent reproducer.
```python
"tritonclient.http.InferenceServerClient",
"tritonclient.grpc.InferenceServerClient",
```
I think this type hint is correct, and the other places where you use `Union` are missing `InferenceServerClient`. You could probably also use `InferenceServerClientBase`, though it'd be a bit less strict.
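To illustrate the point about `Union` completeness with a self-contained example; `HttpClient` and `GrpcClient` are hypothetical stand-ins for the two `InferenceServerClient` classes, not real tritonclient names:

```python
from typing import Union

class HttpClient:  # stand-in for tritonclient.http.InferenceServerClient
    pass

class GrpcClient:  # stand-in for tritonclient.grpc.InferenceServerClient
    pass

# The annotation should name the concrete client classes, e.g.
#   Union["tritonclient.http.InferenceServerClient",
#         "tritonclient.grpc.InferenceServerClient"]
# rather than just the modules. A shared base class (cf.
# InferenceServerClientBase) would also type-check, but less strictly.
def send_and_test_inference_identity(
    frontend_client: Union[HttpClient, GrpcClient], url: str
) -> bool:
    return isinstance(frontend_client, (HttpClient, GrpcClient))
```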
qa/L0_python_api/testing_utils.py
```diff
 # Sends an inference to test_model_repository/identity model and verifies input == output.
-def send_and_test_inference_identity(frontend_client, url: str) -> bool:
+def send_and_test_inference_identity(
+    frontend_client: Union["tritonclient.http", "tritonclient.grpc"], url: str
```
See other comment on type hints, apply throughout
```python
try:
    from ._metrics import Metrics
except ImportError:
    # TRITON_ENABLE_Metrics=OFF
```
Make sure `L0_build_variants` passes.
```diff
 def __exit__(self, exc_type, exc_value, traceback):
     self.triton_frontend.stop()
     if exc_type:
-        raise ERROR_MAPPING[exc_type](exc_value) from None
+        raise exc_type(exc_value) from None
```
Same comment: #7720 (comment). Why did this one keep the `from None` but the others didn't?
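The behavior the comment is asking about can be seen in a small stand-alone example; `reraise` here is a hypothetical stand-in, not the PR's code. `raise ... from None` suppresses the implicit "During handling of the above exception, another exception occurred" chaining when re-raising inside an `except` block or `__exit__`:

```python
def reraise(exc: BaseException) -> None:
    # Re-raise a fresh exception of the same type without implicit chaining.
    raise type(exc)(*exc.args) from None

try:
    try:
        raise ValueError("bad input")
    except ValueError as original:
        reraise(original)
except ValueError as wrapped:
    # `from None` sets __suppress_context__, so tracebacks omit the chain.
    suppressed = wrapped.__suppress_context__
    message = wrapped.args[0]
```

Without `from None`, the original exception would still be attached as `__context__` and printed as part of the traceback.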
```python
class Options:
    address: str = "0.0.0.0"
    port: int = Field(8002, ge=0, le=65535)
    thread_count: int = Field(1, ge=0)
```
Can thread count be zero?
Only minor comments - nice work!
```python
class Options:
    address: str = "0.0.0.0"
    port: int = Field(8002, ge=0, le=65535)
    thread_count: int = Field(1, ge=1)
```
Please fix lower bound in the http/kserve classes too
Should have added a test case for thread count 0, and probably for the other numerical options.
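A stdlib-only sketch of the test case suggested here, using a hypothetical `Options` stand-in that mimics the lower bounds pydantic's `Field(8002, ge=0, le=65535)` and `Field(1, ge=1)` enforce:

```python
from dataclasses import dataclass

@dataclass
class Options:  # stand-in mimicking the pydantic Options above
    address: str = "0.0.0.0"
    port: int = 8002
    thread_count: int = 1

    def __post_init__(self) -> None:
        # mirrors Field(8002, ge=0, le=65535) and Field(1, ge=1)
        if not 0 <= self.port <= 65535:
            raise ValueError("port must be in [0, 65535]")
        if self.thread_count < 1:
            raise ValueError("thread_count must be >= 1")

def rejects(**kwargs) -> bool:
    # True if the given option values fail validation.
    try:
        Options(**kwargs)
        return False
    except ValueError:
        return True
```

With `ge=1`, `Options(thread_count=0)` is rejected while the defaults pass, which is exactly what a boundary test should assert.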
What does the PR do?

Adds support for `Metrics` in `tritonfrontend`. This involves two components: `HTTPMetricsServer` and the `Metrics` class. With this PR, the metrics service can be used similarly to `KServeHttp` and `KServeGrpc`.

Additional changes made in this PR: `request.post(...)` (based on this comment).

Test plan:
Added 3 test functions to L0_python_api:
- `test_metrics_default_port()`: tests whether the metrics service can start as expected.
- `test_metrics_custom_port()`: tests whether arguments defined in `tritonfrontend.Metrics.Options` are passed successfully to `HTTPMetrics`.
- `test_metrics_update()`: tests whether the `nv_inference_count` value goes from 0 to 1 when an inference request is performed.
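A sketch of the kind of check `test_metrics_update()` implies: fetch the Prometheus exposition text from the metrics endpoint and compare the counter before and after inference. The parsing helper and sample lines below are illustrative, not the PR's actual test code:

```python
def metric_value(exposition: str, name: str) -> float:
    """Return the value of the first sample whose name matches `name`."""
    for line in exposition.splitlines():
        if line.startswith(name):
            # Prometheus text format: `name{labels} value`
            return float(line.rsplit(" ", 1)[1])
    raise KeyError(name)

# Illustrative exposition lines, as GET /metrics might return them
before = 'nv_inference_count{model="identity",version="1"} 0'
after = 'nv_inference_count{model="identity",version="1"} 1'
```

In the real test, `before` and `after` would come from two HTTP GETs against the `Metrics` service's port, with one inference request in between.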