
feat: Metrics Support in tritonfrontend #7703

Open · wants to merge 48 commits into base: main
Conversation

@KrishnanPrash (Contributor) commented Oct 15, 2024

What does the PR do?

Adding support for Metrics in tritonfrontend. This involves two components:

  • In tritonfrontend_pybind.cc, added bindings for HTTPMetricsServer
  • In tritonfrontend/_api/_metrics.py, added a Metrics class

With this PR, similar to KServeHttp and KServeGrpc, the metrics service can be used with:

import tritonserver
from tritonfrontend import Metrics
server = tritonserver.Server(model_repository=...).start(wait_until_ready=True)
metrics_service = Metrics(server)
metrics_service.start()
...
metrics_service.stop()
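
For a non-default configuration, an `Options` object can presumably be supplied as well, mirroring the KServe frontends. A minimal sketch, assuming the constructor accepts an `Options` instance as its second argument (the port value here is only an example; the documented default is 8002):

```python
from tritonfrontend import Metrics

# Assumption: Metrics(server, options) mirrors KServeHttp/KServeGrpc.
# Port 8005 is illustrative; the default metrics port is 8002.
options = Metrics.Options(port=8005)
metrics_service = Metrics(server, options)
metrics_service.start()
metrics_service.stop()
```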

Additional changes made in this PR:

  • Modified the test functions' documentation based on this comment
  • Removed an extra parameter in request.post(...) based on this comment

Test plan:

Added 3 test functions to L0_python_api:

  • test_metrics_default_port(): Tests whether the metrics service starts up as expected (see the sketch after this list)
  • test_metrics_custom_port(): Tests whether arguments defined in tritonfrontend.Metrics.Options are passed through to HTTPMetrics
  • test_metrics_update(): Tests whether the nv_inference_count value goes from 0 to 1 after an inference request is performed

CI Pipeline ID: 19197748
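
A rough sketch of what test_metrics_default_port() exercises (illustrative only, not the PR's exact test code; the model repository path is a placeholder and the requests library is assumed):

```python
import requests
import tritonserver
from tritonfrontend import Metrics


def test_metrics_default_port():
    # Placeholder model repository path for the test environment.
    server = tritonserver.Server(model_repository="./test_model_repository").start(
        wait_until_ready=True
    )
    metrics_service = Metrics(server)
    metrics_service.start()
    try:
        # Default metrics endpoint per Metrics.Options (port 8002).
        response = requests.get("http://localhost:8002/metrics")
        assert response.status_code == 200
    finally:
        metrics_service.stop()
        server.stop()
```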

GuanLuo previously approved these changes Oct 23, 2024
@@ -57,14 +57,18 @@ Note: `model_path` may need to be edited depending on your setup.

2. Now, to start up the respective services with `tritonfrontend`:
```diff
-from tritonfrontend import KServeHttp, KServeGrpc
+from tritonfrontend import KServeHttp, KServeGrpc, Metrics
```

Collaborator:

I don't love that the Metrics object is a web server, so it makes me wonder if we should rename these down the line, e.g. KServeHttpService, MetricsService, etc.

But I don't have a strong opinion on an alternative right now, so I think it's fine; just mentioning for later. We will probably be restructuring some packaging and naming in the near- to mid-term future.

{
  count: 1
  kind: KIND_CPU
Collaborator:

Did you ever investigate the gpu label thing?

Contributor Author:

Investigated a bit, but did not find the root cause. I will create a ticket in my backlog, hopefully with a more consistent reproducer.

Comment on lines +89 to +90
"tritonclient.http.InferenceServerClient",
"tritonclient.grpc.InferenceServerClient",
Collaborator:

I think this type hint is correct, and the other places where you use Union are missing InferenceServerClient. You could probably also use InferenceServerClientBase, though it would be a bit less strict.

  # Sends an inference to test_model_repository/identity model and verifies input == output.
- def send_and_test_inference_identity(frontend_client, url: str) -> bool:
+ def send_and_test_inference_identity(
+     frontend_client: Union["tritonclient.http", "tritonclient.grpc"], url: str
Collaborator:

See other comment on type hints, apply throughout
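
A hedged sketch of the suggested annotation, assuming both client modules are imported in the test utility; `InferenceServerClientBase` is the looser alternative mentioned above, but its import path is not shown here:

```python
from typing import Union

import tritonclient.grpc as grpcclient
import tritonclient.http as httpclient


def send_and_test_inference_identity(
    frontend_client: Union[
        "httpclient.InferenceServerClient", "grpcclient.InferenceServerClient"
    ],
    url: str,
) -> bool:
    # Body omitted; only the annotation is the point of this sketch.
    ...
```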

try:
    from ._metrics import Metrics
except ImportError:
    # TRITON_ENABLE_Metrics=OFF
Collaborator:

Make sure L0_build_variants passes
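
One way the test side might keep L0_build_variants green is to skip metrics tests when the binding is compiled out; a sketch assuming Metrics is simply not importable in that case (the test name is hypothetical):

```python
import pytest

# Assumption: on build variants without metrics, importing Metrics fails.
try:
    from tritonfrontend import Metrics
except ImportError:
    Metrics = None


@pytest.mark.skipif(Metrics is None, reason="Built without metrics support")
def test_metrics_service_available():
    # Hypothetical placeholder test body.
    assert Metrics is not None
```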

  def __exit__(self, exc_type, exc_value, traceback):
      self.triton_frontend.stop()
      if exc_type:
-         raise ERROR_MAPPING[exc_type](exc_value) from None
+         raise exc_type(exc_value) from None
Collaborator:

Same comment: #7720 (comment)

Why did this one keep the from None but the others didn't?

class Options:
    address: str = "0.0.0.0"
    port: int = Field(8002, ge=0, le=65535)
    thread_count: int = Field(1, ge=0)
Collaborator:

Can thread count be zero?

@rmccorm4 (Collaborator) left a comment:

Only minor comments - nice work!

class Options:
    address: str = "0.0.0.0"
    port: int = Field(8002, ge=0, le=65535)
    thread_count: int = Field(1, ge=1)
Collaborator:

Please fix the lower bound in the http/kserve classes too.

Contributor:

Should have added a test case for a thread count of 0, and probably for the other numerical options as well.
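
A minimal sketch of such a test, assuming Options is validated by pydantic so that an out-of-range thread_count raises ValidationError (the exception type and test name are assumptions, not confirmed by this PR):

```python
import pytest
from pydantic import ValidationError

from tritonfrontend import Metrics


def test_metrics_invalid_thread_count():
    # With thread_count: int = Field(1, ge=1), zero should be rejected
    # at construction time, assuming pydantic enforces the constraint.
    with pytest.raises(ValidationError):
        Metrics.Options(thread_count=0)
```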

Labels: PR: feat (A new feature)

4 participants