Micro-batching and time spent from input to user handler #2173
Unanswered

andreea-anghel asked this question in General:

I have created an ML inference service using BentoML as the serving framework, and I am using the adaptive BentoML micro-batching feature. I would like to measure the time spent by an HTTP request from the moment it arrives at the BentoML server until it is dispatched to the user-defined handler (for an inference service, this would be the predict function of an ML framework). Could anyone help with how to instrument the BentoML code to measure this time? Any input is appreciated - thank you.

1 reply:
@andreea-anghel If you look in bentoml._internal.frameworks, you can see the different frameworks we support. Within a framework, load_runner returns a subclass of Runner that implements either run or run_batch (we're changing this architecture soon so that it's only run, FYI). If you log a timestamp when you call the run method in your API service, and then log another timestamp when the run() method is actually invoked on the runner, the difference between the two should be the amount of time it takes to dispatch the request between the service and the model runner. Hope this helps!
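To make that concrete, here is a minimal sketch of the two-timestamp approach. It assumes a BentoML 1.0 pre-release-style service in which load_runner returns the runner, as described in the reply above; the model tag my_model:latest, the sklearn framework, and the timed_service / predict names are all placeholders, not anything taken from this discussion.

```python
import logging
import time

import bentoml
from bentoml.io import JSON

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dispatch_timing")

# Placeholder model tag and framework; substitute your own saved model.
runner = bentoml.sklearn.load_runner("my_model:latest")

svc = bentoml.Service("timed_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
def predict(payload):
    # First timestamp: the moment the API service hands the request
    # off to the runner, i.e. just before micro-batching dispatch.
    logger.info("service dispatch at %.6f", time.time())
    return {"prediction": runner.run(payload)}
```

For the second timestamp, the reply suggests logging when run() is actually invoked on the runner. One blunt but workable way, assuming you are willing to edit your installed copy of BentoML as the question itself proposes, is to add a log line at the top of the framework runner's run or run_batch implementation (the exact module and method names vary by framework):

```python
# In your installed copy of BentoML, inside the relevant module under
# bentoml/_internal/frameworks/, at the top of the runner's run or
# run_batch implementation (whichever one it defines):
import logging
import time

logging.getLogger("dispatch_timing").info(
    "runner run entered at %.6f", time.time()
)
```

The difference between the "service dispatch" and "runner run entered" timestamps approximates the time spent in dispatch and adaptive micro-batching between the API service and the model runner. Note that comparing time.time() values across the API and runner processes assumes both run on the same machine with a shared clock.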