Micro-batching and time spent from input to user handler #2173
Unanswered

andreea-anghel asked this question in General:

I have created an ML inference service using BentoML as the serving framework, and I am using the adaptive BentoML micro-batching feature. I would like to measure the time spent by an HTTP request from the moment it arrives at the BentoML server until it is dispatched to the user-defined handler (for an inference service, this would be the predict function of an ML framework). Could anyone help with how to instrument the BentoML code to measure this time? Any input is appreciated - thank you.

1 reply:
@andreea-anghel If you look in bentoml._internal.frameworks, you can see the different frameworks we support. Within a framework, load_runner returns a subclass of Runner that implements either run or run_batch (we're changing this architecture soon so that it's only run, FYI). If you log a timestamp when you call the run method in your API service, and then log another timestamp when the run() method is actually invoked on the runner, the difference between the two should be the amount of time it takes to dispatch the request between the service and the model runner. Hope this helps!
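To make that concrete, here is a minimal sketch of the two-timestamp approach. It assumes a BentoML 1.0 pre-release-style service in which load_runner returns the runner, as described in the reply above; the model tag my_model:latest, the sklearn framework, and the timed_service / predict names are all placeholders, not anything taken from this discussion.

```python
import logging
import time

import bentoml
from bentoml.io import JSON

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("dispatch_timing")

# Placeholder model tag and framework; substitute your own saved model.
runner = bentoml.sklearn.load_runner("my_model:latest")

svc = bentoml.Service("timed_service", runners=[runner])

@svc.api(input=JSON(), output=JSON())
def predict(payload):
    # First timestamp: the moment the API service hands the request
    # off to the runner, i.e. just before micro-batching dispatch.
    logger.info("service dispatch at %.6f", time.time())
    return {"prediction": runner.run(payload)}
```

For the second timestamp, the reply suggests logging when run() is actually invoked on the runner. One blunt but workable way, assuming you are willing to edit your installed copy of BentoML as the question itself proposes, is to add a log line at the top of the framework runner's run or run_batch implementation (the exact module and method names vary by framework):

```python
# In your installed copy of BentoML, inside the relevant module under
# bentoml/_internal/frameworks/, at the top of the runner's run or
# run_batch implementation (whichever one it defines):
import logging
import time

logging.getLogger("dispatch_timing").info(
    "runner run entered at %.6f", time.time()
)
```

The difference between the "service dispatch" and "runner run entered" timestamps approximates the time spent in dispatch and adaptive micro-batching between the API service and the model runner. Note that comparing time.time() values across the API and runner processes assumes both run on the same machine with a shared clock.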