🐛 Fix request hanging in periodic concurrency greater than 4 #410
Conversation
@Tabrizian @rmccorm4 Is it reasonable to create a feature request for verbose mode on the server logging when a request without inputs is received? That would have helped save so much time debugging this.
Did you confirm the server is actually receiving the request?
I believe the server would return an error on reading the request, saying that the inputs didn't match the model config. So I suspect the server didn't even get this far.
@rmccorm4 No, this is just my guess of what happened after the request was sent. I was too tired to go further 😅 Given that the input wasn't properly generated for the request, I think there is a possibility that the request didn't even reach the server.
Creates parity with changes from triton-inference-server/client#410
While testing out the new periodic concurrency mode, @matthewkotila and I noticed that the 5th request would consistently hang forever, waiting for a response from the server.

The issue was that `max_threads_` defaulted to 4, and if this value was less than the concurrency value, any request created after the `max_threads_`-th one (e.g. the 5th one in our case) would be missing input data. This is because the `infer_data_manager` that prepares all the input data uses the `max_threads_` count to generate the input data and populate the requests with it. And I believe this is why the server, when it received the empty 5th request, did not pass it down to the model.
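A minimal sketch of the failure mode, assuming hypothetical names (`InferData`, `PrepareInferData`) modeled on the description above rather than the real perf_analyzer code: input data is generated for only `max_threads_` slots, so any request beyond that count has nothing prepared for it.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for the prepared input tensors of one request.
struct InferData {};

// Inputs are generated for max_threads slots only, mirroring how the
// infer_data_manager sized its data by the max_threads_ count.
std::vector<InferData> PrepareInferData(std::size_t max_threads) {
  std::vector<InferData> data;
  for (std::size_t i = 0; i < max_threads; ++i) {
    data.emplace_back();  // populate inputs for slot i
  }
  return data;
}

int main() {
  const std::size_t max_threads = 4;  // the old default
  const std::size_t concurrency = 5;
  auto data = PrepareInferData(max_threads);
  for (std::size_t req = 0; req < concurrency; ++req) {
    if (req >= data.size()) {
      // With max_threads == 4 and concurrency 5, the 5th request is empty.
      std::printf("request %zu has no input data prepared\n", req + 1);
    }
  }
}
```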
The fix checks the `max_threads_` value against the concurrency range (similar to how regular concurrency mode does). I'm not quite sure why we have the `max_threads_` option in the first place, but I feel like we should eventually move away from using `max_threads_` and just deduce the thread count from the concurrency level.

Thanks @matthewkotila for drilling down and investigating the bug, and @Tabrizian & @rmccorm4 for tips and insights.
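A sketch of the kind of check the fix describes, again with assumed names (`ConcurrencyRange`, `AdjustedMaxThreads`); the actual change in the client repo may be structured differently:

```cpp
#include <algorithm>
#include <cstddef>

// Assumed shape of the concurrency range swept by periodic concurrency mode.
struct ConcurrencyRange {
  std::size_t start;
  std::size_t end;  // highest concurrency level the sweep will reach
};

// Raise max_threads to cover the peak concurrency, so input data gets
// prepared for every concurrent request (as regular concurrency mode does).
std::size_t AdjustedMaxThreads(std::size_t max_threads,
                               const ConcurrencyRange& range) {
  return std::max(max_threads, range.end);
}

int main() {
  const ConcurrencyRange range{1, 5};
  const std::size_t max_threads = AdjustedMaxThreads(4, range);  // yields 5
  (void)max_threads;
}
```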