
🐛 Fix request hanging in periodic concurrency greater than 4 #410

Merged
1 commit merged Oct 6, 2023

Conversation

nv-hwoo
Contributor

@nv-hwoo nv-hwoo commented Oct 6, 2023

While testing out the new periodic concurrency mode, @matthewkotila and I noticed that the 5th request would consistently hang forever, waiting for a response from the server.

The issue was that max_threads_ defaulted to 4, and if this value was less than the concurrency value, any request created after the first max_threads_ requests (e.g. the 5th one in our case) would be missing input data. This is because the infer_data_manager, which prepares all the input data, uses the max_threads_ count to generate the input data and populate the requests with it. I believe this is why the server, when it received the empty 5th request, did not pass it down to the model.

The fix checks the max_threads_ value against the concurrency range (similar to how regular concurrency mode does). I'm not quite sure why we have the max_threads_ option in the first place, but I feel like we should eventually move away from using max_threads_ and just deduce the thread count from the concurrency level.
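To make the failure mode concrete, here is a minimal Python sketch of the bug pattern described above (the actual perf_analyzer code is C++; the function names `prepare_input_data` and `send_request` are hypothetical stand-ins, not real perf_analyzer APIs). Input data is pre-generated for only max_threads_ slots, so any request beyond that count goes out with no inputs:

```python
def prepare_input_data(max_threads):
    # Input data is generated once per thread slot,
    # not once per outstanding request.
    return {i: f"input-{i}" for i in range(max_threads)}

def send_request(request_id, input_data):
    # A request whose id has no prepared inputs is sent empty;
    # None here models the "missing input data" case.
    return input_data.get(request_id)

MAX_THREADS_DEFAULT = 4
concurrency = 5

# Before the fix: data exists for only 4 slots,
# so the 5th request (id 4) has no input and hangs.
data = prepare_input_data(MAX_THREADS_DEFAULT)
payloads = [send_request(i, data) for i in range(concurrency)]
# payloads[4] is None

# After the fix: raise the thread count to cover the concurrency
# range, mirroring what regular concurrency mode already does.
max_threads = max(MAX_THREADS_DEFAULT, concurrency)
data = prepare_input_data(max_threads)
payloads = [send_request(i, data) for i in range(concurrency)]
# every request now has input data
```

This is only an illustration of the indexing mismatch, not the actual fix commit; the real change adjusts max_threads_ against the concurrency range inside perf_analyzer.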

Thanks @matthewkotila for drilling down and investigating the bug and @Tabrizian & @rmccorm4 for tips and insights.

@matthewkotila
Contributor

@Tabrizian @rmccorm4 Is it reasonable to create a feature request for the server to log, in verbose mode, when a request without inputs is received? That would have saved us so much time debugging this.

@rmccorm4
Contributor

rmccorm4 commented Oct 6, 2023

> This is because the infer_data_manager, which prepares all the input data, uses the max_threads_ count to generate the input data and populate the requests with it. I believe this is why the server, when it received the empty 5th request, did not pass it down to the model.

Did you confirm the server is actually receiving the request?

> Is it reasonable to create a feature request for verbose mode on the server logging when a request without inputs is received? That would have helped save so much time debugging this.

I believe the server would return an error on reading the request, saying the inputs didn't match the model config. So I suspect the server didn't even get that far.

@nv-hwoo
Contributor Author

nv-hwoo commented Oct 6, 2023

@rmccorm4 No, this is just my guess at what happened after the request was sent. I was too tired to dig further 😅 Given that the input wasn't properly generated for the request, I think there's a possibility the request didn't even reach the server.

@nv-hwoo nv-hwoo merged commit 22b74c2 into periodic-concurrency-mode Oct 6, 2023
3 checks passed
@nv-hwoo nv-hwoo deleted the hwoo-debug-periodic branch October 6, 2023 20:38
matthewkotila added a commit that referenced this pull request Oct 9, 2023
Creates parity with changes from #410
matthewkotila added a commit that referenced this pull request Oct 10, 2023
Creates parity with changes from #410
fpetrini15 pushed a commit to triton-inference-server/perf_analyzer that referenced this pull request Jun 26, 2024
fpetrini15 pushed a commit to triton-inference-server/perf_analyzer that referenced this pull request Jul 29, 2024