Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Forward-merge branch-0.33 to branch-0.34 #71

Closed
wants to merge 3 commits into from
Closed

Conversation

GPUtester
Copy link
Contributor

Forward-merge triggered by push to branch-0.33 that creates a PR to keep branch-0.34 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

This PR adds appropriate pinnings for `gtest` and `gmock` in `libucxx-tests`.

Authors:
  - Bradley Dice (https://github.com/bdice)

Approvers:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Ray Douglass (https://github.com/raydouglass)

URL: #66
@GPUtester GPUtester requested a review from a team as a code owner July 26, 2023 13:18
@GPUtester
Copy link
Contributor Author

FAILURE - Unable to forward-merge due to conflicts, manual merge is necessary. Do not use the Resolve conflicts option in this PR, follow these instructions https://docs.rapids.ai/maintainers/forward-merger/
IMPORTANT: When merging this PR, do not use the auto-merger (i.e. the /merge comment). Instead, an admin must manually merge by changing the merging strategy to Create a Merge Commit. Otherwise, history will be lost and the branches become incompatible.

This PR introduces various fixes:

* Fix destruction of `ucxx::RequestTagMulti`, which after moving to inheriting from `ucxx::Request`, using `setStatus()` was missing, thus causing the `ucxx::RequestTagMulti` to never cleanup its inflight request status. Calling `setStatus()` instead of writing the `_status` attribute directly resolves this.
* The status of `ucxx::RequestTagMulti` was until now always `UCS_INPROGRESS` or `UCS_OK` after completing. This is not right but there is no good way to combine the statuses of all underlying requests. Therefore we now set the status of the first failing request as the final status instead of `UCS_OK` if at least one of the requests failed. The user can still check each underlying's request status if granular information is required.
* Since `ucxx::RequestTagMulti` now inherits from `ucxx::Request`, the latter's trace logging for creation/destruction are sufficient, thus removing the redundant trace logs is appropriate.
* Add lifetime trace logs for `ucxx::BufferRequest`.
* Execute user-defined callback within the scope fo `ucxx::Request::setStatus()` to prevent accidental reordering mistakes, as the callback needs to always execute after the status is set, because the callback itself might make use of that information. Additionally it will prevent missing execution of the callback should `setStatus()` be called elsewhere in the future.
* Return `std::shared_ptr<Buffer> in allocateBuffer()` as returning raw pointers may cause leaks, and can be problematic specifically in `ucxx::RequestTagMulti` as the `Buffer*` will not be released unless the user takes ownership and releases it appropriately.
* Introduce mutex in `ucxx::Request` because of potential concurrency issues when running with the progress thread, which a mutex can resolve in `ucxx::Request`. A common case is when `ucxx::Endpoint` is registering an inflight request and `setStatus()` has just begun running, in which case the inflight request will be removed (a no-op as it hasn't been previously registered) and then actually registered, which will cause the `ucxx::Request` to never cleanup.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #72
@rapids-bot rapids-bot bot requested review from a team as code owners August 2, 2023 22:03
Implement and use generic callbacks to create endpoints, close endpoints and listeners, and probe tags. By doing this we ensure those don't need to take the UCS spinlock or progress the worker during the application thread, which can be sources of deadlocks.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)
  - Lawrence Mitchell (https://github.com/wence-)

Approvers:
  - Lawrence Mitchell (https://github.com/wence-)

URL: #65
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants