Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UCS/UCP/RNDV: RNDV pipeline invalidation #10204

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

iyastreb
Copy link
Contributor

@iyastreb iyastreb commented Oct 2, 2024

What

Invalidation for RNDV pipeline protocol based on put_mtype/rtr_mtype.
In the previous PR #10075, we advertised error handling capability for rndv_ppln over rnvd_(put/rtr)_mtype) so that this protocol can be selected.

Design document

Why ?

We want rndv_ppln over rnvd_(put/rtr)_mtype) protocol to be selected over transport with error handling, for performance reasons.

How ?

  • Remove RNDV_PIPELINE_ERROR_HANDLING environment variable, as we have true error handling support
  • Extend RNDV frag memory pool to have reference counting per chunks
  • Store memh completion entry in request invalidate instead of rcache region
  • Disable using rcache for RNDV frag memory pool (to avoid shared regions due to merge)
  • Rkey-only invalidation: allows to de-register only indirect rkeys while not touching memh

@iyastreb iyastreb force-pushed the ucs/ucp/rndv/ppln-invalidation branch from 5670125 to d9f1684 Compare October 4, 2024 09:42
@iyastreb iyastreb marked this pull request as ready for review October 7, 2024 07:27
Copy link
Contributor

@brminich brminich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first bunch of comments

Comment on lines +3572 to +3573
if (ucp_request_memh_check_invalidate(req, 0)) {
ucp_request_memh_invalidate(req, status, 0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having one function looks cleaner to me

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, it looks cleaner. But the purpose was to make it safe. Here is the existing code:

    << When ucp_request_memh_invalidate succeeds, it populates the `invalidate`
    << structure within request.send union, which invalidates all the other structures like `rndv`
    << Basically after invalidation you cannot use other structures within union
    if (ucp_request_memh_invalidate(req, status)) {
        << This is unsafe, `rndv` struct is overwritten by this moment!
        << Well, it works by chance, because for now `invalidate` struct contains just one pointer,
        << so it does not overwrite much from `rndv`. With this commit we extend `invalidate` struct,
        << so it's not possible to access rndv after invalidation
        if (req->send.rndv.rkey != NULL) {
            ucp_proto_rndv_rkey_destroy(req);
        }
        ucp_proto_request_zcopy_id_reset(req);
        return;
    }

So the split is done, because we need to perform some cleanup tasks on invalidated requests, but essentially we can do that only before invalidation actually happens.

I can check whether we can move this conditional cleanup tasks before - in that can split is not needed.

params.flags = 0;
}

ucs_trace("de-registering %smemh[%d]=%p", (rkey_only ? "rkey of " : ""),
md_index, memh->uct[md_index]);
ucs_assert(context->tl_mds[md_index].attr.flags & UCT_MD_FLAG_REG);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ucs_assertv() and print the flags

static void ucp_memh_dereg_rkey(ucp_context_h context, ucp_mem_h memh,
ucp_md_map_t md_map)
{
ucs_trace("memh %p: invalidate only indirect rkeys: md_map %" PRIx64
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove indirect, as it is UCT IB specific

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

@@ -798,7 +828,8 @@ ucp_memh_init_uct_reg(ucp_context_h context, ucp_mem_h memh,

cache_md_map = context->cache_md_map[mem_type] & reg_md_map;

if ((context->rcache == NULL) || (cache_md_map == 0)) {
if ((context->rcache == NULL) || (cache_md_map == 0) ||
(uct_flags & UCT_MD_MEM_FLAG_NO_RCACHE)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should not be a new UCT flag because it has no meaning in UCT and is only used in UCP

{
ucp_mem_desc_t *mdesc = ucp_worker_mpool_get(mpool);

mdesc->memh->super.refcount ++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mdesc->memh->super.refcount ++;
mdesc->memh->super.refcount++;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

ucs_mpool_add_to_freelist(mp, elem);
}

ucs_debug("mpool %s: added %u elements of chunk %p to the freelist",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls use some other level, like trace.
debug is enabled in UCX release build and we do not add any traffic related traces to debug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done


/* If we reach this point, it means that chunk is marked for invalidation,
* and all of its inflight operations are completed. Now we can de-register
* its indirect rkeys and revive the chunk = return it back to mpool. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls remove indirect

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

* and all of its inflight operations are completed. Now we can de-register
* its indirect rkeys and revive the chunk = return it back to mpool. */
ucp_memh_dereg_rkey(context, memh, memh->inv_md_map);
ucs_assert(0 == memh->inv_md_map);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ucs_assert(0 == memh->inv_md_map);
ucs_assertv(0 == memh->inv_md_map, "inv_md_map=0x%lx", memh->inv_md_map);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +358 to +360
if (rkey_only) {
params.flags |= UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can pass uct_flags instead of rkey_only and pass UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY directly. This would simplify this code

@@ -1490,6 +1504,16 @@ uct_ib_mlx5_devx_mem_dereg(uct_md_h uct_md,
return status;
}

/* If requested invalidation of only rkeys, then return success */
flags = UCT_MD_MEM_DEREG_FIELD_VALUE(params, flags, FIELD_FLAGS, 0);
if (flags & UCT_MD_MEM_DEREG_FLAG_INVALIDATE_RKEY_ONLY) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we always do invalidation of just rkey without introducing a new flag? Why do we need two flags now?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants