Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stream-Rail distribution with addr + port #1411

Closed
wants to merge 2 commits into from

Conversation

narategithub
Copy link
Collaborator

When the stream forward the stream data, it used the peer address to determine the endpoint in the rail to forward the data. This patch adds port into the calculation as well to increase the stream data distribution among the endpoints in the rail.

@narategithub
Copy link
Collaborator Author

@tom95858 it failed the compatibility test due to IPv6 issue in the test script that was addressed in #1408 . Once that pull request is merged, I'll rerun the test in this PR.

@narategithub
Copy link
Collaborator Author

@tom95858 Shall I also include the stream name into the calculation to increase the distribution?

@narategithub
Copy link
Collaborator Author

I'll add name into the calculation too. I'll mark this as draft for now.

@narategithub narategithub marked this pull request as draft July 12, 2024 01:51
@tom95858
Copy link
Collaborator

@narategithub what's the status of this?

When the stream forward the stream data, it used the peer address to
determine the endpoint in the rail to forward the data. This patch adds
`port` into the calculation as well to increase the stream data
distribution among the endpoints in the rail.
@narategithub
Copy link
Collaborator Author

@tom95858 the code is done but the ldms_stream_test script failed. I'm debugging it.

@narategithub narategithub marked this pull request as ready for review July 13, 2024 14:42
@narategithub
Copy link
Collaborator Author

@tom This is ready for your review.

@@ -478,6 +491,7 @@ __stream_deliver(struct ldms_addr *src, uint64_t msg_gn,
.name_len = name_len,
.data_len = data_len,
.name = stream_name,
.name_hash = hash,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you're going to put the hash in the message, you could put the entire thing in there and avoid the computation on every message. The port and addr aren't changing. This would also mean that the same hash is used all the way up the chain from sender to consumer.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The address got changed in the following case.

There is a logic to exclude the use of local address (e.g. 127.0.0.1) as src address here:

https://github.com/ovis-hpc/ovis/blob/f71e7adcff4a852f3ccc05b143a7794a6bc71986/ldms/src/core/ldms_stream.c#L1131-L1153

This is to prevent the case that the application running on the same host connects to sampler ldmsd using localhost address and make all src in the messages from all nodes being 127.0.0.1. Basically, if the src address is local, do not resolve it and let the next guy resolve it instead.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there must be code somewhere else that checks the address and family and if 0 sets it to the address and port of the forwarder? Where is that logic?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@narategithub In any event, all we're doing here is randomizing the seed to what we hope will be a random hash. So as-usual, we've found ourselves in a hole so we keep digging. Let's just back up and decide how we're going to generate a random hash.

Adding the (hash of) stream name into the endpoint index calculation to
increase entropy (and hopefully more even distribution among endpoints).
@narategithub
Copy link
Collaborator Author

Closing this PR. This will be a part of adaptive flow control PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants