redistribute RDY in high throughput, idle producer situations #277
base: master
And right after I open a PR I realize that I don't correctly trade back RDY when nsqd conns go from idle / unused to busy / used. Will amend.
I think this kind of improvement would be welcome ... but historically the ready-count distribution has been the trickiest logic, when you add in backoff and such. If we can be sure that this new logic doesn't get "stuck" in an on or off state when it shouldn't, then we probably would not want a new config option. Ideally, the code works well, and there would be no need to "turn it off if it causes problems". Although I'm currently one of the nsq maintainers, I'm not the biggest contributor to this particular repo, and don't have a lot of familiarity with this implementation of the ready-count logic 😅 so I can't promise very prompt review.
@mscso thanks for starting this discussion/effort; I appreciate many of the challenges you described around uneven message distribution, as I've experienced them as well. As @ploxiln mentioned, there is good appetite for improving this aspect of nsq / go-nsq, but an equal dose of caution, because a one-size-fits-all solution has been elusive so far despite being desired. I have a few high-level thoughts on how we might think of changing the paradigm to resolve this at a higher level:
Do you feel any of these would better fit your needs and reduce complexities around max-in-flight settings? cc: @mreiferson @ploxiln
When several producer nsqds are registered with an nsqlookupd but only one of them (or at least not all of them) is currently producing messages, the current flat max-in-flight distribution leads to the consumer effectively having fewer messages in flight than we might want.
Consider a situation where 4 hosts / nsqds are used to produce messages on a topic, but only one of them is used at a time (for various reasons). Consider a single consumer setting a max-in-flight of 8. This is spread equally, so each nsqd connection will have a RDY count of 2. Since at any point in time 3 of the 4 nsqds are idle / not producing on the topic, we effectively only ever have 2 messages in flight.
One workaround is to increase the max-in-flight drastically (multiply by nsqd count) but then we might have more messages in flight than our consumer wants if suddenly more than one nsqd is producing messages.
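The arithmetic behind the example and the workaround above can be sketched as follows. This is a simplified model of an even split, not the actual go-nsq code; the function names `perConnRDY` and `effectiveInFlight` are hypothetical and exist only for illustration.

```go
package main

// perConnRDY models an even split of max-in-flight across connections,
// clamped so that each connection gets at least 1 and at most maxInFlight.
// Hypothetical helper for illustration; not the go-nsq implementation.
func perConnRDY(maxInFlight, connCount int) int {
	if connCount <= 0 {
		return 0
	}
	share := maxInFlight / connCount
	if share < 1 {
		share = 1
	}
	if share > maxInFlight {
		share = maxInFlight
	}
	return share
}

// effectiveInFlight is the upper bound on messages in flight when only
// producingConns of the connections actually have messages to deliver.
func effectiveInFlight(maxInFlight, connCount, producingConns int) int {
	return perConnRDY(maxInFlight, connCount) * producingConns
}
```

Under this model, `effectiveInFlight(8, 4, 1)` is 2, which matches the example. With the workaround of multiplying by the nsqd count (max-in-flight of 32), one producing nsqd gives 8 in flight as intended, but all four producing at once gives 32, four times what the consumer wanted.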
Because we constantly deal with this situation (automatically scheduled producer containers that move between hosts), we implemented a second RDY redistribution function that trades RDY count from an unused nsqd connection to a "busy" nsqd connection.
Since this might not be useful / wanted in every use case, the feature is only enabled with a config flag `RDYTrading`. The code is similar to the normal code in `redistributeRDY` for the `max-in-flight < len(conns)` situation, but here it essentially deals with `max-in-flight > len(producing_conns)`.
Let me know what you think and whether this could be useful for others and thus whether you think it could be merged upstream.
NSQ2019/12/05 14:26:34 DBG 1 [foo/bar] looking for RDY trade possibilities...
NSQ2019/12/05 14:26:34 DBG 1 [foo/bar] - moving 3 RDY from 10.13.2.51:4150 to 10.13.2.85:4150
NSQ2019/12/05 14:26:34 DBG 1 [foo/bar] - moving 3 RDY from 10.13.2.39:4150 to 10.13.2.85:4150
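The trading idea described above, and the kind of moves shown in the debug lines, might look roughly like the sketch below. It is not the code from this PR: the `connState` and `trade` types, the `tradeRDY` function, and the `keepIdle` parameter are all hypothetical. In particular, leaving a small RDY count on each idle connection (so it can still receive a message if its nsqd starts producing) is an assumption of this sketch, not something the PR confirms.

```go
package main

// connState is a hypothetical, simplified view of one nsqd connection.
type connState struct {
	addr string
	rdy  int  // RDY count currently assigned to this connection
	busy bool // true if this connection received messages recently
}

// trade records that n RDY were moved from one connection to another.
type trade struct {
	from, to string
	n        int
}

// tradeRDY moves spare RDY from idle connections to busy ones, leaving
// keepIdle RDY on every idle connection (an assumption of this sketch).
// It mutates conns in place and returns the trades it performed.
// The total RDY across all connections is unchanged.
func tradeRDY(conns []*connState, keepIdle int) []trade {
	var busy, idle []*connState
	for _, c := range conns {
		if c.busy {
			busy = append(busy, c)
		} else {
			idle = append(idle, c)
		}
	}
	if len(busy) == 0 {
		return nil // nothing is producing, so there is nobody to trade to
	}

	var trades []trade
	for _, src := range idle {
		spare := src.rdy - keepIdle
		if spare <= 0 {
			continue
		}
		// Give the spare RDY to the busy connection that currently has
		// the lowest RDY, so busy connections stay roughly balanced.
		dst := busy[0]
		for _, b := range busy[1:] {
			if b.rdy < dst.rdy {
				dst = b
			}
		}
		src.rdy -= spare
		dst.rdy += spare
		trades = append(trades, trade{from: src.addr, to: dst.addr, n: spare})
	}
	return trades
}
```

A complete implementation would also need the reverse path, returning RDY to a connection once it goes from idle to busy, which the first comment in this conversation notes is still missing. It would also have to interact correctly with backoff, which this sketch ignores.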