Hypothesis: CPU stress on standalone gateway should not cause harm cluster wise #28

Zelldon · 2020-06-11T12:06:26Z

Hypothesis

We believe that stressing the standalone gateway's CPU will not affect the stability of the cluster.

We expect latency to go up and throughput to go down while the stress is applied. Afterwards, both should return to normal.

Zelldon · 2020-06-11T13:19:23Z

Today we ran a chaos experiment using our standard setup, with a baseline load of 100 workflow instances and 6 workers, which can activate at most 120 jobs.

In steady state we saw that we can start and complete 100 workflow instances per second. One instance took 1 to 2.5 seconds.

We expected that introducing stress on the standalone gateway CPU would make processing latency go up and throughput go down, but that no cluster-wide failures would occur. We also expected that after removing the stress the system would return to normal and reach the baseline again.

The results look promising:

We ran the experiment twice and saw that throughput goes down and latency goes up under stress, but both return to normal after the stress is removed.
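To make the pass/fail criterion explicit, here is a minimal sketch of how the check could be expressed in code. It is not part of our tooling: the `Sample` type, the function names, and the 10% tolerance are all assumptions for illustration, and the "during" and "after" numbers in the usage lines are made up. Only the baseline figures come from the steady state described above.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One observation of the load test (hypothetical structure)."""
    throughput: float  # completed workflow instances per second
    latency: float     # seconds per workflow instance


def degraded(baseline: Sample, during: Sample) -> bool:
    """True if throughput dropped and latency rose relative to the baseline."""
    return during.throughput < baseline.throughput and during.latency > baseline.latency


def recovered(baseline: Sample, after: Sample, tolerance: float = 0.1) -> bool:
    """True if post-stress metrics are within `tolerance` of the baseline.

    The 10% default tolerance is an assumption, not a measured threshold.
    """
    throughput_ok = after.throughput >= baseline.throughput * (1 - tolerance)
    latency_ok = after.latency <= baseline.latency * (1 + tolerance)
    return throughput_ok and latency_ok


# Baseline from the steady state above; the other two samples are illustrative only.
baseline = Sample(throughput=100, latency=2.5)
during_stress = Sample(throughput=40, latency=6.0)
after_stress = Sample(throughput=98, latency=2.6)

hypothesis_holds = degraded(baseline, during_stress) and recovered(baseline, after_stress)
```

With real numbers from the dashboards, `hypothesis_holds` would be true only if the system degrades under stress and then returns close to the baseline.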

What was unexpected or what we found out:

Unexpectedly, the broker backpressure went up, which means the broker dropped requests while the stress was applied. We did not expect this, since the latency between writing to the dispatcher and processing the event should not change. We probably need to investigate this further. Our current assumption is that the gateway sends requests in batches, which causes higher spikes in backpressure. We need more metrics on the transport module to verify that.

We also found that the standalone gateway is not resource limited, so at one point it used 12 CPU cores.
