
feat: Prototype for concurrent operations limiting #385

Closed
wants to merge 1 commit

Conversation

nand4011
Contributor

Modify cache Get and Set to add optional concurrent operations limiting. It works by creating an executor with a maximum number of threads equal to the limit. Get and Set calls are handed to the executor and wait in the executor's internal queue until a thread is free to take them, implicitly limiting the number of concurrent requests. The load generator limits concurrent requests in the same way.
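
As an illustration of that mechanism, here is a minimal sketch (all names hypothetical, not the SDK's actual code): a fixed-size executor whose thread count equals the limit, so submitted calls queue until a worker thread frees up.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the limiting mechanism, not the code in this PR.
final class ConcurrencyLimitedClient {
  private final ExecutorService limiter;

  ConcurrencyLimitedClient(int concurrencyLimit) {
    // At most concurrencyLimit tasks run at once; the rest wait in the
    // executor's internal queue, implicitly bounding in-flight requests.
    this.limiter = Executors.newFixedThreadPool(concurrencyLimit);
  }

  Future<String> get(String key) {
    // Each task occupies a pool thread for the duration of the call.
    return limiter.submit(() -> blockingGrpcGet(key));
  }

  private String blockingGrpcGet(String key) {
    return "value-for-" + key; // stand-in for the blocking gRPC call
  }
}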

Add a general ScsFutureStub method that takes a gRPC call, a gRPC to Momento response converter, and an error handler. Something like this should let us cut a lot of boilerplate out of the data client. It won't work for the batch call, since that uses a different type of stub.
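
A rough sketch of the shape such a helper could take (method and type names here are guesses, not the actual code from this PR): gRPC future stubs return Guava ListenableFutures, so the helper bridges one into a CompletableFuture via the three supplied functions.

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical shape of the generic helper described above.
final class GrpcCallHelper {
  static <GrpcR, MomentoR> CompletableFuture<MomentoR> call(
      Supplier<ListenableFuture<GrpcR>> grpcCall,    // () -> gRPC stub call
      Function<GrpcR, MomentoR> responseConverter,   // gRPC response -> Momento response
      Function<Throwable, MomentoR> errorHandler) {  // exception -> error response
    CompletableFuture<MomentoR> result = new CompletableFuture<>();
    Futures.addCallback(
        grpcCall.get(),
        new FutureCallback<GrpcR>() {
          @Override
          public void onSuccess(GrpcR response) {
            result.complete(responseConverter.apply(response));
          }

          @Override
          public void onFailure(Throwable t) {
            result.complete(errorHandler.apply(t));
          }
        },
        MoreExecutors.directExecutor());
    return result;
  }
}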


@cprice404 cprice404 left a comment


Overall this looks like a promising pattern.

Can you update the PR description with a few notes on what you observed the behavior of the loadgen program to be like if you configure the loadgen to do ~10k concurrent requests, and then run it without this setting vs. running with this setting set to something like 100 or 200?

And then also outline how much work you think would be left to bring this over the finish line.

@@ -3716,4 +3724,34 @@ private _DictionaryDeleteRequest.Some addSomeFieldsToRemove(@Nonnull List<ByteSt
public void close() {
scsDataGrpcStubsManager.close();
}

public static void main(String[] args) {
Contributor

i presume this was just for dev and would be removed

Contributor Author

It was. I left it in just to show a simpler example of how the thread pool limits the concurrent requests.
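
For the record, a stand-alone demo of that effect might look like the following (a hypothetical reconstruction; the actual body of the removed main isn't shown in this hunk):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Floods a small pool with tasks; only `limit` of them ever run at once.
public class PoolLimitDemo {
  public static void main(String[] args) throws InterruptedException {
    int limit = 4;
    ExecutorService pool = Executors.newFixedThreadPool(limit);
    for (int i = 0; i < 20; i++) {
      final int id = i;
      pool.submit(() -> {
        System.out.println("task " + id + " on " + Thread.currentThread().getName());
        try {
          Thread.sleep(500); // simulate a blocking request
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}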

@@ -11,6 +12,8 @@ public class Configuration {
private final TransportStrategy transportStrategy;
private final RetryStrategy retryStrategy;

private final Integer concurrencyLimit;
Contributor

if we decide to push forward with this we should spend a minute thinking about whether this should be a top-level config setting or if it belongs on one of the inner objects. i don't have an opinion yet, just want to be deliberate about it.

Contributor Author

Agreed.

@nand4011
Contributor Author

I'm setting up another test run with the load generator to get more results. As for next steps:

  • Figure out the correct type of thread pool to use. ForkJoinPool doesn't seem appropriate now that I have had a deeper look at the documentation. I think we want either a plain fixed thread pool or one with a maximum size that drains idle threads over time (see the sketch after this list). We don't need work-stealing features.
  • Finalize the configuration. I added it at the top level here for convenience. The .NET SDK has it in the TransportStrategy.
  • Plan how we want to migrate to the new pattern. I can scan through for potentially troublesome calls other than batch get/set, but I think they should generally be pretty easy to convert as long as we can break them into () -> gRPC stub call, gRPC response -> Momento response, and exception -> error response sections.
    My first instinct is to finish out the scalar methods for the initial release and make tickets for the other data types.
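
For reference, the two pool options from the first bullet might look like this (a sketch; the 60-second keep-alive is an assumption):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LimiterPools {
  // Option 1: plain fixed pool; threads live for the life of the client.
  static ExecutorService fixedPool(int limit) {
    return Executors.newFixedThreadPool(limit);
  }

  // Option 2: same concurrency bound, but idle threads are reclaimed after
  // 60s instead of lingering for the life of the client.
  static ExecutorService drainingPool(int limit) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(limit, limit, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}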

@nand4011
Contributor Author

I did a few runs of the Java load generator set to 5,000 tps against a cache with a limit of 10,000 tps. I ran it on a c6i.4xl. The concurrent request limit of the load generator was set to 10,000. Each run was 120 seconds.
No client concurrent limit:

Cumulative stats:
    total requests: 615424 (4024.77) tps, limited to 5000 tps
           success: 610507 (99.20)
server unavailable: 0 (0.00)
           timeout: 4916 (0.80)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 277056
  min: 1.00
  p50: 1.99
  p90: 5.61
  p95: 8.63
  p96: 9.90
  p97: 11.77
  p98: 14.98
  p99: 21.45
p99.9: 44.60
  max: 61.08

Cumulative read latencies:
count: 272945
  min: 0.58
  p50: 1.23
  p90: 4.18
  p95: 6.93
  p96: 8.16
  p97: 9.97
  p98: 13.27
  p99: 19.99
p99.9: 43.91
  max: 60.52

Client concurrent requests limited to 100, using a ForkJoinPool:

Cumulative stats:
    total requests: 612918 (4999.86) tps, limited to 5000 tps
           success: 612919 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 273442
  min: 0.99
  p50: 1.91
  p90: 4.92
  p95: 7.34
  p96: 8.37
  p97: 9.89
  p98: 12.46
  p99: 18.25
p99.9: 43.02
  max: 60.33

Cumulative read latencies:
count: 277892
  min: 0.57
  p50: 1.21
  p90: 3.92
  p95: 6.68
  p96: 8.05
  p97: 10.67
  p98: 15.75
  p99: 33.65
p99.9: 164.23
  max: 236.19

Client concurrent requests limited to 100, using a FixedThreadPool:

Cumulative stats:
    total requests: 612030 (4999.88) tps, limited to 5000 tps
           success: 612030 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 271987
  min: 0.98
  p50: 1.91
  p90: 5.18
  p95: 7.93
  p96: 9.08
  p97: 10.85
  p98: 13.79
  p99: 19.56
p99.9: 44.27
  max: 61.21

Cumulative read latencies:
count: 278008
  min: 0.58
  p50: 1.19
  p90: 3.73
  p95: 6.04
  p96: 7.08
  p97: 8.65
  p98: 11.49
  p99: 17.61
p99.9: 42.99
  max: 59.28

Cumulative stats:
    total requests: 612274 (4999.83) tps, limited to 5000 tps
           success: 612274 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 271785
  min: 1.10
  p50: 3.17
  p90: 8.91
  p95: 12.74
  p96: 14.22
  p97: 16.33
  p98: 19.58
  p99: 26.13
p99.9: 46.99
  max: 59.57

Cumulative read latencies:
count: 278208
  min: 0.68
  p50: 2.50
  p90: 7.93
  p95: 11.55
  p96: 12.92
  p97: 14.70
  p98: 17.60
  p99: 23.76
p99.9: 44.60
  max: 63.34

The biggest change is the absence of timeout errors on startup when using the client request limiting. Next is the difference in max read latency between the ForkJoinPool and the FixedThreadPool. Thread creation is expensive, but I wouldn't expect it to be that expensive.
I wasn't able to configure the load generator as-is to create requests that time out without a limiter but succeed with it. I suspect I would need higher throughput with meatier calls, and something other than one load generator thread per call.
Just for fun I tried 50,000 concurrent load generator calls with the client request limiter enabled. It crashed with memory errors almost immediately, but did give a few Momento timeout errors first, which was surprising.

@cprice404
Contributor

sorry for leaving this hanging for so long. the results you posted look good to me, I just had one question about a few of your statements:

I did a few runs of the Java load generator set to 5,000 tps against a cache with a limit of 10,000 tps. I ran it on a c6i.4xl. The concurrent request limit of the load generator was set to 10,000. Each run was 120 seconds.

I'm not 100% sure which you set to 5k and which to 10k; I'm assuming:

  • cache limit 10k
  • load gen code 5k
  • CacheClient configuration 10k

In the output you show under that ^^, there are lots of timeouts (as I'd expect).

Then right after it you show the results with turning the CacheClient limiter down to much smaller/more reasonable values, and no more timeouts. Also as I'd hope/expect.

Then you say this:

I wasn't able to configure the load generator as is to create requests that time out without a limiter but succeed with it.

That part I'm not following. You don't have any data above for what happens if you do: load gen code 5k or 10k, CacheClient limiter disabled.

@nand4011 nand4011 closed this Oct 7, 2024