
feat: Prototype for concurrent operations limiting #385

Closed
wants to merge 1 commit

Conversation

nand4011
Contributor

Modify cache Get and Set to add optional concurrent operations limiting. It works by creating an executor with a maximum number of threads equal to the limit. Get and Set calls are handed to the executor and wait in the executor's internal queue until a thread is free to take them, implicitly limiting the number of concurrent requests. The load generator limits concurrent requests in the same way.
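
As an illustration of that mechanism, here is a minimal sketch (all names hypothetical, not the SDK's actual code): a fixed-size executor whose thread count equals the limit, so submitted calls queue until a worker thread frees up.

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of the limiting mechanism, not the code in this PR.
final class ConcurrencyLimitedClient {
  private final ExecutorService limiter;

  ConcurrencyLimitedClient(int concurrencyLimit) {
    // At most concurrencyLimit tasks run at once; the rest wait in the
    // executor's internal queue, implicitly bounding in-flight requests.
    this.limiter = Executors.newFixedThreadPool(concurrencyLimit);
  }

  Future<String> get(String key) {
    // Each task occupies a pool thread for the duration of the call.
    return limiter.submit(() -> blockingGrpcGet(key));
  }

  private String blockingGrpcGet(String key) {
    return "value-for-" + key; // stand-in for the blocking gRPC call
  }
}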

Add a general ScsFutureStub method that takes a gRPC call, a gRPC to Momento response converter, and an error handler. Something like this should let us cut a lot of boilerplate out of the data client. It won't work for the batch call, since that uses a different type of stub.
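
A rough sketch of the shape such a helper could take (method and type names here are guesses, not the actual code from this PR): gRPC future stubs return Guava ListenableFutures, so the helper bridges one into a CompletableFuture via the three supplied functions.

import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical shape of the generic helper described above.
final class GrpcCallHelper {
  static <GrpcR, MomentoR> CompletableFuture<MomentoR> call(
      Supplier<ListenableFuture<GrpcR>> grpcCall,    // () -> gRPC stub call
      Function<GrpcR, MomentoR> responseConverter,   // gRPC response -> Momento response
      Function<Throwable, MomentoR> errorHandler) {  // exception -> error response
    CompletableFuture<MomentoR> result = new CompletableFuture<>();
    Futures.addCallback(
        grpcCall.get(),
        new FutureCallback<GrpcR>() {
          @Override
          public void onSuccess(GrpcR response) {
            result.complete(responseConverter.apply(response));
          }

          @Override
          public void onFailure(Throwable t) {
            result.complete(errorHandler.apply(t));
          }
        },
        MoreExecutors.directExecutor());
    return result;
  }
}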


@cprice404 cprice404 left a comment


Overall this looks like a promising pattern.

Can you update the PR description with a few notes on what you observed the behavior of the loadgen program to be like if you configure the loadgen to do ~10k concurrent requests, and then run it without this setting vs. running with this setting set to something like 100 or 200?

And then also outline how much work you think would be left to bring this over the finish line.

@@ -3716,4 +3724,34 @@ private _DictionaryDeleteRequest.Some addSomeFieldsToRemove(@Nonnull List<ByteSt
public void close() {
scsDataGrpcStubsManager.close();
}

public static void main(String[] args) {
Contributor

i presume this was just for dev and would be removed

Contributor Author

It was. I left it in just to show a simpler example of how the thread pool limits the concurrent requests.
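
For the record, a stand-alone demo of that effect might look like the following (a hypothetical reconstruction; the actual body of the removed main isn't shown in this hunk):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Floods a small pool with tasks; only `limit` of them ever run at once.
public class PoolLimitDemo {
  public static void main(String[] args) throws InterruptedException {
    int limit = 4;
    ExecutorService pool = Executors.newFixedThreadPool(limit);
    for (int i = 0; i < 20; i++) {
      final int id = i;
      pool.submit(() -> {
        System.out.println("task " + id + " on " + Thread.currentThread().getName());
        try {
          Thread.sleep(500); // simulate a blocking request
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
  }
}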

@@ -11,6 +12,8 @@ public class Configuration {
private final TransportStrategy transportStrategy;
private final RetryStrategy retryStrategy;

private final Integer concurrencyLimit;
Contributor

if we decide to push forward with this we should spend a minute thinking about whether this should be a top-level config setting or if it belongs on one of the inner objects. i don't have an opinion yet, just want to be deliberate about it.

Contributor Author

Agreed.

@nand4011
Contributor Author

I'm setting up another test run with the load generator to get more results. As for next steps:

  • Figure out the correct type of thread pool to use. ForkJoinPool doesn't seem appropriate now that I have had a deeper look at the documentation. I think we want either a plain fixed thread pool or one with a maximum size that drains idle threads over time (see the sketch after this list). We don't need work-stealing features.
  • Finalize the configuration. I added it at the top level here for convenience. The .NET SDK has it in the TransportStrategy.
  • Plan how we want to migrate to the new pattern. I can scan through for potentially troublesome calls other than batch get/set, but I think they should generally be pretty easy to convert as long as we can break them into () -> gRPC stub call, gRPC response -> Momento response, and exception -> error response sections.
    My first instinct is to finish out the scalar methods for the initial release and make tickets for the other data types.
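
For reference, the two pool options from the first bullet might look like this (a sketch; the 60-second keep-alive is an assumption):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LimiterPools {
  // Option 1: plain fixed pool; threads live for the life of the client.
  static ExecutorService fixedPool(int limit) {
    return Executors.newFixedThreadPool(limit);
  }

  // Option 2: same concurrency bound, but idle threads are reclaimed after
  // 60s instead of lingering for the life of the client.
  static ExecutorService drainingPool(int limit) {
    ThreadPoolExecutor pool =
        new ThreadPoolExecutor(limit, limit, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
    pool.allowCoreThreadTimeOut(true);
    return pool;
  }
}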

@nand4011
Contributor Author

I did a few runs of the Java load generator set to 5,000 tps against a cache with a limit of 10,000 tps. I ran it on a c6i.4xl. The concurrent request limit of the load generator was set to 10,000. Each run was 120 seconds.
No client concurrent limit:

Cumulative stats:
    total requests: 615424 (4024.77) tps, limited to 5000 tps
           success: 610507 (99.20)
server unavailable: 0 (0.00)
           timeout: 4916 (0.80)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 277056
  min: 1.00
  p50: 1.99
  p90: 5.61
  p95: 8.63
  p96: 9.90
  p97: 11.77
  p98: 14.98
  p99: 21.45
p99.9: 44.60
  max: 61.08

Cumulative read latencies:
count: 272945
  min: 0.58
  p50: 1.23
  p90: 4.18
  p95: 6.93
  p96: 8.16
  p97: 9.97
  p98: 13.27
  p99: 19.99
p99.9: 43.91
  max: 60.52

Client concurrent requests limited to 100, using a ForkJoinPool:

Cumulative stats:
    total requests: 612918 (4999.86) tps, limited to 5000 tps
           success: 612919 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 273442
  min: 0.99
  p50: 1.91
  p90: 4.92
  p95: 7.34
  p96: 8.37
  p97: 9.89
  p98: 12.46
  p99: 18.25
p99.9: 43.02
  max: 60.33

Cumulative read latencies:
count: 277892
  min: 0.57
  p50: 1.21
  p90: 3.92
  p95: 6.68
  p96: 8.05
  p97: 10.67
  p98: 15.75
  p99: 33.65
p99.9: 164.23
  max: 236.19

Client concurrent requests limited to 100, using a FixedThreadPool:

Cumulative stats:
    total requests: 612030 (4999.88) tps, limited to 5000 tps
           success: 612030 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 271987
  min: 0.98
  p50: 1.91
  p90: 5.18
  p95: 7.93
  p96: 9.08
  p97: 10.85
  p98: 13.79
  p99: 19.56
p99.9: 44.27
  max: 61.21

Cumulative read latencies:
count: 278008
  min: 0.58
  p50: 1.19
  p90: 3.73
  p95: 6.04
  p96: 7.08
  p97: 8.65
  p98: 11.49
  p99: 17.61
p99.9: 42.99
  max: 59.28

Cumulative stats:
    total requests: 612274 (4999.83) tps, limited to 5000 tps
           success: 612274 (100.00)
server unavailable: 0 (0.00)
           timeout: 0 (0.00)
    limit exceeded: 0 (0.00)
         cancelled: 0 (0.00)

Cumulative write latencies:
count: 271785
  min: 1.10
  p50: 3.17
  p90: 8.91
  p95: 12.74
  p96: 14.22
  p97: 16.33
  p98: 19.58
  p99: 26.13
p99.9: 46.99
  max: 59.57

Cumulative read latencies:
count: 278208
  min: 0.68
  p50: 2.50
  p90: 7.93
  p95: 11.55
  p96: 12.92
  p97: 14.70
  p98: 17.60
  p99: 23.76
p99.9: 44.60
  max: 63.34

The biggest change is the absence of timeout errors on startup when using the client request limiting. Next is the difference in max read latency between the ForkJoinPool and the FixedThreadPool. Thread creation is expensive, but I wouldn't expect it to be that expensive.
I wasn't able to configure the load generator as-is to create requests that time out without a limiter but succeed with it. I suspect I would need higher throughput with meatier calls, and something other than one load generator thread per call.
Just for fun I tried 50,000 concurrent load generator calls with the client request limiter enabled. It crashed with memory errors almost immediately, but did give a few Momento timeout errors first, which was surprising.

@cprice404
Contributor

sorry for leaving this hanging for so long. the results you posted look good to me, I just had one question about a few of your statements:

I did a few runs of the Java load generator set to 5,000 tps against a cache with a limit of 10,000 tps. I ran it on a c6i.4xl. The concurrent request limit of the load generator was set to 10,000. Each run was 120 seconds.

I'm not 100% sure which you set to 5k and which to 10k; I'm assuming:

  • cache limit 10k
  • load gen code 5k
  • CacheClient configuration 10k

In the output you show under that ^^, there are lots of timeouts (as I'd expect).

Then right after it you show the results with turning the CacheClient limiter down to much smaller/more reasonable values, and no more timeouts. Also as I'd hope/expect.

Then you say this:

I wasn't able to configure the load generator as is to create requests that time out without a limiter but succeed with it.

That part I'm not following. You don't have any data above for what happens if you do: load gen code 5k or 10k, CacheClient limiter disabled.

@nand4011 nand4011 closed this Oct 7, 2024