
[Feature request]: Determine policy which caused rejection #2345

Open
lonix1 opened this issue Oct 14, 2024 · 10 comments

@lonix1

lonix1 commented Oct 14, 2024

Is your feature request related to a specific problem? Or an existing feature?

I've spent some time learning how Polly works, and specifically how to combine policies (which is probably a common need in a non-trivial production system).

I've found a pain point in how Polly handles rejections (excuse the pun).

Suppose one combines rate limiting and concurrency control, without queuing (or after reaching the queue limit), and there is a rejection:

  • if rate limited: read delay from RateLimiterRejectedException.RetryAfter then retry
  • if concurrency limited: return to caller with error

To do that one needs to know the source of the rejection.

Describe the solution you'd like

There are two workarounds, both bad:

  • If the RateLimiterRejectedException.RetryAfter is null, one can infer that the rejection was concurrency limited (rather than fixed window, sliding window or token bucket). But this is a dangerous assumption, as the library could add more policies in the future, or change its internals (see the sketch after this list).
  • One could use a variation of the roundabout approach shown here (though I'm not even certain it works... I couldn't get it to work). Assuming it works, it's very messy and overly complicated. I wouldn't want to maintain such code if I only ever touch Polly code once or twice a year.
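
A rough sketch of that first workaround, for context (fragile for the reason above; the names and catch body are illustrative only):

catch (RateLimiterRejectedException e) {
  if (e.RetryAfter is TimeSpan retryAfter) {
    // a rate limiter (fixed window, sliding window, token bucket) rejected the call
    await Task.Delay(retryAfter, cancellationToken);
  }
  else {
    // *assumed* to be the concurrency limiter -- not guaranteed by the API
    throw new InvalidOperationException("Rejected: too many concurrent attempts.");
  }
}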

The ideal solution is mentioned in that thread, which is to add a new property to the exception, which identifies the source of the rejection.

Perhaps RateLimiterRejectedException.Source could be a string, equal to a configurable Name property for the policy, or if unset then equal to policy.GetType().Name.

Additional context

Thank you for considering it!

@peter-csala
Contributor

Just to clarify: which Polly version are we talking about? RateLimiterRejectedException was introduced in the v8 API. The v7 API uses policies, whereas v8 uses strategies. My educated guess is that you are using the v8 API; can you please confirm?

@lonix1
Author

lonix1 commented Oct 14, 2024

Sorry. I've read docs, blog posts and SO threads for both v7 and v8, so I guess I've been using those terms interchangeably.

I'm using Polly v8.

@peter-csala
Contributor

Could you please describe the desired behavior in a bit more detail (how many strategies are chained, in what order, how they are configured, etc.)? Or you could share an ideal pipeline setup.

@lonix1
Author

lonix1 commented Oct 14, 2024

Suppose I'm making requests to an external API.

Pipeline example, in order

  • concurrency limiter: no queuing, one thread ("permit")
  • chained rate limiter (fixed window per-day and fixed window per-second)

Init:

private const int LIMIT_THREADS    = 1;
private const int LIMIT_PER_SECOND = 10;
private const int LIMIT_PER_DAY    = 1_000;


// not readonly: these are assigned in InitPolly() rather than the constructor
private PartitionedRateLimiter<ResilienceContext> _rateLimiterConcurrent;
private PartitionedRateLimiter<ResilienceContext> _rateLimiterPerSecond;
private PartitionedRateLimiter<ResilienceContext> _rateLimiterPerDay;
private PartitionedRateLimiter<ResilienceContext> _rateLimiterChained;
private ResiliencePipeline _resiliencePipeline;


public void Dispose() {
  _rateLimiterConcurrent.Dispose();
  _rateLimiterPerSecond.Dispose();
  _rateLimiterPerDay.Dispose();
  _rateLimiterChained.Dispose();
}


public void InitPolly() {

  var partitionKey = "aws-ses";

  _rateLimiterConcurrent = PartitionedRateLimiter.Create<ResilienceContext, string>(context =>
    RateLimitPartition.GetConcurrencyLimiter(
      partitionKey,
      key => new ConcurrencyLimiterOptions {      // factory parameter renamed to avoid shadowing partitionKey
        PermitLimit = LIMIT_THREADS,
        QueueLimit  = 0,
      })
  );

  _rateLimiterPerSecond = PartitionedRateLimiter.Create<ResilienceContext, string>(context =>
    RateLimitPartition.GetFixedWindowLimiter(
      partitionKey,
      key => new FixedWindowRateLimiterOptions {
        PermitLimit = LIMIT_PER_SECOND,
        QueueLimit  = 0,
        Window      = TimeSpan.FromSeconds(1),
      })
  );

  _rateLimiterPerDay = PartitionedRateLimiter.Create<ResilienceContext, string>(context =>
    RateLimitPartition.GetFixedWindowLimiter(
      partitionKey,
      key => new FixedWindowRateLimiterOptions {
        PermitLimit = LIMIT_PER_DAY,
        QueueLimit  = 0,
        Window      = TimeSpan.FromDays(1),
      })
  );

  _rateLimiterChained = PartitionedRateLimiter.CreateChained(_rateLimiterPerSecond, _rateLimiterPerDay);

  _resiliencePipeline = new ResiliencePipelineBuilder()
    // outer strategy: limit threads
    .AddRateLimiter(new RateLimiterStrategyOptions {
      RateLimiter = args => _rateLimiterConcurrent.AcquireAsync(args.Context, permitCount: 1, args.Context.CancellationToken),
    })
    // inner strategy: limit requests (per second and per day)
    .AddRateLimiter(new RateLimiterStrategyOptions {
      RateLimiter = args => _rateLimiterChained.AcquireAsync(args.Context, permitCount: 1, args.Context.CancellationToken),
    })
    .Build();
}

Execution:

public async Task ScheduleRequests(IEnumerable<Request> requests, CancellationToken cancellationToken) {

  var pendingRequests = requests.ToList();

  while (pendingRequests.Any()) {

    try {
      var request = pendingRequests.First();
      await _resiliencePipeline.ExecuteAsync(
        cancellationTokenInner => PerformRequest(request, cancellationTokenInner),
        cancellationToken);
      pendingRequests.Remove(request);
    }

    catch (RateLimiterRejectedException e) {
      // rejected by rate limiter
      if (e.RetryAfter is TimeSpan retryAfter) {
        Console.WriteLine($"Throttled; retry in {retryAfter}...");
        await Task.Delay(retryAfter);
      }
      // rejected by concurrency limiter
      else if (e.Source == "ConcurrencyLimiter") {              // <----------
        throw new InvalidOperationException("Rejected: too many concurrent attempts.");
      }
      // other rejection (unsure if even possible?)
      else {
        throw new InvalidOperationException("Rejected.");
      }
    }

  }
}


public async Task PerformRequest(Request request, CancellationToken cancellationToken) {
  // call external API...
}

I don't know if that's ideal; it's just a quick example. The idea is that in the catch block there is some way to know which limiter rejected the request.

@peter-csala
Contributor

peter-csala commented Oct 14, 2024

Well, as far as I know there is only a single place where Polly throws RateLimiterRejectedException:

var exception = retryAfter.HasValue ? new RateLimiterRejectedException(retryAfter.Value) : new RateLimiterRejectedException();

I think we could extend the RateLimiterRejectedException with a Source property and populate it similarly to the telemetry event's Source:

Resilience event occurred. EventName: 'OnRateLimiterRejected', Source: '(null)/(null)/RateLimiter', Operation Key: '', Result: ''
Resilience event occurred. EventName: 'OnRateLimiterRejected', Source: 'MyPipeline/MyPipelineInstance/MyRateLimiterStrategy', Operation Key: 'MyRateLimitedOperation', Result: ''

public static partial void ResilienceEvent(
    this ILogger logger,
    LogLevel logLevel,
    string eventName,
    string pipelineName,
    string pipelineInstance,
    string? strategyName,
    string? operationKey,
    object? result,
    Exception? exception);

_logger.ResilienceEvent(
    level,
    args.Event.EventName,
    args.Source.PipelineName.GetValueOrPlaceholder(),
    args.Source.PipelineInstanceName.GetValueOrPlaceholder(),
    args.Source.StrategyName.GetValueOrPlaceholder(),
    args.Context.OperationKey,
    result,
    args.Outcome?.Exception);
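
For illustration, the throw site could then be extended roughly like this (a sketch only, not actual Polly code; the StrategySource property and the local variable names are hypothetical):

var exception = retryAfter.HasValue
  ? new RateLimiterRejectedException(retryAfter.Value)
  : new RateLimiterRejectedException();
exception.StrategySource = $"{pipelineName}/{pipelineInstanceName}/{strategyName}";  // hypothetical property
throw exception;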

That would require only minimal changes to your pipeline setup:

_resiliencePipeline = new ResiliencePipelineBuilder()
    // outer strategy: limit threads
    .AddRateLimiter(new RateLimiterStrategyOptions {
      Name = "ThreadLimiter",
      RateLimiter = args => _rateLimiterConcurrent.AcquireAsync(args.Context, permitCount: 1, args.Context.CancellationToken),
    })
    // inner strategy: limit requests (per second and per day)
    .AddRateLimiter(new RateLimiterStrategyOptions {
      Name = "RequestLimiter",
      RateLimiter = args => _rateLimiterChained.AcquireAsync(args.Context, permitCount: 1, args.Context.CancellationToken),
    })
    .Build();

With this in hand, the RateLimiterRejectedException catch block could branch on the Source with Contains/EndsWith, as sketched below.
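
A minimal sketch of that catch block, assuming the proposed Source property exists and is populated as "PipelineName/PipelineInstanceName/StrategyName":

catch (RateLimiterRejectedException e) {
  // rejected by the outer concurrency strategy, named "ThreadLimiter"
  if (e.Source?.EndsWith("ThreadLimiter") == true) {       // proposed property, does not exist yet
    throw new InvalidOperationException("Rejected: too many concurrent attempts.");
  }
  // rejected by one of the chained rate limiters ("RequestLimiter")
  if (e.RetryAfter is TimeSpan retryAfter) {
    Console.WriteLine($"Throttled; retry in {retryAfter}...");
    await Task.Delay(retryAfter, cancellationToken);
  }
}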

Does that sound good to you?

@lonix1
Author

lonix1 commented Oct 14, 2024

To be honest, I haven't used the telemetry stuff yet, so I can't comment on that. I hope someone with more advanced Polly experience can provide feedback on that.

But regarding your final code block: if I understand correctly, one can name each limiter, and then detect that name later in the catch block... PERFECT.

Unrelated side issue: should the concurrency limiter or the rate limiter come first?

@peter-csala
Contributor

@lonix1 Do you want to give it a try and file a PR?

Unrelated side issue: should the concurrency limiter or the rate limiter come first?

The ordering should not matter:

  • rate limiter controls inbound load
  • concurrency limiter controls outbound load

They are both proactive strategies. If they were reactive then the outer's ShouldHandle could be adjusted to handle the inner's thrown exception as well. But that's not the case here.
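
For illustration only (not the case here): if the outer strategy were a reactive one such as retry, its ShouldHandle could be taught to react to the inner limiter's rejections. The names below are made up:

var pipeline = new ResiliencePipelineBuilder()
  .AddRetry(new RetryStrategyOptions {
    // the outer (reactive) retry also handles rejections thrown by the inner limiter
    ShouldHandle = new PredicateBuilder()
      .Handle<HttpRequestException>()
      .Handle<RateLimiterRejectedException>(),
    Delay = TimeSpan.FromSeconds(1),
  })
  .AddRateLimiter(new RateLimiterStrategyOptions { /* ... */ })
  .Build();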

@martintmk
Contributor

The proposal looks good to me. The only thing I suggest is to actually use the following class as the source property:

https://github.com/App-vNext/Polly/blob/main/src/Polly.Core/Telemetry/ResilienceTelemetrySource.cs

Basically, to uniquely identify a strategy you also need the pipeline name and the pipeline instance name.
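
A minimal sketch of consuming code under that variant, assuming a hypothetical TelemetrySource property of type ResilienceTelemetrySource on the exception (that class exposes PipelineName, PipelineInstanceName and StrategyName):

catch (RateLimiterRejectedException e) {
  if (e.TelemetrySource?.StrategyName == "ThreadLimiter") {   // hypothetical property
    throw new InvalidOperationException("Rejected: too many concurrent attempts.");
  }
}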

@lonix1
Author

lonix1 commented Oct 15, 2024

PR: I don't know... I'm still new to Polly and there's many things I don't yet grok.

Ordering: I hope I'm not about to derail this thread... Actually I am using both for outgoing load, for requests to an external API. I assumed the order matters in that case (excluding the scenario of unhandled exceptions).

@peter-csala
Contributor

PR: I don't know... I'm still new to Polly and there's many things I don't yet grok.

Sure, no problem. @martincostello could you please assign this issue to me?
