add support for setting a larger request body for some endpoints #618
base: sunshowers/spr/main.add-support-for-setting-a-larger-request-body-for-some-endpoints
Conversation
Created using spr 1.3.4
Note that this is a breaking change. I've added Side note: I think
Created using spr 1.3.4
Cool!
One thing I didn't consider until looking at this: the server-wide limit `request_body_max_bytes` is now misleadingly named, since it is no longer the max request body size in bytes. It would be better called `default_request_body_max_bytes`. (I'm imagining an administrator who's using big hammers to lock things down and just wants to set the server-wide max body size to something really low. Maybe they're trying to mitigate some recent vulnerability involving large or complicated payloads. (This feels awfully familiar to me.) They might be surprised, and maybe irritated, to discover that this setting doesn't do that.)
The obvious thing would be to just rename the server-wide tunable to avoid creating the wrong expectation. That's okay, but it would break people silently: e.g., they might have set some high value in their config, but that field in their TOML file will now be ignored, they'll get the lower default, and they won't notice until a large input fails at runtime.
We could do that and also implement a server-wide `request_body_max_bytes` that acts as an absolute cap, overriding any per-endpoint values. That seems useful for the administrator trying to lock things down. We could even fail at endpoint registration if you try to register an endpoint with a cap above the server-wide cap. The only risk here is that someone wanting to use the per-endpoint limit sees this error at registration time. But they can just set `default_request_body_max_bytes` to whatever `request_body_max_bytes` was and then set `request_body_max_bytes` to the value they're using on their endpoint (or `None`?). And they're by definition only starting to use this feature, so it's not breaking anybody.
What do you all think? I think this wouldn't be that much work?
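To make the registration-time idea concrete, here's a minimal sketch of such a check. The `ServerLimits` struct, `register_endpoint` function, and field names are hypothetical illustrations of the proposal, not Dropshot's actual API:

```rust
// Hypothetical shape for the two server-wide tunables being discussed.
struct ServerLimits {
    /// Fallback for endpoints that don't specify their own limit.
    default_request_body_max_bytes: usize,
    /// Absolute cap: registration fails if an endpoint asks for more.
    request_body_max_bytes: usize,
}

/// Resolve an endpoint's effective body limit, failing registration if it
/// exceeds the server-wide cap.
fn register_endpoint(
    limits: &ServerLimits,
    endpoint_limit: Option<usize>,
) -> Result<usize, String> {
    let effective =
        endpoint_limit.unwrap_or(limits.default_request_body_max_bytes);
    if effective > limits.request_body_max_bytes {
        return Err(format!(
            "endpoint requests {} bytes, above the server-wide cap of {}",
            effective, limits.request_body_max_bytes
        ));
    }
    Ok(effective)
}
```

An endpoint that stays at or below the cap registers fine; one above it fails loudly at startup rather than surprising anyone at runtime.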
```rust
/// Maximum request body size: typically the same as
/// [`server.config.request_body_max_bytes`], but can be overridden for an
/// individual request
pub request_body_max_bytes: usize,
```
What do you think about making this a method instead? This feels pretty similar to `RequestContext::page_limit()`. In both cases we want to provide the "appropriate" value for some tunable based on some non-trivial policy related to multiple places it might be set.
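By analogy with `page_limit()`, such a method might look roughly like this; the struct shape and field names here are invented for illustration and don't match Dropshot's real `RequestContext`:

```rust
// Illustrative stand-in for the real RequestContext: it carries the
// server default, the server-wide cap, and an optional per-endpoint limit.
struct RequestContext {
    server_default: usize,
    server_cap: usize,
    endpoint_limit: Option<usize>,
}

impl RequestContext {
    /// The effective limit: the endpoint's override (or the server default),
    /// never exceeding the server-wide cap.
    fn request_body_max_bytes(&self) -> usize {
        self.endpoint_limit
            .unwrap_or(self.server_default)
            .min(self.server_cap)
    }
}
```

The policy ("which of several places wins") lives in one method, so callers never have to know where the value came from.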
So I think we still need to store the value in the `RequestContext` somewhere. What about storing it in `RequestInfo`? Then this could be a method `rqctx.request.body_max_bytes()` or `rqctx.request.request_body_max_bytes()`.
Or do you have a way to avoid storing the value on the request? I couldn't find an easy answer to this.
Sorry, I should have been clearer: I'm fine with storing it there. I'm suggesting it not be `pub`, and that we instead provide a method to access it (one that takes into account the various caps).
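The non-`pub`-field-plus-accessor shape being suggested is sketched below; `RequestInfo` here is a simplified stand-in for the real Dropshot type:

```rust
pub struct RequestInfo {
    // Deliberately not `pub`: callers must go through the accessor, so any
    // future policy (server-wide caps, etc.) can be folded in there without
    // changing call sites.
    body_max_bytes: usize,
}

impl RequestInfo {
    pub fn new(body_max_bytes: usize) -> Self {
        Self { body_max_bytes }
    }

    /// Accessor for the effective limit; today it just returns the stored
    /// value, but it is the one place to apply additional caps later.
    pub fn body_max_bytes(&self) -> usize {
        self.body_max_bytes
    }
}
```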
I really like this idea. What about this definition in the config:
I've used this strategy in nextest a few times, e.g. https://nexte.st/book/slow-tests.html#configuring-timeouts. Nextest originally just supported setting a scalar duration, but grew to support many other features, e.g. termination, backoff/jitter, etc.
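The nextest-style "scalar or table" config value could be modeled roughly as follows. With serde one would typically deserialize this as an untagged enum; this dependency-free sketch just shows the two shapes and how they resolve, and all the names are hypothetical:

```rust
// A config value that is either a bare scalar or a table with separate
// default and max fields, e.g. in TOML:
//   request_body_max_bytes = 1024
//   request_body_max_bytes = { default = 1024, max = 10485760 }
enum RequestBodyMaxBytes {
    /// Scalar form: one value serves as both the default and the cap.
    Scalar(usize),
    /// Table form: an endpoint default plus an absolute server-wide cap.
    Table { default: usize, max: usize },
}

impl RequestBodyMaxBytes {
    /// The limit used by endpoints that don't specify their own.
    fn default_limit(&self) -> usize {
        match *self {
            RequestBodyMaxBytes::Scalar(n) => n,
            RequestBodyMaxBytes::Table { default, .. } => default,
        }
    }

    /// The absolute cap that no endpoint may exceed.
    fn cap(&self) -> usize {
        match *self {
            RequestBodyMaxBytes::Scalar(n) => n,
            RequestBodyMaxBytes::Table { max, .. } => max,
        }
    }
}
```

The scalar form keeps existing configs working unchanged, while the table form is the opt-in for anyone who starts using per-endpoint limits.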
Just to spell it out, we've got:
The max for an endpoint is the lesser of:
I think that makes sense. The only thing I'm wondering is whether the server-wide max should be the one that's required, and the one we use if you provide a scalar. I'm not sure it makes a difference?
Per RFD 353, add support for a new extractor that can provide data as a stream of `Bytes` chunks. This means that large request payloads won't have to be buffered into a single, contiguous chunk of memory. Also add some examples showing how to use the extractor to operate on chunks in a streaming fashion. (There are other potential examples we could add here, such as compressing and/or decompressing data on the fly; I haven't implemented those.)

Finally, switch the other extractors over to be implemented on top of `StreamingBody`. In effect, the other extractors just provide a differently-shaped API on top of `StreamingBody`.

This PR doesn't implement the other determination in RFD 353, which is to allow some endpoints' max payload sizes to be set to a higher limit. That's in #618.
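To illustrate the streaming idea, here's a simplified, synchronous stand-in. Dropshot's real extractor is async and yields `Bytes` chunks; an iterator of `Vec<u8>` is used here so the sketch stays dependency-free, and the function name and error handling are invented. It shows how a body limit can be enforced incrementally, without ever buffering the whole payload:

```rust
/// Consume a body chunk by chunk, enforcing `max_bytes` as we go.
/// Returns the total length on success; a real implementation would
/// process each chunk (hash it, write it out, decompress it, ...) instead
/// of just counting bytes.
fn read_capped<I>(chunks: I, max_bytes: usize) -> Result<usize, String>
where
    I: IntoIterator<Item = Vec<u8>>,
{
    let mut total = 0usize;
    for chunk in chunks {
        total += chunk.len();
        // The limit check happens per chunk, so an oversized body is
        // rejected as soon as it crosses the cap, not after full buffering.
        if total > max_bytes {
            return Err(format!("request body exceeds {} bytes", max_bytes));
        }
    }
    Ok(total)
}
```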
I think we need a way of communicating if the server max is less than an endpoint's specified limit. An error sounds good in theory but would admins want to be able to force a lower limit in emergencies? If so, I think there's a tension between normal use, where it's surprising if an endpoint accepts a smaller request size than is specified, and emergency use. Maybe this suggests a
Interesting -- I guess I was thinking of the main use case for per-request limits being to exceed the server-wide limit. What this would mean is that if you want to bump up a limit past the currently configured maximum, you'll also (always) have to edit the config file. There might be another tension here between developers and administrators. In general I'm thinking of the default as almost always being lower (and in many cases several orders of magnitude lower) than the limits set for particular requests. In this case if there's a need to use per-request limits, people would have to switch very quickly to the map definition. We'd want to communicate this clearly.
I think these two cases:
are completely equivalent.
Sorry this totally fell off my radar. I realize we've moved on to other things but at some point we'll probably come back to this. If these are the semantics:
then why would anyone ever specify the server-wide max? I introduced that cap in #618 (review) as a way to avoid breaking people's configs. Changing to a map would solve that problem. The only reason I could see for wanting to set the max is if the per-endpoint cap became unsuitable (e.g., due to some security issue) and you didn't want to rebuild. That's the "emergency" situation we've been talking about. This would be a fine feature, but I'm not convinced it's worth supporting right now. I brought it up originally not because I thought it was important to have a flag for that, but because an administrator might think the flag we already have did do that. My inclination now is to just rename
This is the other half of RFD 353: add support for setting a larger request body limit for some endpoints. This is particularly useful with `StreamingBody`, for which we'll want to support endpoints on the order of several hundred megabytes or even larger.
This PR is pretty straightforward -- mostly tests.
For now we only support literals, but it would be nice in the future to support constants and function calls as well. I've added a failing trybuild test with a note that maybe this can succeed in the future.
Depends on #617.