# Add Tus-Min/Max-Chunk-Size headers #93
base: main
Conversation
See #89 for the underlying discussion. @kvz @janko-m
> …on how much data can be stored in a single take. The Client SHOULD use this number to determine in conjunction with other factors, such as the available bandwidth and the `Tus-Max-Chunk-Size` header, how much data SHOULD be transferred in one request. Furthermore, this header's value MAY represent a hard limit but also MAY be a recommendation. The Server…
> MAY represent a hard limit but also MAY be a recommendation
Is there a way we can be less ambiguous? As an implementor, I would be confused about what to do here. Could the protocol make a recommendation, e.g. via SHOULD, but MAY?
> hard limit, but also MAY
As part of this PR, I think adding @janko-m as a contributor and a feature version bump could be cool.
> #### Tus-Min-Chunk-Size
>
> The `Tus-Min-Chunk-Size` response header MUST be a non-negative integer indicating the recommended minimum size of a single chunk uploaded in a single `PATCH` request in bytes.
I would place "in bytes" more to the front: indicating the recommended minimum size in bytes of a single chunk... Remove 'in bytes' at the end.
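For context, an `OPTIONS` response advertising both proposed headers might look like the following sketch (the values are purely illustrative; the exact numbers would be entirely up to the Server):

```
OPTIONS /files HTTP/1.1
Host: tus.example.org

HTTP/1.1 204 No Content
Tus-Resumable: 1.0.0
Tus-Version: 1.0.0
Tus-Min-Chunk-Size: 5242880
Tus-Max-Chunk-Size: 5368709120
```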
@Acconut Great work! 👍 I just wanted to briefly discuss an option of making this simpler (maybe you already discussed this with @kvz). Why not make these two headers hard limits for the Client, instead of recommendations? So, something like: "If the Server specifies […]".

Because the Client currently cannot know if these are just recommendations or hard limits, I feel like it should treat them as hard limits anyway, just in case. The Client could theoretically be implemented in a way that, if the bandwidth is good/bad, it tries to upload a chunk outside of a limit, and if the chunk fails, it realizes that this limit is a hard limit and doesn't try again next time. However, in tus-ruby-server I can let the client know that they've hit the limit only after the chunk has already been uploaded, so depending on the chunk size this could bring a noticeable slowdown at upload start.

What do you think about this? @kvz Thanks! ❤️
This will not be done in this PR, but in the one for the 1.1 release.
Thank you for initiating this change :)
I dislike this idea because it would push too much power to the Server. As we seemed to agree in #89, the Client should always have the last word in determining how big a single chunk will be and, in my opinion, this also includes having the ability and right to choose a size which is outside of the numbers outlined by the Tus-Min/Max-Chunk-Size headers. Such a scenario will probably not appear often in reality, but it may turn up, for example when the bandwidth is just too bad or the device's memory is too small.

Of course, Clients could include this functionality, but I am not pleased to push authors towards implementing Clients which violate the specification, if that's avoidable.

I agree that we should include a statement which recommends using these numbers as hard limits, if possible. Please refer to my two examples above for cases in which one may not be able to do so.
If the Client chooses a chunk size which is outside of the recommended range, it should be prepared to see a rejection for it. Furthermore, this is a known issue in the HTTP architecture and is also the reason why the `Expect: 100-continue` mechanism was introduced.

I am looking forward to your feedback. Also, @kvz and @AJvanLoon, thank you a lot for your thoughts, I will address them! ❤️
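For illustration, a Client that is unsure whether its chosen chunk size will be accepted could use that mechanism to probe the Server before transmitting the body. This is a sketch of standard HTTP/1.1 behaviour, not something the tus protocol currently mandates:

```
PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0
Upload-Offset: 0
Content-Type: application/offset+octet-stream
Content-Length: 104857600
Expect: 100-continue

HTTP/1.1 100 Continue
```

If the Server considers the announced chunk size unacceptable, it can answer with a final error status instead of `100 Continue`, so the Client avoids uploading a chunk that would be rejected anyway.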
You're right, I agree.
That would be perfect 👍
I'm not 100% sure about this one. I could imagine cases where the Server knows upper or lower bound limits to the chunk size it can accept. I think the Client should have all the freedom to estimate the best chunk size within this range. But what good would it do to leave the Client the freedom to overshoot the Server's boundaries? The protocol then allows for more errors to happen than would have been the case when making the Client always respect its ranges.

Let the record show that the server could have very wide ranges by default. Even say from […] to […].
Servers will always need good protection against these situations, regardless of whether they come from a misbehaving Client or a malicious DDoS attack. The Server is allowed to reject chunks which fall outside of the recommended size, and Clients should be aware of this fact. They may achieve this partially using the […].

Furthermore, I think in most cases the device running the Client will be more limited in terms of available hardware than the Server, especially if you look at smartphones, embedded devices or micro computers running in rural areas in Africa, which we saw experimenting with tus some time ago. Therefore, the Client will usually be the entity which slows the upload down, I assume.
I agree, that's why we should include a statement which recommends using these numbers as hard limits, if possible. Please see my comment above for more context.
Could you please explain what you mean by "average"? Do you refer to the recommended upper and lower chunk sizes?
DDoS-ing aside, for which I feel protection has limited place inside tusd but rather belongs in HAProxy or the network layers above it, how will the Client know what the boundaries are if the protocol suggests the Client may overshoot them?

I feel this is ambiguous. The Client should treat the Server's upper and lower bounds as hard limits that it is not allowed to overshoot. The Server is free to set these values to actual hard limits, or to a vague estimation of them. It's just nice to set, from the server side, the range that Clients will try.

I know that in Africa phones are low on bandwidth and all that; the case, I think, is more that when servicing a million devices, you can squeeze more performance out of a single tusd server if you can control the range in which they set their chunks. If you just provide a vague recommendation, chances are half of your devices are just going to ignore it, since the protocol isn't tough on this. That means you have the choice of erroring out hard to influence behavior, at the risk that the clients don't automatically adjust, basically denying them service.
As somebody who has spent a few hours scratching his head trying to figure out why setting a client-side chunk size completely broke uploads, only to eventually figure out the failures were because I was sending chunks smaller than S3 accepts, I'm definitely in favour of a way for the server to at least suggest an acceptable chunk size range to the client. The presence of a min chunk size header would've been incredibly helpful in determining the issue quickly!

For reference, the S3 docs specify the following chunk size:

> 5 MB to 5 GB, **last part can be < 5 MB**

(emphasis added). The latter part of that makes the discussion around whether the headers should be hard or soft limits a little more complicated - for instance, if they were to be hard limits, how should a client act if presented with a file which can't be split into chunks which all fit within the server-specified min/max? Although that seems like a reasonably unlikely situation, it's worth keeping in mind, even if only to document it. A couple of options spring to mind:

- The simplest option would probably be for the protocol to specify that the last chunk can always be smaller than the minimum chunk size, though that might risk alienating any storage back end which doesn't have the "last part can be smaller than min chunk size" stipulation which S3 has (if any such storage back end exists).
- Alternatively, another new header (along the lines of `Tus-Accepts-Small-Last-Chunk`) could be introduced to cover all bases: if the server is using S3 (for instance), it can send this header to say that undersized final chunks are accepted. Since this is probably the more likely case, it might be better to invert this to something like `Tus-No-Undersized-Chunks` for any server which doesn't allow the last chunk to be under the limit.

Thinking further on the header option, perhaps it could be used as a way for the server to say whether the limits are soft or hard (a sketch follows below):

- `Tus-Undersized-Chunks` could be:
  - omitted or `any` to say that `Tus-Min-Chunk-Size` is a soft limit,
  - `none` to say that `Tus-Min-Chunk-Size` is a hard limit (with no exceptions), or
  - `last` to say that `Tus-Min-Chunk-Size` is a hard limit (with the exception of the last chunk, per S3).
- `Tus-Oversized-Chunks` could be:
  - omitted or `any` to say that `Tus-Max-Chunk-Size` is a soft limit, or
  - `none` to say that `Tus-Max-Chunk-Size` is a hard limit.
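To make the header option concrete, a response under this proposal might look like the following sketch. Note that `Tus-Undersized-Chunks` and `Tus-Oversized-Chunks` are only proposed in this comment and are not part of the protocol; the values are illustrative, loosely matching S3's limits:

```
HTTP/1.1 204 No Content
Tus-Resumable: 1.0.0
Tus-Min-Chunk-Size: 5242880
Tus-Max-Chunk-Size: 5368709120
Tus-Undersized-Chunks: last
Tus-Oversized-Chunks: none
```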
Adds a note to the readme to mention that exceeding any server-imposed chunk size hard limits will result in chunked uploads failing. This is to help anyone who doesn't realise that they're hitting such limits. See the following for further information / discussion:

- tus/tus-resumable-upload-protocol#89
- tus/tus-resumable-upload-protocol#93
To make sure I understand: tus-js-client needs to send big enough chunks so that e.g. tusd can send those same, big enough chunks to S3?

If so, I would suggest these should be decoupled at the expense of buffering on the tusd server. You should be able to define chunk sizes from tus-js-client towards tusd regardless of storage backend, imho.
But perhaps I’m misunderstanding this issue in which case, please clarify further :)
Sent from mobile, pardon the brevity.
Yes - at least in the case of […].

Buffering might interfere with […].

That would indeed be ideal, but as long as there's a way for the Tus server (i.e. […]) […]
As @pauln said, this is not an easy solution. Buffering on disk will break horizontal scalability (unless you use sticky sessions), and storing the buffers on S3 would probably be too slow and may reduce the performance for all other uploads, or would at least make the algorithm much more complex.

@pauln Thank you very much for your thoughts on this. I am sorry to hear that this consumed so much of your time. Actually, this fact is documented in tusd (https://github.com/tus/tusd/blob/master/s3store/s3store.go#L52) but one cannot be expected to find this.

Regarding your proposal, I would like to mention that the S3 store in tusd has flexible minimum chunk sizes, which are calculated for each upload size individually. For example, if you want to upload 5TB, then the min. size will be 500MB (the reason is that S3 only allows 10,000 chunks for each multipart upload). How would this play with your idea? Would the 5MB limit (which holds true for some uploads) be considered a soft limit?
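As a rough sketch of that calculation (the names and structure here are assumed for illustration and are not tusd's actual code; see the linked `s3store.go` for the real implementation):

```go
package main

import "fmt"

const (
	minPartSize int64 = 5 * 1024 * 1024 // S3's absolute minimum part size (5 MB)
	maxParts    int64 = 10000           // S3 allows at most 10,000 parts per multipart upload
)

// minChunkSizeFor returns the smallest part size that still fits the whole
// upload into maxParts parts, but never less than S3's 5 MB floor.
func minChunkSizeFor(uploadSize int64) int64 {
	required := (uploadSize + maxParts - 1) / maxParts // ceiling division
	if required < minPartSize {
		return minPartSize
	}
	return required
}

func main() {
	// A 5 TB upload needs parts of roughly 550 MB to stay within 10,000 parts.
	fmt.Println(minChunkSizeFor(5 * 1024 * 1024 * 1024 * 1024)) // 549755814
}
```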
Assuming that the min/max chunk sizes would be sent in response to the Creation plugin's POST request, the file size is already sent in the `Upload-Length` header.

Alternatively, if it's desirable to push this kind of logic onto the client as much as possible, perhaps it would be better for the POST response to include some relevant combination of: […]

It would then be up to the client to track what it's sending (and needs to send) and adjust chunk sizes as necessary.
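For instance, a creation request and a response carrying per-upload limits might look like this sketch. Whether the limits belong on the POST response at all is exactly what is being discussed here; the values are illustrative, with the minimum derived from the 5 TB `Upload-Length` as in the tusd example above:

```
POST /files HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0
Upload-Length: 5497558138880

HTTP/1.1 201 Created
Location: https://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
Tus-Resumable: 1.0.0
Tus-Min-Chunk-Size: 549755814
Tus-Max-Chunk-Size: 5368709120
```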
Based on my experience with large file transfers in native desktop and mobile apps, Backends should know if they're behind load balancers, network security appliances, tunnel endpoints etc., and should announce […]. Clients are free to optimise chunk sizes between […]. Some real-world examples were already described in other places: […]