# Add Tus-Min/Max-Chunk-Size headers #93
base: main
Conversation
See #89 for the underlying discussion. @kvz @janko-m
> …on how much data can be stored in a single take. The Client SHOULD use this number to determine in conjunction with other factors, such as the available bandwidth and the `Tus-Max-Chunk-Size` header, how much data SHOULD be transferred in one request. Furthermore, this header's value MAY represent a hard limit but also MAY be a recommendation. The Server…
> MAY represent a hard limit but also MAY be a recommendation
Is there a way we can be less ambiguous? As an implementor, I would be confused about what to do here. Could the protocol make a recommendation, e.g. via SHOULD, but MAY?
> hard limit, but also MAY
As part of this PR, I think adding @janko-m as a contributor and a feature version bump could be cool.
> #### Tus-Min-Chunk-Size
>
> The `Tus-Min-Chunk-Size` response header MUST be a non-negative integer indicating the recommended minimum size of a single chunk uploaded in a single `PATCH` request in bytes.
I would place "in bytes" more to the front: indicating the recommended minimum size in bytes of a single chunk... Remove 'in bytes' at the end.
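For context, an `OPTIONS` response advertising both proposed headers might look like the following sketch (the values are purely illustrative; the exact numbers would be entirely up to the Server):

```
OPTIONS /files HTTP/1.1
Host: tus.example.org

HTTP/1.1 204 No Content
Tus-Resumable: 1.0.0
Tus-Version: 1.0.0
Tus-Min-Chunk-Size: 5242880
Tus-Max-Chunk-Size: 5368709120
```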
@Acconut Great work! 👍 I just wanted to briefly discuss an option of making this simpler (maybe you already discussed this with @kvz). Why not make these two headers hard limits for the Client, instead of recommendations? So, something like: "If the Server specifies […]".

Because the Client currently cannot know if these are just recommendations or hard limits, I feel like it should treat them as hard limits anyway, just in case. The Client could theoretically be implemented in a way that, if the bandwidth is good/bad, it tries to upload a chunk outside of a limit, and if the chunk fails, it realizes that this limit is a hard limit and doesn't try again next time. However, in tus-ruby-server I can let the client know that they've hit the limit only after the chunk has already been uploaded, so depending on the chunk size this could bring a noticeable slowdown at upload start.

What do you think about this? @kvz Thanks! ❤️
This will not be done in this PR, but in the one for the 1.1 release.
Thank you for initiating this change :)
I dislike this idea because it would push too much power to the Server. As we seemed to agree in #89, the Client should always have the last word in determining how big a single chunk will be and, in my opinion, this also includes having the ability and right to choose a size which is outside of the numbers outlined by the Tus-Min/Max-Chunk-Size headers. Such a scenario will probably not appear often in reality, but it may turn up, for example when the bandwidth is just too bad or the device's memory is too small.

Of course, Clients could include this functionality, but I am not pleased to push authors towards implementing Clients which violate the specification, if that's avoidable.

I agree that we should include a statement which recommends using these numbers as hard limits, if possible. Please refer to my two examples above for cases in which one may not be able to do so.
If the Client chooses a chunk size which is outside of the recommended range, it should be prepared to see a rejection for it. Furthermore, this is a known issue in the HTTP architecture and is also the reason why the `Expect: 100-continue` mechanism was introduced.

I am looking forward to your feedback. Also, @kvz and @AJvanLoon, thank you a lot for your thoughts, I will address them! ❤️
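For illustration, a Client that is unsure whether its chosen chunk size will be accepted could use that mechanism to probe the Server before transmitting the body. This is a sketch of standard HTTP/1.1 behaviour, not something the tus protocol currently mandates:

```
PATCH /files/24e533e02ec3bc40c387f1a0e460e216 HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0
Upload-Offset: 0
Content-Type: application/offset+octet-stream
Content-Length: 104857600
Expect: 100-continue

HTTP/1.1 100 Continue
```

If the Server considers the announced chunk size unacceptable, it can answer with a final error status instead of `100 Continue`, so the Client avoids uploading a chunk that would be rejected anyway.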
You're right, I agree.
That would be perfect 👍
I'm not 100% sure about this one. I could imagine cases where the Server knows upper or lower bound limits to the chunk size it can accept. I think the Client should have all the freedom to estimate the best chunk size within this range. But what good would it do to leave the Client the freedom to overshoot the Server's boundaries? The protocol then allows for more errors to happen than would have been the case when making the Client always respect its ranges.

Let the record show that the server could have very wide ranges by default. Even say from […] to […].
Servers will always need good protection against these situations, regardless of whether they come from a misbehaving Client or a malicious DDoS attack. The Server is allowed to reject chunks which fall outside of the recommended size, and Clients should be aware of this fact. They may achieve this partially using the […].

Furthermore, I think in most cases the device running the Client will be more limited in terms of available hardware than the Server, especially if you look at smartphones, embedded devices or micro computers running in rural areas in Africa, which we saw experimenting with tus some time ago. Therefore, the Client will usually be the entity which slows the upload down, I assume.
I agree, that's why we should include a statement which recommends using these numbers as hard limits, if possible. Please see my comment above for more context.
Could you please explain what you mean by "average"? Do you refer to the recommended upper and lower chunk sizes?
DDoS-ing aside, for which I feel protection has limited place inside tusd but rather belongs in HAProxy or the network layers above it, how will the Client know what the boundaries are if the protocol suggests the Client may overshoot them?

I feel this is ambiguous. The Client should treat the Server's upper and lower bounds as hard limits that it is not allowed to overshoot. The Server is free to set these values to actual hard limits, or to a vague estimation of them. It's just nice to set, from the server side, the range that Clients will try.

I know that in Africa phones are low on bandwidth and all that; the case, I think, is more that when servicing a million devices, you can squeeze more performance out of a single tusd server if you can control the range in which they set their chunks. If you just provide a vague recommendation, chances are half of your devices are just going to ignore it, since the protocol isn't tough on this. That means you have the choice of erroring out hard to influence behavior, at the risk that the clients don't automatically adjust, basically denying them service.
As somebody who has spent a few hours scratching his head trying to figure out why setting a client-side chunk size completely broke uploads, only to eventually figure out the failures were because I was sending chunks smaller than S3 accepts, I'm definitely in favour of a way for the server to at least suggest an acceptable chunk size range to the client. The presence of a min chunk size header would've been incredibly helpful in determining the issue quickly!

For reference, the S3 docs specify the following chunk size:

> 5 MB to 5 GB, **last part can be < 5 MB**

(emphasis added). The latter part of that makes the discussion around whether the headers should be hard or soft limits a little more complicated - for instance, if they were to be hard limits, how should a client act if presented with a file which can't be split into chunks which all fit within the server-specified min/max? Although that seems like a reasonably unlikely situation, it's worth keeping in mind, even if only to document it. A couple of options spring to mind:

- The simplest option would probably be for the protocol to specify that the last chunk can always be smaller than the minimum chunk size, though that might risk alienating any storage back end which doesn't have the "last part can be smaller than min chunk size" stipulation which S3 has (if any such storage back end exists).
- Alternatively, another new header (along the lines of `Tus-Accepts-Small-Last-Chunk`) could be introduced to cover all bases: if the server is using S3 (for instance), it can send this header to say that undersized final chunks are accepted. Since this is probably the more likely case, it might be better to invert this to something like `Tus-No-Undersized-Chunks` for any server which doesn't allow the last chunk to be under the limit.

Thinking further on the header option, perhaps it could be used as a way for the server to say whether the limits are soft or hard (a sketch follows below):

- `Tus-Undersized-Chunks` could be:
  - omitted or `any` to say that `Tus-Min-Chunk-Size` is a soft limit,
  - `none` to say that `Tus-Min-Chunk-Size` is a hard limit (with no exceptions), or
  - `last` to say that `Tus-Min-Chunk-Size` is a hard limit (with the exception of the last chunk, per S3).
- `Tus-Oversized-Chunks` could be:
  - omitted or `any` to say that `Tus-Max-Chunk-Size` is a soft limit, or
  - `none` to say that `Tus-Max-Chunk-Size` is a hard limit.
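To make the header option concrete, a response under this proposal might look like the following sketch. Note that `Tus-Undersized-Chunks` and `Tus-Oversized-Chunks` are only proposed in this comment and are not part of the protocol; the values are illustrative, loosely matching S3's limits:

```
HTTP/1.1 204 No Content
Tus-Resumable: 1.0.0
Tus-Min-Chunk-Size: 5242880
Tus-Max-Chunk-Size: 5368709120
Tus-Undersized-Chunks: last
Tus-Oversized-Chunks: none
```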
Adds a note to the readme to mention that exceeding any server-imposed chunk size hard limits will result in chunked uploads failing. This is to help anyone who doesn't realise that they're hitting such limits. See the following for further information / discussion:

- tus/tus-resumable-upload-protocol#89
- tus/tus-resumable-upload-protocol#93
To make sure I understand: tus-js-client needs to send big enough chunks so that e.g. tusd can send those same, big enough chunks to S3?

If so, I would suggest these should be decoupled at the expense of buffering on the tusd server. You should be able to define chunk sizes from tus-js-client towards tusd regardless of storage backend, imho.
But perhaps I’m misunderstanding this issue in which case, please clarify further :)
Sent from mobile, pardon the brevity.
Yes - at least in the case of […].

Buffering might interfere with […].

That would indeed be ideal, but as long as there's a way for the Tus server (i.e. […]) […]
As @pauln said, this is not an easy solution. Buffering on disk will break horizontal scalability (unless you use sticky sessions), and storing the buffers on S3 would probably be too slow and may reduce the performance for all other uploads, or would at least make the algorithm much more complex.

@pauln Thank you very much for your thoughts on this. I am sorry to hear that this consumed so much of your time. Actually, this fact is documented in tusd (https://github.com/tus/tusd/blob/master/s3store/s3store.go#L52) but one cannot be expected to find this.

Regarding your proposal, I would like to mention that the S3 store in tusd has flexible minimum chunk sizes, which are calculated for each upload size individually. For example, if you want to upload 5TB, then the min. size will be 500MB (the reason is that S3 only allows 10,000 chunks for each multipart upload). How would this play with your idea? Would the 5MB limit (which holds true for some uploads) be considered a soft limit?
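As a rough sketch of that calculation (the names and structure here are assumed for illustration and are not tusd's actual code; see the linked `s3store.go` for the real implementation):

```go
package main

import "fmt"

const (
	minPartSize int64 = 5 * 1024 * 1024 // S3's absolute minimum part size (5 MB)
	maxParts    int64 = 10000           // S3 allows at most 10,000 parts per multipart upload
)

// minChunkSizeFor returns the smallest part size that still fits the whole
// upload into maxParts parts, but never less than S3's 5 MB floor.
func minChunkSizeFor(uploadSize int64) int64 {
	required := (uploadSize + maxParts - 1) / maxParts // ceiling division
	if required < minPartSize {
		return minPartSize
	}
	return required
}

func main() {
	// A 5 TB upload needs parts of roughly 550 MB to stay within 10,000 parts.
	fmt.Println(minChunkSizeFor(5 * 1024 * 1024 * 1024 * 1024)) // 549755814
}
```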
Assuming that the min/max chunk sizes would be sent in response to the Creation plugin's POST request, the file size is already sent in the `Upload-Length` header.

Alternatively, if it's desirable to push this kind of logic onto the client as much as possible, perhaps it would be better for the POST response to include some relevant combination of: […]

It would then be up to the client to track what it's sending (and needs to send) and adjust chunk sizes as necessary.
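For instance, a creation request and a response carrying per-upload limits might look like this sketch. Whether the limits belong on the POST response at all is exactly what is being discussed here; the values are illustrative, with the minimum derived from the 5 TB `Upload-Length` as in the tusd example above:

```
POST /files HTTP/1.1
Host: tus.example.org
Tus-Resumable: 1.0.0
Upload-Length: 5497558138880

HTTP/1.1 201 Created
Location: https://tus.example.org/files/24e533e02ec3bc40c387f1a0e460e216
Tus-Resumable: 1.0.0
Tus-Min-Chunk-Size: 549755814
Tus-Max-Chunk-Size: 5368709120
```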
Based on my experience with large file transfers in native desktop and mobile apps, Backends should know if they're behind load balancers, network security appliances, tunnel endpoints etc., and should announce […]. Clients are free to optimise chunk sizes between […]. Some real-world examples were already described in other places: […]