S3 upload stream with http throws stream mark and reset error #1748
This is a known issue and a current limitation of the SDK. There are similar posts with workarounds. Please refer to them and see if they work for you.
Thanks a lot for the response. I saw your answer before, but what I am trying to do here is stream a file straight from the user into S3 rather than download/buffer it onto our server. Since I don't have the file, option 1 is out for me.
@thisarattr Unfortunately there's no way around this, as the SDK needs to consume the full contents of the stream (which in this case requires buffering the stream to memory) to be able to set the checksum as part of the request signature. The easiest way around this would be to switch to using an HTTPS endpoint if possible.
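If switching to HTTPS isn't an option, a minimal sketch of the buffering approach (plain Java, no SDK types; `bufferFully` is a hypothetical helper, not an SDK method): copying the caller's stream into memory yields a `ByteArrayInputStream`, which supports `mark()`/`reset()` over its entire contents, so the signer can rewind it to compute the checksum regardless of the read limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferedUpload {
    // Copy the caller's stream into memory. The resulting ByteArrayInputStream
    // supports mark()/reset() over its full length, so the SDK's signer can
    // rewind it after hashing; the byte count also gives the content length.
    static ByteArrayInputStream bufferFully(InputStream streamFromUser) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = streamFromUser.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new ByteArrayInputStream(out.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = new byte[200_000]; // larger than the 128 KB default read limit
        ByteArrayInputStream buffered = bufferFully(new ByteArrayInputStream(payload));

        buffered.mark(128 * 1024);          // signer marks the stream...
        byte[] sink = new byte[8192];
        while (buffered.read(sink) != -1) { /* ...then consumes it fully to hash it */ }
        buffered.reset();                   // succeeds: ByteArrayInputStream ignores the limit

        System.out.println("reset ok, bytes available: " + buffered.available());
    }
}
```

The trade-off is exactly what the original question wanted to avoid: the whole object sits in memory on the server before the upload starts.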
It sounds like this is a feature request, so I'll mark it as such for now, but I'm not sure how we'll be able to avoid this.
@dagnir I agree that, when HTTP is used, there is no way to calculate the hash/checksum without buffering in memory. But it still should not fail by throwing a mark-and-reset exception, right? Hashing is the client library's responsibility; the API consumer does not need to know about it. The SDK should throw a meaningful error message instead of a mark-and-reset exception, which does not mean much to the consumer without reading the client library's code.
Okay, I see; we can certainly throw/log a more descriptive error message.
Could we actually have a specific subclass of SdkClientException for these retryable signing/hashing problems? The Hadoop S3A client already splits failures into those which may be recoverable (no response, throttle errors, socket timeouts, etc.) and then decides which to retry.
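No such subclass exists in the SDK; the proposal above might look something like the following sketch. `RetryableSigningException` is a hypothetical name, and `RuntimeException` stands in for the real `SdkClientException` base class so the example is self-contained.

```java
// Hypothetical exception type for signing/hashing failures, carrying a flag
// that lets callers (e.g. the Hadoop S3A client) decide whether to retry.
public class RetryableSigningException extends RuntimeException {
    private final boolean retryable;

    public RetryableSigningException(String message, Throwable cause, boolean retryable) {
        super(message, cause);
        this.retryable = retryable;
    }

    // True for transient failures (timeouts, throttling); false for
    // deterministic ones such as an exceeded mark/reset read limit.
    public boolean isRetryable() {
        return retryable;
    }
}
```

Signing code could then catch the stream's `IOException` from `reset()` and rethrow it as this type with `retryable = false` and a message naming the read-limit cause, rather than letting the raw mark/reset error surface.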
This issue is now closed. Comments on closed issues are hard for our team to see. |
FYI, as HADOOP-19221 shows, the v2 SDK actually makes things worse in terms of S3 upload recoverability.
I am trying to stream a file straight into S3 rather than upload/buffer it onto our own server and re-upload it to S3.
When I use http, the AWS client tries to calculate the message digest and fails to reset the stream. Further, I haven't set an explicit read limit, so it defaults to 128 KB, and I'm uploading a stream larger than that. As per the AWS client code, it sets mark() to the request read limit, then reads the whole stream, beyond the mark(), and tries to reset() it, which is obviously going to fail and throw the reset error.
AWS4Signer.java
AbstractAWSSigner.java
Exception thrown,
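The failure mode described above can be reproduced with plain `java.io` streams, no SDK needed: `BufferedInputStream` invalidates its mark once more than the read limit has been consumed, so the subsequent `reset()` throws.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class MarkResetDemo {
    public static void main(String[] args) throws IOException {
        int readLimit = 128 * 1024;               // the SDK's default read limit
        byte[] payload = new byte[readLimit * 2]; // a stream larger than the limit
        BufferedInputStream in =
                new BufferedInputStream(new ByteArrayInputStream(payload), 8192);

        in.mark(readLimit);               // signer marks the stream with the read limit...
        byte[] sink = new byte[8192];
        while (in.read(sink) != -1) { }   // ...then consumes it fully to compute the digest

        try {
            in.reset();                   // rewind past the mark limit
        } catch (IOException e) {
            // The mark was invalidated after readLimit bytes, so reset() fails,
            // which is the error the issue reports.
            System.out.println("reset failed: " + e.getMessage());
        }
    }
}
```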