Image upload times out after an hour (401 response) #1464

Closed
citrus-it opened this issue May 8, 2023 · 7 comments

@citrus-it

I am trying to upload an 8GiB image from home. My connection over the VPN to the rack is higher latency than something local, but it is not that bad. When the upload starts, I see around 6Mb/s sustained upload.

With the base64 overhead, I'll have to upload around 10.3GiB. The latency for each chunk is around 4 seconds. It is not going to be quick.
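
A rough sketch of that arithmetic, assuming the standard 4/3 base64 expansion, an image of exactly 8 GiB, and that 6Mb/s means 6 megabits per second sustained. The function names are made up for illustration, and request and JSON framing overhead is ignored. For exactly 8 GiB the encoded size comes out a little above the figure quoted, which may reflect the actual image size; either way the transfer runs well past an hour.

```typescript
// Base64 encodes every 3 input bytes as 4 output characters (with padding).
function base64EncodedSize(rawBytes: number): number {
  return Math.ceil(rawBytes / 3) * 4;
}

// Estimated transfer time in seconds at a sustained rate in bits per second.
function transferSeconds(bytes: number, bitsPerSecond: number): number {
  return (bytes * 8) / bitsPerSecond;
}

const GiB = 1024 * 1024 * 1024;
const encoded = base64EncodedSize(8 * GiB); // assumes exactly 8 GiB of image data
const hours = transferSeconds(encoded, 6e6) / 3600; // assumes 6 megabits/s

console.log(`encoded size: ${(encoded / GiB).toFixed(2)} GiB`); // 10.67 GiB
console.log(`estimated time: ${hours.toFixed(1)} hours`); // 4.2 hours
```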

Just noting that the console constantly logs Fetch failed (because of the 204 reply code?), but the upload is working.

The upload progresses while I get a coffee.

and after an hour, the browser gets 401 responses, the page changes to show Something went wrong, and then refreshes to the login page.

which is a long way of asking if the upload can keep the authentication fresh to avoid this.

@david-crespo
Collaborator

If the session timeout is really only an hour we should probably make it longer. Very odd about the fetch failed, I’ll look into that. Those are on the bulk-write posts? They’d have to be, that’s the only request being made hundreds of times.

Another thing we can look into for this problem is making the chunks bigger. I think we picked a 512 KiB max in crucible because that’s the max request size we’ve configured for dropshot, but if we got oxidecomputer/dropshot#618 over the line we could give the bulk-write endpoint a higher limit, maybe a few MiB. As far as I know there is no hard requirement in crucible that it be 512 KiB.
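
To give a sense of what a bigger chunk buys, here is a minimal sketch of splitting an upload into fixed-size ranges. `chunkRanges` and `ChunkRange` are hypothetical names, not the console's actual code, and the 8 GiB size and 4 MiB alternative are just example values.

```typescript
interface ChunkRange {
  offset: number; // byte offset into the image
  length: number; // number of bytes in this chunk
}

// Split a payload of `totalBytes` into consecutive ranges of at most `chunkBytes`.
function chunkRanges(totalBytes: number, chunkBytes: number): ChunkRange[] {
  if (chunkBytes <= 0) throw new Error("chunkBytes must be positive");
  const ranges: ChunkRange[] = [];
  for (let offset = 0; offset < totalBytes; offset += chunkBytes) {
    ranges.push({ offset, length: Math.min(chunkBytes, totalBytes - offset) });
  }
  return ranges;
}

const KiB = 1024;
const MiB = 1024 * KiB;
const imageBytes = 8 * 1024 * MiB; // example: an 8 GiB image

console.log(chunkRanges(imageBytes, 512 * KiB).length); // 16384 requests
console.log(chunkRanges(imageBytes, 4 * MiB).length); // 2048 requests
```

With per-request latency dominating over a VPN, fewer and larger requests should cut the fixed overhead roughly in proportion, though that depends on how many requests are in flight at once.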

@david-crespo
Collaborator

Looks like Chrome logs spurious fetch failed on 204s when the response is empty. Apparently by adding OK (or perhaps {} so it parses as JSON) to the response body we can avoid this. Worth doing to avoid the false appearance of jankiness. I'll make an API issue or just make the change.

https://stackoverflow.com/a/57534957/604986

@citrus-it
Author

> If the session timeout is really only an hour we should probably make it longer.

The 401s start around an hour after I start the upload, so it seems so. In the limit we probably want to make the session timeout configurable (I have worked at sites where 10 minutes is the maximum allowed for anything like this) - is there any way that hitting the bulk-write endpoint can reset the clock, or something like that?

@david-crespo
Collaborator

I think it's supposed to work that way already. I'll look into why it's not.

The absolute TTL is a maximum lifetime on the token regardless of extension. Say the idle TTL is 1 hour and the absolute TTL is 8 hours. As long as the user pokes the site and makes API requests every 59 minutes, their session will keep getting extended, up until 8 hours, at which point it will get expired no matter what and they will have to reauthenticate.

oxidecomputer/omicron#326

Those numbers are currently supposed to be 1 hour and 8 hours.

https://github.com/oxidecomputer/omicron/blob/7e3430a01f9c7d0d44751a1642b61e3cffa0ba80/smf/nexus/config-partial.toml#L9-L10
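
The idle/absolute TTL rule described above can be sketched roughly like this. It is an illustration only, not the Nexus implementation; the `Session` shape and the function names are hypothetical, and the 1 hour and 8 hour values are the ones mentioned above.

```typescript
interface Session {
  createdAt: number; // ms timestamp when the user logged in
  lastUsedAt: number; // ms timestamp of the most recent authenticated request
}

const MINUTE_MS = 60 * 1000;
const HOUR_MS = 60 * MINUTE_MS;

// A session is valid only while both the idle and the absolute windows are open.
function isSessionValid(s: Session, now: number, idleTtlMs: number, absoluteTtlMs: number): boolean {
  const idleOk = now - s.lastUsedAt < idleTtlMs;
  const absoluteOk = now - s.createdAt < absoluteTtlMs;
  return idleOk && absoluteOk;
}

// Called on each authenticated request: extends the idle window if the session
// is still valid, otherwise returns null (the caller would respond with a 401).
function touchSession(s: Session, now: number, idleTtlMs: number, absoluteTtlMs: number): Session | null {
  if (!isSessionValid(s, now, idleTtlMs, absoluteTtlMs)) return null;
  return { createdAt: s.createdAt, lastUsedAt: now };
}

// Simulate one request every 59 minutes with a 1 hour idle and 8 hour absolute TTL.
let session: Session | null = { createdAt: 0, lastUsedAt: 0 };
let now = 0;
while (session !== null) {
  now += 59 * MINUTE_MS;
  session = touchSession(session, now, 1 * HOUR_MS, 8 * HOUR_MS);
}
console.log(`first rejected request at ${(now / HOUR_MS).toFixed(2)} hours`); // 8.85
```

Under this model a steady stream of bulk-write requests keeps the idle window open, so an upload should only be cut off by the absolute limit.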

@david-crespo
Collaborator

david-crespo commented May 8, 2023

I was not able to reproduce the Fetch failed part in Chrome 113 on mac pointing at dogfood rack. Not sure what that's about. It would be helpful to know whether that's coming up for every bulk-write request or some subset of them.

(Oh my god upload is slow over VPN.)

@david-crespo
Collaborator

I expect this to fix it:
oxidecomputer/omicron#3053

@morlandi7 morlandi7 added this to the MVP milestone May 9, 2023
@david-crespo david-crespo self-assigned this May 15, 2023
@david-crespo
Collaborator

I believe this is fixed. I'm seeing the expected 8 hour TTL on the session cookie on the dogfood rack.
