[Bug]: Large File Synchronizations Fail due to Hardcoded Timeout Value in Desktop Client #5394
Comments
Hello, happy 2024! I fully support @Ourewaeller's suggestions for resolving this issue. Best, Martin
Close #5394 Signed-off-by: Matthieu Gallien <[email protected]>
My 2 cents... It seems like the problem gets worse when the Nextcloud app "Antivirus for files" is enabled.
I guess it's related; it also occurs for smaller files if the app "Antivirus for files" is enabled.
Just a recent observation (v.3.12.xx+): not only is a large file not fully uploaded, but the corrupt file is also synced back to the source, destroying the source file.
Just wanted to chime in. Google Takeout in particular produced very large files for me, e.g. 50GB-150GB. I've been running some tests and basically, E2EE is useless for anything larger than a couple of MB. I was trying to simulate certain scenarios, e.g. a user splitting their data into 5GB files. So in the case of the 150GB file, I created 30 files of 5GB each to see how Nextcloud would handle this. Short story: complete failure, it borked my E2EE completely, and now I can't reset the E2EE feature.

The way the whole thing works is just not very well optimized. Say you drop 30x5GB files into a folder on your system which is supposed to be E2EE-encrypted by the client. The client starts a sync run for the 30 files, begins encrypting one of them, and creates a temporary file in the /tmp folder while it does so. Once that file is encrypted, it starts to upload it. The file is split into chunks which land in the user's special temporary "uploads" folder on the server. Once all chunks have been transmitted, the server assembles them and moves the result into the actual target directory.

As mentioned before in this thread, here is the big problem: the client has a hardcoded time limit for how long it waits after uploading the last chunk. The server takes its time, the client gets a 504 status back for the MOVE command and thinks the operation failed. So it starts over, creates ANOTHER temporary file without deleting the old one, and begins from scratch. At some point the client simply stops syncing because of missing files, sync conflicts, etc.

First of all, why is there a time limit in the client at all? The client should poll the status from the server and be aware of the progress. I ran these tests as a single user; if this had been a real-world scenario with multiple users, it would have created chaos. In this particular scenario the sync never properly finishes because of timeout issues and the like, your files end up in limbo, and your /tmp folder keeps growing until you no longer have space.

There needs to be a more intelligent way of handling this. E.g. once a file has been encrypted and transmitted, remove it from the /tmp folder. Secondly, the client shouldn't wait for the MOVE command to complete, but simply issue the request and periodically query the server to find out whether it has finished. Once it has, the file can be considered "properly" synced. The way it is now, E2EE is useless for anything more than a handful of files, especially if you value your data.
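To make the polling idea from the comment above concrete, here is a minimal, purely hypothetical sketch in C++. The function name, the 30-second poll interval and the callback shapes are all assumptions for illustration; neither the client nor the server exposes an API like this today as far as I know.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Poll-based variant of the final MOVE step: fire the request once, then
// check periodically whether the server has finished assembling the file,
// instead of keeping one long-lived request open until a hardcoded timeout.
bool waitForMoveToFinish(const std::function<void()> &issueMoveRequest,
                         const std::function<bool()> &assembledFileExists,
                         std::chrono::minutes overallDeadline)
{
    using namespace std::chrono;

    issueMoveRequest(); // send MOVE, but do not block on its reply

    const auto start = steady_clock::now();
    while (steady_clock::now() - start < overallDeadline) {
        if (assembledFileExists()) {
            return true; // assembly finished; the file can be marked as synced
        }
        std::this_thread::sleep_for(seconds(30)); // assumed poll interval
    }
    return false; // give up only after a generous, ideally configurable, deadline
}
```

In practice `assembledFileExists` would be something like a PROPFIND or HEAD on the destination path; the important part is that no single HTTP request has to stay open for the entire assembly time.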
Bug description
I am running a Nextcloud server 25.0.3 and use the Windows Desktop Client 3.6.6 on several Windows 10 installations. While working with the Nextcloud Virtual Drive / Files in Windows, I encountered issues with large virtual disk files I wanted to synchronize via the virtual drive from my Windows clients to the Nextcloud server. These files are up to 120GB in size, but could be larger.
Regardless of what I tried, the Desktop Client aborted the synchronization of such files 30 minutes after the upload progress bar had reached 100%, reporting a "Connection timed out" error message.
So I started to dig deeper. This is what I came up with while testing with a 77GB file.
Synchronization of larger files via the Desktop Client consists of two major stages for files which do not yet exist on the Nextcloud server. In the first stage, the Desktop Client uploads the file in chunks to a temporary upload folder on the Nextcloud server. Once this is completed, the Desktop Client asks the Nextcloud server to assemble these chunks back to one file at the destination folder.
The first stage of the upload works fine. The large file gets chunked and uploaded to the upload folder of the Nextcloud server. While this is ongoing, the Desktop Client continuously updates the remaining time and the progress bar on its "Settings" screen. Once all chunks have been uploaded, the status information of the Desktop Client changes to "A few seconds left". Then it starts the second stage of the synchronization run.
The Desktop Client sends a MOVE command to the Nextcloud server and starts waiting for the reply to this request. The Nextcloud server begins to assemble the chunks at the final folder. While the Nextcloud server is assembling, the Desktop Client keeps showing the "A few seconds left" status message and visually seems to be "stuck". However it is still maintaining the connection to the Nextcloud server waiting for the reply to the MOVE command.
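For readers unfamiliar with the two stages, here is a minimal sketch in Qt-flavoured C++ of what the client-side sequence looks like conceptually. The URL layout under the upload folder, the ".file" target name and the chunk size are my assumptions for illustration; the actual Desktop Client code is considerably more involved.

```cpp
#include <QEventLoop>
#include <QFile>
#include <QNetworkAccessManager>
#include <QNetworkReply>
#include <QNetworkRequest>
#include <QObject>
#include <QUrl>

static void waitFor(QNetworkReply *reply)
{
    // Block until this single request has finished (sketch only).
    QEventLoop loop;
    QObject::connect(reply, &QNetworkReply::finished, &loop, &QEventLoop::quit);
    loop.exec();
    reply->deleteLater();
}

void uploadLargeFile(QNetworkAccessManager &nam, const QString &localPath,
                     const QUrl &uploadDirUrl, const QUrl &destinationUrl)
{
    QFile file(localPath);
    if (!file.open(QIODevice::ReadOnly))
        return;

    // Stage 1: upload the file in chunks to the temporary upload folder.
    const qint64 chunkSize = 10LL * 1024 * 1024; // assumed 10 MiB chunks
    for (int i = 0; !file.atEnd(); ++i) {
        const QUrl chunkUrl(uploadDirUrl.toString()
                            + QStringLiteral("/%1").arg(i, 5, 10, QLatin1Char('0')));
        waitFor(nam.put(QNetworkRequest(chunkUrl), file.read(chunkSize)));
    }

    // Stage 2: a single MOVE request asks the server to assemble the chunks
    // at the destination. The client then waits for the reply -- this is the
    // request that runs into the timeout discussed below.
    QNetworkRequest move(QUrl(uploadDirUrl.toString() + QStringLiteral("/.file")));
    move.setRawHeader("Destination", destinationUrl.toEncoded());
    waitFor(nam.sendCustomRequest(move, "MOVE"));
}
```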
Based on its size and the speed of the disk drives of my Nextcloud server, assembling my 76GB test file takes about 40 minutes (sometimes even more). In case the file already exists on the Nextcloud server, the overall processing time roughly doubles, because its previous version needs to be copied by the Nextcloud server to the files_versions folder prior to the MOVE operation.
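For context, a rough back-of-the-envelope calculation based on those numbers (assuming assembly speed is limited by disk throughput on my server):

```latex
\frac{76\,\text{GB}}{40 \times 60\,\text{s}} \approx 32\,\text{MB/s},
\qquad
32\,\text{MB/s} \times 30 \times 60\,\text{s} \approx 57\,\text{GB}
```

So on this server, any file much larger than roughly 57GB cannot be assembled within a 30-minute window, regardless of how fast the upload itself is.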
After waiting 30 minutes for the response to the MOVE command from the Nextcloud server, the Desktop Client terminates the connection to the Nextcloud server and displays a "Connection timed out" error message.
The Nextcloud server, however, does not mind and finishes the MOVE operation properly. Following the timeout, the Desktop Client marks the transfer as incomplete and starts the next attempt to synchronize the file. Because the file now already exists on the Nextcloud server, the server creates a new version of the file and starts to assemble the chunks of the new upload. While it is doing so, the Desktop Client runs into the next 30-minute timeout and the procedure starts all over again.
To make things even worse, after the second timeout the Desktop Client detects that there are chunks left over on the Nextcloud server which it believes belong to a failed synchronization. Because of that, it requests the deletion of those chunks from the Nextcloud server. The server deletes the chunks in a second thread while the first thread, initiated by the timed-out connection, is still assembling. As a result, the assembling thread fails in the middle of its execution because the remaining chunks are no longer available. It stops, leaving a partially assembled fragment of the original file behind at the destination folder. Hence the second synchronization creates a corrupted new version of the file on the server.
While trying to find the origin of that 30-minute timeout, I checked the source code of the Desktop Client and found that it is caused by a hardcoded maximum value inside the method PropagateUploadFileCommon::adjustLastJobTimeout in the file libsync\propagateupload.cpp.
In order to verify my assumption, I built my own version of the Desktop Client with that value set to 120 minutes (which would still cause issues with files larger than mine). I was able to confirm that this time my files synchronized as expected. The Desktop Client did not run into the 30-minute timeout. It waited for the MOVE operation to finish and completed the synchronization successfully with the green check mark.
The related method contains a formula which calculates the MOVE timeout based on the size of the file. That calculated value would have worked for me, but the method limits it to a hardcoded maximum of 30 minutes. This might make sense to avoid "stuck" Desktop Client synchronization runs, but for larger files, which take longer to assemble, it leads to exactly this problem.
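To make the description concrete, here is a simplified sketch of that kind of calculation. The exact constants in PropagateUploadFileCommon::adjustLastJobTimeout may differ; the "three minutes per gigabyte" figure and the function shape are illustrative assumptions. The point is the hardcoded 30-minute ceiling.

```cpp
#include <algorithm>
#include <cstdint>

// Sketch of a size-based timeout for the final MOVE job, clamped to a
// hardcoded 30-minute ceiling -- this clamp is what causes the bug for
// files whose assembly takes longer than 30 minutes.
std::int64_t adjustedLastJobTimeoutMsec(std::int64_t fileSizeBytes,
                                        std::int64_t defaultTimeoutMsec)
{
    // Assumption for illustration: roughly three minutes per gigabyte.
    const double msecPerGigabyte = 3.0 * 60 * 1000;
    const auto sizeBased =
        static_cast<std::int64_t>(msecPerGigabyte * fileSizeBytes / 1e9);

    // The hardcoded ceiling described in this report: 30 minutes.
    const std::int64_t maxTimeoutMsec = 30LL * 60 * 1000;

    // Never below the default timeout, never above the 30-minute cap.
    return std::clamp(sizeBased, defaultTimeoutMsec, maxTimeoutMsec);
}
```

With the assumed constants, the size-based value for my 76GB file would be several hours, but the clamp cuts it down to 30 minutes, which matches exactly the timeout I observed.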
To make a long story short: I would really appreciate it if the hardcoded limit were increased or, even better, could be set or disabled via a configuration parameter of the Desktop Client.
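Purely as an illustration of what such a parameter could look like, here is a hypothetical sketch. The variable name NEXTCLOUD_MAX_MOVE_TIMEOUT and the environment-variable mechanism are inventions for this example and do not exist in the client today.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical override for the MOVE timeout ceiling: read a value in
// minutes from the environment and fall back to the current 30-minute cap.
std::int64_t maxMoveTimeoutMsec()
{
    const std::int64_t defaultCapMsec = 30LL * 60 * 1000; // current hardcoded cap

    if (const char *env = std::getenv("NEXTCLOUD_MAX_MOVE_TIMEOUT")) {
        try {
            const long long minutes = std::stoll(env);
            if (minutes > 0) {
                return minutes * 60LL * 1000;
            }
        } catch (...) {
            // ignore malformed values and keep the default
        }
    }
    return defaultCapMsec;
}
```

The same idea could of course live in nextcloud.cfg instead of an environment variable; the point is only that the ceiling should not be baked into the binary.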
I am really sorry for the long post, but it took me almost a week to figure out why my uploads aborted. So I wanted to share as much information as possible.
Please consider changing this behaviour in one of the future releases. Thank you very much for all of your past and future contributions to this project.
Steps to reproduce
1. Synchronize a file that is large enough for the server-side chunk assembly to take longer than 30 minutes (in my setup, a file of roughly 77GB).
2. Wait until the upload progress reaches 100% and the Desktop Client sends the MOVE command.
3. Observe that the Desktop Client reports "Connection timed out" about 30 minutes later, even though the server finishes the assembly.
Expected behavior
The synchronization will successfully finish without a timeout error.
Which files are affected by this bug
libsync\propagateupload.cpp - method PropagateUploadFileCommon::adjustLastJobTimeout
Operating system
Windows
Which version of the operating system you are running.
Windows 10
Package
Appimage
Nextcloud Server version
25.0.3
Nextcloud Desktop Client version
3.6.6
Is this bug present after an update or on a fresh install?
Fresh desktop client install
Are you using the Nextcloud Server Encryption module?
Encryption is Disabled
Are you using an external user-backend?
Nextcloud Server logs
No response
Additional info
No response