Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

turbo: speed up rebuildSegments #12114

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

stevemilk
Copy link
Contributor

  1. do not reopen segments/indexed if already opened during rebuildSegments
  2. accelerate scanning files by avoiding doubly nested loop

@AskAlexSharov
Copy link
Collaborator

looks good. but seems we have corner-case: we don't know "is download of file finished or not". i think it's possible to open some file (like .idx). i feel - it's possible to open .idx file - which was bad (metadata section was not downloaded) - and still get non-nil object in this case (no error may happen - if validation will not catch it).

In all other cases i think your fix will work.

@stevemilk
Copy link
Contributor Author

find a way to cover the corner case:
if size of downloaded file is equal to the size stored in corresponding torrent file, download is considered as finished.
And only if file is complete(download finished || generate locally) and not opened yet, we reopen the file.

@mh0lt
Copy link
Contributor

mh0lt commented Oct 2, 2024

find a way to cover the corner case: if size of downloaded file is equal to the size stored in corresponding torrent file, download is considered as finished. And only if file is complete(download finished || generate locally) and not opened yet, we reopen the file.

Unfortunately this check won't work. The torrent lib truncates the mmap to its full size on initiation so it can randomly write to the full extent. So this will always be true once the download starts.

We have added a callback from the downloader when download completes - which can be used to know when the file is downloaded:

TorrentCompleted(ctx context.Context, in *TorrentCompletedRequest, opts ...grpc.CallOption) (Downloader_TorrentCompletedClient, error)

Making this call available to the snapshot code is part of a bigger refactor.

@stevemilk
Copy link
Contributor Author

Thx for reminding. Will try implementing the callback and other possible alternatives.

@stevemilk
Copy link
Contributor Author

stevemilk commented Oct 7, 2024

sorry for late reply, on vacation last week.

i implement the hash comparison to check if file is fully downloaded.
test with fully downloaded/almost downloaded/zero downloaded files, only hash of fully downloaded file is equal to the hash stored in torrent file.
here's result of benchmark test of hash generation, it shows the time taken is directly proportional to the file size:

timecost: 4.12621525s fileSize: 5606 MB
timecost: 183.506042ms fileSize: 219 MB
timecost: 6.796708ms fileSize: 5 MB

but where to do hash comparison is still under consideration

@stevemilk
Copy link
Contributor Author

stevemilk commented Oct 8, 2024

another idea is to create commitment file for segment/index after download finished.
so that we can do below integrity check before rebuildSegments:

  • if torrent file does not exist, target file is created locally and must be complete
  • if torrent file exists, target file is downloaded
    • if commitment file exists and if its modification time is newer than target file, target file is complete
    • otherwise, target file is not complete

@stevemilk
Copy link
Contributor Author

This PR conducts file hashing function during rebuild, which is quite expensive. Also above corner case is still unsolved cause getting file hash and reopen file are not atomic.

So I prepared another PR #12245 according to this quote:

another idea is to create commitment file for segment/index after download finished. so that we do integrity check before rebuildSegments:

  • if torrent file does not exist, target file is created locally and must be complete

  • if torrent file exists, target file is downloaded

    • if commitment file exists and if its modification time is newer than target file, target file is complete
    • otherwise, target file is not complete

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants