Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

concurrent multipart support for transfer #130

Merged

Conversation

giacomo-alzetta-aiven
Copy link
Contributor

@giacomo-alzetta-aiven giacomo-alzetta-aiven commented Jul 24, 2023

About this change - What it does

Alternative version of #128

Why this way

Simplifies API (no resume) and avoids an extra class.

@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from 7e8e7c3 to 9fe71ae Compare July 24, 2023 15:50
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from 9fe71ae to 8fc2fad Compare July 24, 2023 15:52
@codecov-commenter
Copy link

codecov-commenter commented Jul 24, 2023

Codecov Report

Patch coverage: 81.14% and project coverage change: +1.08 🎉

Comparison is base (5f65d18) 65.25% compared to head (33ed123) 66.33%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #130      +/-   ##
==========================================
+ Coverage   65.25%   66.33%   +1.08%     
==========================================
  Files          31       31              
  Lines        3606     3844     +238     
==========================================
+ Hits         2353     2550     +197     
- Misses       1253     1294      +41     
Impacted Files Coverage Δ
rohmu/object_storage/base.py 76.31% <77.27%> (+0.12%) ⬆️
rohmu/object_storage/local.py 86.13% <77.94%> (-3.41%) ⬇️
rohmu/object_storage/s3.py 66.32% <78.68%> (+2.19%) ⬆️
rohmu/util.py 73.68% <85.71%> (+24.84%) ⬆️
rohmu/object_storage/azure.py 36.28% <100.00%> (ø)
rohmu/object_storage/google.py 40.71% <100.00%> (ø)

... and 2 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch 3 times, most recently from 696209f to 33ed123 Compare July 24, 2023 16:06
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch 2 times, most recently from ff706bd to 6d7e50a Compare July 25, 2023 15:34
@giacomo-alzetta-aiven giacomo-alzetta-aiven marked this pull request as ready for review July 25, 2023 15:34
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from 6d7e50a to df85b54 Compare July 27, 2023 09:14
Copy link
Contributor

@sjamgade sjamgade left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would insist on trying to reuse as much as possible.
Also the main problem with this approach is the *transfer classes get bloated as we add more code to them, it would be nice if there some Protocol could be expected from *transfer classes to better support concurrent uploads

rohmu/util.py Show resolved Hide resolved
rohmu/object_storage/s3.py Outdated Show resolved Hide resolved
rohmu/object_storage/s3.py Outdated Show resolved Hide resolved
rohmu/object_storage/s3.py Outdated Show resolved Hide resolved
rohmu/object_storage/s3.py Outdated Show resolved Hide resolved
test/test_object_storage_s3.py Outdated Show resolved Hide resolved
rohmu/object_storage/s3.py Show resolved Hide resolved
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from df85b54 to 5915528 Compare August 1, 2023 10:52
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch 4 times, most recently from 70d7775 to ef9ecd2 Compare August 1, 2023 14:56
@giacomo-alzetta-aiven
Copy link
Contributor Author

I did some test trying to upload files using botocore and see the heap size. From what I can tell botocore is using a very small buffer, the RAM is basically constant during upload (I have uploaded a chunk of 2GB, and during the whole duration RAM use was constant).

I discovered that botocore has some heuristics in upload_part. I believe it tries to detect if the input stream is something like an http body or a file. The ProgressStream is treated as a file and thus it calls seek to move around and compute the md5 hash and size, so I had to add seek support for that. From the tests I've done it seems to work fine although it is a bit fragile.

@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from f2c6500 to 70ff235 Compare August 2, 2023 10:49
ilpon
ilpon previously requested changes Aug 8, 2023
rohmu/object_storage/s3.py Outdated Show resolved Hide resolved
@giacomo-alzetta-aiven giacomo-alzetta-aiven force-pushed the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch from f78523e to 38f2ea1 Compare August 8, 2023 07:14
@ilpon
Copy link
Contributor

ilpon commented Aug 8, 2023

I'm not authorized to merge this.

@ilpon ilpon merged commit 22f91d0 into main Aug 9, 2023
6 checks passed
@ilpon ilpon deleted the giacomo-alzetta-aiven-concurrent-multipart-support-from-transfer branch August 9, 2023 04:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants