Problem description

What are you trying to achieve?
Transfer an AMI binary between S3 buckets in us-east-2 and us-gov-east-1.
What is the expected result?
The transfer completes with performance on par with, or close to, the AWS CLI.
What are you seeing instead?
The transfer takes ~2-3 min per gigabyte, which is much slower than the AWS CLI.
Steps/code to reproduce the problem
To be clear, smart_open IS working. However, I will not be able to use it for my project because the speed is too slow. My largest file presently is ~27 GB. At ~2 min per gigabyte, that is ~54 min to transfer a single file. Am I utilizing this project correctly? If there are suggestions to increase performance, I would very much appreciate more info.
Did you try reading and writing buffer_size-byte chunks instead of reading and writing line by line? For multipart uploads you can go up to smart_open.s3.MAX_PART_SIZE (5 GiB). For example:
while chunk := fr.read(buffer_size):  # walrus operator requires Python 3.8+
    fw.write(chunk)
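For reference, here is a fuller sketch of the chunked copy. The bucket names, key, profile names, and buffer size are placeholders, and since us-gov-east-1 lives in a separate AWS partition, the sketch assumes each side gets its own boto3 client:

import boto3
from smart_open import open as s_open

SRC = 's3://commercial-bucket/ami.bin'   # hypothetical source
DST = 's3://govcloud-bucket/ami.bin'     # hypothetical destination

# GovCloud generally needs separate credentials
# (assumption: these named profiles exist)
src_client = boto3.Session(profile_name='commercial').client('s3')
dst_client = boto3.Session(profile_name='govcloud').client('s3')

buffer_size = 128 * 1024 * 1024  # 128 MiB per chunk; tune as needed

with s_open(SRC, 'rb', transport_params={'client': src_client}) as fr, \
     s_open(DST, 'wb', transport_params={'client': dst_client}) as fw:
    while chunk := fr.read(buffer_size):
        fw.write(chunk)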
The line iterator checks every character for carriage returns: there is a good chance your code is CPU bound and not IO bound.
If you have enough RAM/swap, you can save yourself some API charges by doing only a single GET (a single fr.read() with no size argument) and then a single PUT (a single fw.write() with the multipart_upload=False transport param).
Multiple chunk reads (GETs) and multiple part writes (PUTs, plus the multipart init and commit calls) are all billed by AWS.
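A minimal sketch of the single-GET/single-PUT variant, reusing the placeholder clients from the sketch above. It buffers the entire object in memory, and note that S3 caps a plain (non-multipart) PUT at 5 GB, so this path only fits objects below that limit:

from smart_open import open as s_open

# One GET: read the whole object into memory (needs enough RAM/swap)
with s_open(SRC, 'rb', transport_params={'client': src_client}) as fr:
    data = fr.read()

# multipart_upload=False makes smart_open issue one plain PUT
# (S3 limits a single PUT upload to 5 GB)
with s_open(DST, 'wb',
            transport_params={'client': dst_client,
                              'multipart_upload': False}) as fw:
    fw.write(data)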