-
Notifications
You must be signed in to change notification settings - Fork 287
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async/data persistence #2829
Async/data persistence #2829
Conversation
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2829 +/- ##
==========================================
- Coverage 87.23% 82.79% -4.45%
==========================================
Files 26 3 -23
Lines 1426 186 -1240
==========================================
- Hits 1244 154 -1090
+ Misses 182 32 -150 ☔ View full report in Codecov by Sentry. |
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
…d function Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
flytekit/core/data_persistence.py
Outdated
return fsspec.filesystem(protocol, **storage_options) | ||
return fsspec.filesystem(protocol, **kwargs) | ||
|
||
def get_async_filesystem_for_path(self, path: str = "", anonymous: bool = False, **kwargs) -> AsyncFileSystem: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
def get_async_filesystem_for_path(self, path: str = "", anonymous: bool = False, **kwargs) -> AsyncFileSystem: | |
async def get_async_filesystem_for_path(self, path: str = "", anonymous: bool = False, **kwargs) -> AsyncFileSystem: |
Can you share the code that you used for benchmarking? |
https://github.com/unionai/debugyt/blob/master/user/ytong/src/yt_dbg/data_perf/measure_wfs.py |
Signed-off-by: Yee Hing Tong <[email protected]>
kf-pytorch tests are hanging (notice how they ran for 5h+): https://github.com/flyteorg/flytekit/actions/runs/11451023024/job/31860491868?pr=2829#step:7:1 I kicked off a re-run but it seems like it's also hanging. |
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is amazing.
Why are the changes needed?
A previous PR added async versions of the type engine. This change leverages that by converting flytekit's main I/O layer to async as well.
What changes were proposed in this pull request?
get_filesystem
to take into account passed in kwargs even if no special if condition handling is present (this affected using https filesystem).How was this patch tested?
Tested locally and then tested on internal clusters. Performance benefit is mainly noticeable across large number of small files. A task that produces a list of 1000 files, 2MB each, decreases from around 135s to about 40s on a t3a.xlarge. No improvement is seen for folders because those were already async.
Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link