Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Async/Batching of coroutines #2855

Open
wants to merge 8 commits into
base: master
Choose a base branch
from
Open

Async/Batching of coroutines #2855

wants to merge 8 commits into from

Conversation

wild-endeavor
Copy link
Contributor

@wild-endeavor wild-endeavor commented Oct 24, 2024

Why are the changes needed?

Because of the unbounded nature of two of the transformers, we noticed higher levels of memory usage when running tests. This runs the coroutines in batches.

What changes were proposed in this pull request?

For the list and dict transformers only, instead of doing asyncio.gather on everything at once, run them via fsspec's helper function that batches coroutines. This has the added benefit that it makes the batch size controllable through fsspec's config mechanism.

How was this patch tested?

Setup process

Screenshots

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Docs link

Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Signed-off-by: Yee Hing Tong <[email protected]>
Copy link

codecov bot commented Oct 24, 2024

Codecov Report

Attention: Patch coverage is 16.66667% with 5 lines in your changes missing coverage. Please review.

Project coverage is 43.74%. Comparing base (3fc51af) to head (22552f3).
Report is 2 commits behind head on master.

Files with missing lines Patch % Lines
flytekit/core/type_engine.py 16.66% 5 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2855      +/-   ##
==========================================
- Coverage   45.53%   43.74%   -1.80%     
==========================================
  Files         196      196              
  Lines       20418    20436      +18     
  Branches     2647     2649       +2     
==========================================
- Hits         9298     8940     -358     
- Misses      10658    11271     +613     
+ Partials      462      225     -237     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@wild-endeavor wild-endeavor changed the title wip Async/batching Async/Batching of coroutines Oct 24, 2024
@wild-endeavor wild-endeavor marked this pull request as ready for review October 24, 2024 15:23
Signed-off-by: Yee Hing Tong <[email protected]>
@@ -22,6 +22,7 @@
import msgpack
from dataclasses_json import DataClassJsonMixin, dataclass_json
from flyteidl.core import literals_pb2
from fsspec.asyn import _run_coros_in_chunks # pylint: disable=W0212
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we bring in this code from fsspec? I rather not depend on private API from another library.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fsspec is not going anywhere. I don't understand why that particular method is private, but that function existed in its current form since 2021. IMO it's safe to assume that's going to live there for a long time.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asked them to make it public: fsspec/filesystem_spec#1740. i think we should just use the private function until they make it public. if they refuse, we can always copy later. If we copy, i'll also be copying in the config code cuz I think it is nice to have some user facing controls around that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan? is this satisfactory?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants