We've discussed the motivation for a chunking approach as an alternative to sending massive task graphs to Dask. The main appeal is that chunking can potentially provide a more memory-stable compute at the cost of adding some looping overhead to overall performance, which would give users who hit Dask issues an alternative to Dask troubleshooting as their only path forward.
@wilsonbb and I talked about this in more depth, and we came to the conclusion that the best output here is likely an example in our documentation showing how one would do this on something like the workflow in #42. This is preferable to building a bespoke `chunk` function, since a built-in function would have many limitations on the graphs it can chunk (for example, anything where a global value is computed) and could therefore set bad expectations for users. And building something more general would risk creating an entire Dask streaming interface that directly competes with Dask's own workflow.
The first step is to verify that a chunking approach actually performs well, which @wilsonbb has agreed to explore as part of his work in #42.
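For the documentation example, the shape of the idea might look something like the sketch below: instead of building one huge task graph over all partitions and calling compute once, loop over chunks and compute each independently, bounding peak memory at the cost of looping overhead. All names here (`process_chunk`, `chunked_compute`, `chunk_size`) are illustrative placeholders, not part of any real API, and the per-chunk work stands in for whatever Dask-backed computation the real workflow would run.

```python
def process_chunk(rows):
    # Stand-in for per-chunk work. Note this only works cleanly when each
    # chunk is independent -- a computation needing a global value (e.g. a
    # mean over the full dataset) would not chunk naively like this.
    return [r * 2 for r in rows]

def chunked_compute(data, chunk_size):
    # Loop over the data in fixed-size chunks so that only one chunk's
    # worth of intermediate results is materialized at a time.
    results = []
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        results.extend(process_chunk(chunk))
    return results

print(chunked_compute(list(range(10)), chunk_size=4))
# → [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
```

In the real example each `process_chunk` call would presumably build and compute a small Dask graph over one slice of the data, which is exactly the trade-off described above: many small computes instead of one massive graph.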