
WIP: Move from mt to dist #24

Open · wants to merge 3 commits into base: main

Conversation

andresrossb

Created pfor, pfor observations, and lsqDB_dist.
It will not run yet; I still need to add `using Distributed` and initialize the workers.
It's simply a starting idea for how to build the matrix.
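
A minimal sketch of what that missing setup could look like (using the standard Distributed stdlib; `build_row` and `configs` are illustrative placeholders, not the actual pfor/lsqDB_dist code):

```julia
# Hypothetical sketch, not the actual pfor/lsqDB_dist code: what "add
# `using Distributed` and initialize the workers" could look like.
using Distributed
addprocs(4)                              # start 4 local worker processes

@everywhere begin
    # every worker needs the row builder; here just a toy stand-in for
    # evaluating the basis on one configuration
    build_row(config) = fill(float(config), 3)
end

configs = 1:10
rows = pmap(build_row, configs)          # one row block per config, in parallel
A = reduce(vcat, permutedims.(rows))     # stack into the design matrix on the master
```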

cortner mentioned this pull request Dec 24, 2021
cortner (Member) commented Dec 24, 2021

Thank you, Andres. I see this is a PR from a fork. Would you be willing to give @wcwitt and me push access so we can collaborate on this PR?

cortner changed the title from "First attempt at distributed" to "WIP: Move from mt to dist" on Dec 24, 2021
andresrossb and others added 2 commits January 28, 2022 22:17
wcwitt (Collaborator) commented Jan 31, 2022

My last version used distributed assembly, but with the design matrix as a SharedArray, which meant it couldn't work across multiple nodes. I now have a version with the design matrix as a DArray. I still convert it back to a regular array for the solvers, but making the solvers work with the distributed matrix will be the next step.
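
Roughly, the shape of this approach is the following (an illustrative sketch only; the real code assembles the matrix block-by-block on the workers rather than distributing a pre-built array):

```julia
# Rough sketch of the DArray approach (illustrative; the real code assembles
# the design matrix block-by-block on the workers rather than distributing a
# pre-built array).
using Distributed
addprocs(2)
@everywhere using DistributedArrays

A  = randn(1_000, 50)          # stand-in for the assembled design matrix
b  = randn(1_000)
dA = distribute(A)             # DArray: blocks live on the workers, can span nodes

A_local = convert(Array, dA)   # gather back to a regular Array for the solvers
x = A_local \ b                # e.g. an ordinary least-squares solve on the master
```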

However, I recently rebased on the latest IPFitting, so I'll need to force push to get it here. Do you mind?

cortner (Member) commented Feb 2, 2022

What do you mean "get it here"? You want it to go onto the v0.10 branch? I think then it needs to be rebased. Unfortunately we started planning this before the decision was made to retire IPFitting.

cortner (Member) commented Feb 2, 2022

Specifically, I think we need to rebase onto main - how do you feel about this? Or do you actually WANT it on the 0.5.x branch?

wcwitt (Collaborator) commented Feb 2, 2022

I mean I've already rebased onto v0.10, so I would need to force push here. And since this isn't my PR, I thought I should ask first.

cortner (Member) commented Feb 2, 2022

Oh, I see, perfect. Yes, please go ahead - can you edit the target branch, or shall I?

wcwitt (Collaborator) commented Feb 2, 2022

Thanks. I'm not able to edit the target branch, sorry.

wcwitt (Collaborator) commented Feb 2, 2022

More generally, I don't think we should merge this until I have at least one of the solvers working with the distributed matrix. I'll keep rebasing onto v0.x as appropriate and flag you when it's ready for discussion/review.

cortner (Member) commented Feb 2, 2022

I think once it is rebased onto main, you can just keep merging changes from main into this branch. Personally I like that better, but I guess it is a matter of taste?

cortner changed the base branch from master to main on February 2, 2022
cortner (Member) commented Feb 2, 2022

I've just given you write access, so once this is done you can keep the PRs here if you prefer.

wcwitt (Collaborator) commented Feb 8, 2022

While attempting to use the LSQR routine from IterativeSolvers, I've discovered that some of the DistributedArrays functionality is rather brittle.

For example, thus far I have been distributing over the configs for convenience, which means that the submatrices belonging to different workers are not all the same size. (Not great for load balancing, but the easiest way to start.) Constructing the DArray this way works fine, but it turns out that some of the DArray math routines (e.g., matrix multiplication) implicitly assume submatrices of equal size.

Just noting this for now; I still need to figure out a solution, likely by distributing more carefully.
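
A sketch of the kind of call involved, together with the obvious but unsatisfying workaround of gathering the matrix before the solve (sizes and names are made up; assumes DistributedArrays.jl and IterativeSolvers.jl are installed):

```julia
# Sketch only: the kind of call that exposes the issue. lsqr needs A*x and
# A'*r products, and the distributed versions of those can break when the
# per-worker blocks are uneven. Gathering first works but defeats the purpose
# for matrices too large for one node.
using Distributed
addprocs(2)
@everywhere using DistributedArrays
using IterativeSolvers

A  = randn(200, 20)
b  = randn(200)
dA = distribute(A)                 # even blocks here; uneven blocks are the problem case
x  = lsqr(convert(Array, dA), b)   # workaround: gather to a dense Array first
```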

cortner (Member) commented Feb 8, 2022

What about distributing one way for assembly and then "redistributing" for the linear algebra? But this is extremely weird in my view.

wcwitt (Collaborator) commented Feb 9, 2022

Yeah, that might work, although I'm trying to avoid ever putting the full matrix on one worker, which makes it a little tricky. I'll keep thinking about it.

In the meantime, I filed an issue, JuliaParallel/DistributedArrays.jl#237.

jameskermode commented

We also needed to use constant-size blocks for the gap_fit parallelisation, as it's a ScaLAPACK requirement. We had also distributed by configuration, applying some heuristics to give a reasonable workload balance. Until now we added zero padding to the blocks to equalise the sizes, which doesn't change the solution of the linear system (so long as you add zeros to the RHS vectors as well) but is not optimal for memory usage or time, so we're rethinking whether it would have been better to distribute completely evenly. @Sideboard can fill in more details when we speak.
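
For concreteness, the zero-padding trick amounts to something like the following toy sketch (made-up block sizes, not the gap_fit code):

```julia
# Toy sketch of the zero-padding idea (made-up block sizes, not the gap_fit code).
# Zero rows contribute nothing to A'A or A'b, so the least-squares solution is
# unchanged, at the cost of extra memory and flops.
blocks = [randn(5, 3), randn(9, 3), randn(7, 3)]    # uneven per-worker blocks
rhs    = [randn(5),    randn(9),    randn(7)]
m = maximum(size.(blocks, 1))                       # common block height

pad_A(B) = vcat(B, zeros(m - size(B, 1), size(B, 2)))
pad_b(y) = vcat(y, zeros(m - length(y)))

A,  b  = reduce(vcat, blocks),         reduce(vcat, rhs)
Ap, bp = reduce(vcat, pad_A.(blocks)), reduce(vcat, pad_b.(rhs))
@assert (A \ b) ≈ (Ap \ bp)                         # same solution, padded or not
```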

wcwitt (Collaborator) commented Feb 21, 2022

I've been thinking about this and I'm leaning towards distributing fully evenly. It won't take that much more code than the zero padding. It will be good to hear about your experience.
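
One possible way to carve the rows into fully even chunks, ignoring configuration boundaries (an assumed approach, not the final code):

```julia
# Assumed approach, not the final code: split the global row range into
# near-equal contiguous chunks, one per worker, ignoring config boundaries.
even_ranges(nrows, nchunks) =
    [(1 + div((i - 1) * nrows, nchunks)):div(i * nrows, nchunks) for i in 1:nchunks]

even_ranges(23, 4)    # => [1:5, 6:11, 12:17, 18:23]
```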
