You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Joe at Uni Bristol looked into ways of improving distance storage and calculations. I past here some text. Problem: I have sets of global climate data at a reasonably high spatial resolution (720x360), but when I first tried out gen3sis with these landscape grids, I quickly ran into vector size allocation problems. I've been running my simulations at a lower resolution (180x90) with great results. Possible general solution: Ways in which the simulation capacity could be boosted and I found a potential solution in the tools I was developing. Description: In short, it relies on representing the gridded landscape as a graph of local connections which needs much less storage space than a full sparse distance matrix for a landscape grid of a given size. Any pairwise distances that are needed in the other internal functions can then be found very quickly using the cppRouting R package, maintaining the efficiency of the simulation loop at run time. The graph method also has a few other potential benefits, which are what led me onto the problem of working with gridded landscapes in the first place I am still figuring out a few issues with my implementation (in particular, I am struggling with maintaining the optimality of the method in the tbdscan C++ functions), but if this would potentially be of interest to you, I would be very happy to send over some of the code I have put together. I have tried to make the edits as compatible with the current structure of gen3sis as possible, but they are still non-trivial so I thought it might be best to contact you via email, rather than submitting any edits via Github or similar.
The text was updated successfully, but these errors were encountered:
I agree that the current implementation of the distance matrices is the part of the simulation with the worst scaling behaviour.
Part of this idea has already been implemented. The distances can be saved as a sparse pair-wise matrix (local_distances). Currently we can either save the local distances or the full all-to-all distances. We can calculate limited distance matrices during a simulation run only (for the moment). This will be updated to allow pre-computing those partial distance matrices "soonish".
Calculating pair-wise distances on demand will have tremendous impacts on performance and/or require a quite elaborate caching scheme. The distance matrices are used for different things, obviously in the dispersal, but also for the clustering in both the evolution and speciation functions.
Switching over to such a scheme would completely replace the tdbscan function calls and "should" work with the cpp implementation for the distances by setting all inhabited cells as starting cells for the distance calculations.
It should be "straight forward" to change those implementations over to a graph based approach, but the computational overhead is going to be tremendously expensive as those calculations will have to either be cached or repeated multiple times for each species in the simulation.
If you (Joe) are interested in exploring this in detail I suggest we do a direct call to discuss this upfront?
Joe at Uni Bristol looked into ways of improving distance storage and calculations. I past here some text.
Problem: I have sets of global climate data at a reasonably high spatial resolution (720x360), but when I first tried out gen3sis with these landscape grids, I quickly ran into vector size allocation problems. I've been running my simulations at a lower resolution (180x90) with great results.
Possible general solution: Ways in which the simulation capacity could be boosted and I found a potential solution in the tools I was developing.
Description: In short, it relies on representing the gridded landscape as a graph of local connections which needs much less storage space than a full sparse distance matrix for a landscape grid of a given size. Any pairwise distances that are needed in the other internal functions can then be found very quickly using the cppRouting R package, maintaining the efficiency of the simulation loop at run time. The graph method also has a few other potential benefits, which are what led me onto the problem of working with gridded landscapes in the first place I am still figuring out a few issues with my implementation (in particular, I am struggling with maintaining the optimality of the method in the tbdscan C++ functions), but if this would potentially be of interest to you, I would be very happy to send over some of the code I have put together. I have tried to make the edits as compatible with the current structure of gen3sis as possible, but they are still non-trivial so I thought it might be best to contact you via email, rather than submitting any edits via Github or similar.
The text was updated successfully, but these errors were encountered: