Parallel scaling: should we care about it? #2
My experience with WENO is with finite-difference WENO. If you use symmetric, bandwidth-optimized WENO, you can perform high-fidelity turbulence DNS at moderate and high Mach numbers with only 4th-order-accurate spatial discretization. In a 4th-order-accurate symmetric, bandwidth-optimized WENO scheme, each candidate stencil is only 4 grid points and the collection of candidate stencils spans 8 grid points.

While the smoothness-indicator computation is one of the most expensive parts of the computation, I don't understand your concerns about parallel efficiency regarding the nonlinearity of the WENO scheme and the small stencil size. The nonlinearity of the scheme arises from the smoothness measurements used to weight the candidate stencils. This is a mathematical nonlinearity: the stencil weights become a function of the solution at those points, unlike a linear finite-difference scheme, where the stencil coefficients/weights are typically constant. It does not refer to the algorithmic complexity. The algorithmic complexity is linear, with a constant number of operations per grid point. This makes load balancing trivial: just match the sizes of the subdomains assigned to different cores/MPI ranks/etc.

I am also confused as to why you think the small stencil size will hurt parallel efficiency. The latency of sending boundary points is the same no matter how large the stencil (i.e., how much data needs to be sent), and sending less data means the receiver waits less time between getting the first part of the boundary and getting the last. Perhaps in the context of finite-volume simulations there are other issues to consider.

I agree that you shouldn't spend much (or any) time optimizing at first. Pretty much every operation can be expressed in vector operations, without forming matrices, which makes it natural for a compiler to vectorize. If you're performing flux splitting, some information can be reused between neighboring cells.
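To make the "mathematical nonlinearity" concrete, here is a minimal sketch of Jiang-Shu-style nonlinear weighting (illustrative only: the number of candidate stencils, the linear weights, `eps`, and the sample smoothness values are placeholders, not this library's code):

```fortran
! Minimal sketch of WENO nonlinear weighting (illustrative, not the library's API).
! Given smoothness indicators beta(k) measured on each candidate stencil,
! the stencil weights become a function of the local solution: this is the
! "mathematical nonlinearity" discussed above, while the operation count per
! grid point stays constant.
program weno_weights_sketch
  implicit none
  integer, parameter :: S = 3                ! number of candidate stencils (placeholder)
  real :: d(S), beta(S), alpha(S), omega(S)
  real, parameter :: eps = 1.0e-6            ! avoids division by zero

  d     = [0.1, 0.6, 0.3]                    ! optimal (linear) weights, scheme-dependent
  beta  = [0.5, 0.01, 0.02]                  ! smoothness indicators, solution-dependent

  alpha = d / (eps + beta)**2                ! de-emphasize non-smooth candidate stencils
  omega = alpha / sum(alpha)                 ! normalize into nonlinear convex weights

  print '(a, 3f8.4)', 'omega = ', omega
end program weno_weights_sketch
```

Note that `omega` depends on the solution only through `beta`, while the work per grid point stays fixed, which is why load balancing reduces to matching subdomain sizes.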
Hi @zbeekman
Indeed, I am not really concerned about parallelism; I just wanted to create a room where we can discuss it.

Sure, this is not a concern.
Not at all; I was just not clear (sorry for my bad English). What I meant is: there is probably no room for speeding up the WENO interpolation itself by parallelizing it via OpenMP or MPI, simply because the stencil computation is so small (we may lose more time in communications than we save in computations). On the contrary, the flux-computation procedures (taken as an example of outer procedures calling the WENO interpolator) generally operate on larger stencils, so they can be easily parallelized. In my typical application, a domain decomposition is done over mesh blocks and parallelized via MPI, the computations inside each block are parallelized via OpenMP, and "atomic" computations are (hopefully) vectorized.

In this regard, I think we should only ensure that our WENO library is thread-safe (so that it can be safely used by user procedures that may themselves be parallelized) and "internally vectorizable". To achieve thread safety, I plan to exploit Rouson's lessons: encapsulate all data in one object, making it thread-safe by hiding the data and exposing only the necessary thread-safe methods. I am using an abstract derived type as a contract for the actual WENO interpolators that we will implement with different algorithms but a common API (see the sketch below). As soon as I clear up my messy ideas, I would like to test Rouson's lessons on the Factory pattern to this aim, but that is a topic for another discussion... To summarize, these are my conclusions:

- the library itself should not be parallelized: the stencils are too small for OpenMP/MPI to pay off;
- the library should be thread-safe, so that client procedures can be parallelized safely;
- the library should be "internally vectorizable".
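As a minimal sketch of the abstract-contract idea, assuming Fortran 2008 deferred bindings (the type and procedure names here are hypothetical, not the library's actual API):

```fortran
! Minimal sketch of an abstract interpolator contract (hypothetical names,
! not the library's actual API). Concrete WENO variants extend this type and
! implement the deferred method; keeping the procedure pure, with all state
! hidden inside the object, is what makes concurrent calls from threaded
! client code safe.
module interpolator_abstract
  implicit none
  private
  public :: interpolator

  type, abstract :: interpolator
  contains
    procedure(interpolate_interface), deferred, pass :: interpolate
  end type interpolator

  abstract interface
    pure subroutine interpolate_interface(self, stencil, location, interpolation)
      import :: interpolator
      class(interpolator), intent(in) :: self
      real, intent(in)  :: stencil(:)      ! solution values on the candidate stencils
      real, intent(in)  :: location        ! where to interpolate within the cell
      real, intent(out) :: interpolation   ! reconstructed value
    end subroutine interpolate_interface
  end interface
end module interpolator_abstract
```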
Do you agree?
I agree, and I think that if your aim is to create an efficient WENO library for interpolation that tries to divorce itself from notions of CFD, Riemann solvers, etc., then yes, it should be thread-safe and need not worry at all about parallelism or threading explicitly.
Yes, we are interested in CFD, but many Fortran programmers are not; we should think of our whole community. Today I hope to push some updates.
WENO schemes are in general nonlinear procedures (due to the smoothness-indicators computation) that can take non-negligible CPU time. However, WENO procedures typically operate on very small stencils, which (at first view) seem to offer little room for parallel scaling.
As Rouson clearly states in his great book, premature optimization can be very dangerous: for the moment we do not care about possible performance bottlenecks on parallel architectures. However, here I would like to discuss future strategies for supporting parallel architectures.
As a first guess, I suppose that the main parallel features the library should provide as soon as possible are thread safety and internal vectorizability (see the sketch below).
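A self-contained sketch of what this usage pattern could look like (hypothetical code, not the library itself): the kernel is a pure function of its stencil, so client code can wrap it in OpenMP without the library knowing about threads, and the compiler is free to vectorize the loop body.

```fortran
! Sketch of the "thread-safe, internally vectorizable" goal (hypothetical
! code, not the library itself): the reconstruction kernel is pure and
! stateless, so the client's OpenMP loop is safe and the library needs no
! explicit threading.
program thread_safe_sketch
  implicit none
  integer, parameter :: n = 1024
  real :: field(0:n+1), reconstructed(n)
  integer :: i

  call random_number(field)

  !$omp parallel do
  do i = 1, n
    ! Pure call: no shared mutable state, safe per thread.
    reconstructed(i) = reconstruct(field(i-1:i+1))
  end do
  !$omp end parallel do

  print *, 'sample: ', reconstructed(1)

contains

  pure function reconstruct(stencil) result(value)
    ! Stand-in for the real WENO kernel: any pure, stateless routine
    ! gives the thread safety discussed above.
    real, intent(in) :: stencil(:)
    real :: value
    value = sum(stencil) / size(stencil)
  end function reconstruct
end program thread_safe_sketch
```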
In this context, your experience (I am thinking of Zaak, Andrea, Francesco, Rouson, Muller, Americi, and many other members of our group) is very important. Please feel free to post any pertinent comments.