
update roofline for high order #48

Open
ggorman opened this issue Sep 18, 2015 · 12 comments

@ggorman
Contributor

ggorman commented Sep 18, 2015

Repeat benchmarks on SENAI machine (Xeon and Xeon Phi) for different spatial orders (2,4,6,8,10,12).

We need the OI and peak FLOPS for both so we can update the roofline plot.

@ggorman
Contributor Author

ggorman commented Sep 18, 2015

@tj-sun - can you help @felippezacarias get your branch running to do the benchmarking?

@felippezacarias - can you run with a 512**3 domain so that we can be sure most of the problem is not sitting in L3. Also, to ensure you are not messing up the alignment, can you carefully set the domain size, n, such that n + boundary_depth*2 == 512.

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

Please use the feature_higher_spatial_order branch.
Use grid.set_accuracy() to set the order.
From the command line, you can run python tests/eigenwave3d.py -so n,
where n is the spatial order divided by 2, so -so 2 gives 4th order.
Note that because the boundary conditions are implemented differently for each order, the errors are not comparable between orders. But for now I think we should just focus on the kernel performance.
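For example, a quick way to sweep all the orders listed above (a minimal sketch of my own, not from the repo; it just calls the driver with -so = order/2 as described, and assumes the feature_higher_spatial_order branch is checked out):

```python
# Sketch: run eigenwave3d at spatial orders 2..12 by passing -so = order/2,
# per the mapping described above. Assumes the feature_higher_spatial_order
# branch is checked out.
import subprocess

for order in (2, 4, 6, 8, 10, 12):
    subprocess.check_call(
        ["python", "tests/eigenwave3d.py", "-so", str(order // 2)])
```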

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

I also just added output of kernel AI when you run python tests/eigenwave3d.py

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

Also note that the number of ghost cells equals the spatial order.
So for 4th order, if you set grid size = 100, you will have 105 grid points in total (one more because both sides end with grid points),
i.e. make sure that grid_size + n*2 + 1 = 512,
where n is the number you pass in with -so.
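To spell out the arithmetic, a small sketch (my own, not from the repo) that picks grid_size so the total comes to 512 for each order:

```python
# Sketch of the sizing rule: grid_size + 2*n + 1 == 512, where n is the
# value passed with -so (half the spatial order at this point in the thread).
target = 512
for spatial_order in (2, 4, 6, 8, 10, 12):
    n = spatial_order // 2
    grid_size = target - 2 * n - 1
    assert grid_size + 2 * n + 1 == target
    print("order %2d: -so %d -> grid_size %d" % (spatial_order, n, grid_size))
```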

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

I've just made some amendments to our AI calculation in the new commit. Currently I see 4th order weighted AI=1.46 and 8th order 2.74, which I think is about right for float. (The article below seems to be using doubles?) I guess we will see when we get some results.

https://redmine.scorec.rpi.edu/attachments/111/roofline_for_FastMath.pdf

@felippezacarias
Contributor

felippezacarias commented Sep 18, 2015

@ggorman should I use the --profiling flag and get the MFLOPS and wall time from PAPI, or instrument the velocity and stress kernels with time measurements like we did before?

@tj-sun I generated the code for the different orders here, but it seems that no matter what grid size or order I use, dim1, dim2 and dim3 always come out as grid_size + 5. Is that correct?

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

Hi,
The dimension should change to gridsize + 1 + 2*margin, where margin should equal the order. If that's not what you see, I will take a look when I'm back home.


@ggorman
Contributor Author

ggorman commented Sep 18, 2015

Why don't you do both (PAPI + hand instrumentation) and compare? If there is a big difference we will want to know why.
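For the hand-instrumented side of that comparison, something along these lines would do (a rough sketch only; the kernel and the flop count are placeholders, not the real velocity/stress kernels):

```python
# Rough sketch of a hand-timed MFLOPS figure to compare against the number
# PAPI reports under --profiling. The kernel and flop count below are
# placeholders, not the real velocity/stress kernels.
import time

def placeholder_kernel():
    s = 0.0
    for i in range(1000000):
        s += i * 0.5          # 2 flops per iteration (mul + add)
    return s

hand_counted_flops = 2.0e6    # matches the loop above
t0 = time.time()
placeholder_kernel()
elapsed = time.time() - t0
print("hand-timed MFLOPS: %.1f" % (hand_counted_flops / elapsed / 1e6))
```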

@tj-sun
Collaborator

tj-sun commented Sep 18, 2015

@felippezacarias - you are absolutely right on the grid_size. I didn't recalculate the grid_size after setting the new order. It's fixed now.

@ggorman
Contributor Author

ggorman commented Sep 19, 2015

@tj-sun going back to your comments above "Currently I see 4th order weighted AI=1.46 and 8th order 2.74. Which I think is about right for float. (The article below seems to be using doubles?) "

This is not making sense to me. Previously we estimated that the AI for 4th order was ~0.8 --- remember that initially @felippezacarias reported 1.7 and then you pointed out that this has to be divided by two to take floats into account. I could buy that figure because it was consistent with the figure of 0.94 reported in roofline_for_FastMath.pdf. (BTW, your suggestion that the article was talking about doubles would imply that the AI for floats would be twice that again.)

Can we focus on getting this right, as it is a key metric.
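To make the precision scaling in that parenthetical explicit (this is just the definition of AI, not a measurement): AI is flops over bytes moved, so for the same stencil and the same number of words, going from double (8 bytes/word) to float (4 bytes/word) halves the denominator and doubles the AI.

```python
# Back-of-the-envelope: AI = flops / bytes moved. The per-point flop and
# word counts below are illustrative placeholders, not measured values.
def arithmetic_intensity(flops, words_moved, bytes_per_word):
    return flops / float(words_moved * bytes_per_word)

flops_per_point, words_per_point = 100.0, 13.0   # placeholders
ai_double = arithmetic_intensity(flops_per_point, words_per_point, 8)
ai_float = arithmetic_intensity(flops_per_point, words_per_point, 4)
assert abs(ai_float - 2.0 * ai_double) < 1e-12
```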

@tj-sun
Collaborator

tj-sun commented Sep 19, 2015

I read the article again yesterday, but I think the 0.94 in the article is double precision, so I began to think our AI is too low. I checked again and found that the earlier overall calculation was done wrongly. I have also added boundary conditions and ghost-cell adjustments (following page 31 of the article).

@tj-sun
Collaborator

tj-sun commented Sep 21, 2015

@felippezacarias please note that in the new commit f337943 the behaviour of setting the spatial order has changed: -so=4 now sets 4th order instead of 8th order. This is to address issue #41.
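So after that commit, the sweep from earlier in the thread would pass the full order directly, e.g. (same assumptions as the earlier sketch):

```python
# After commit f337943, -so takes the full spatial order, so 4th order is:
import subprocess
subprocess.check_call(["python", "tests/eigenwave3d.py", "-so", "4"])
```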
