
Reproduce benchmark results from SP paper #4

Open
bennn opened this issue Aug 3, 2023 · 10 comments

@bennn (Member) commented Aug 3, 2023

The Static Python paper has many tables of benchmark results in the appendix:

https://programming-journal.org/2023/7/2/

Let's reproduce these to make sure the current version of SP is aligned with the one from the paper.

  1. Make a copy of their benchmarks in this repo --- in case we need to modify them, or the SP versions change in the future
  2. Check that every version of the benchmarks runs. (Expect issues with nqueens and fannkuch.)
  3. Write a script to run each benchmark as described in the paper
  4. Convert the results to fractions by normalizing each row by its left-most column, as in the paper's tables. Apply the same conversion to the paper's numbers. Do the fractions match up, with similar slowdowns & speedups?

The script in step 3 will be the starting point for our own measurements.
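The fraction check in step 4 can be sketched as follows. This is only a sketch: the function names and the 15% tolerance are my assumptions, not anything from the paper.

```python
# Sketch of step 4: divide every entry in a row of timings by the row's
# leftmost entry (the baseline column, e.g. T-Max SP JIT SF), so both
# our results and the paper's tables become dimensionless fractions.

def normalize_row(times):
    """Normalize a row of timings by its leftmost (baseline) column."""
    base = times[0]
    return [t / base for t in times]

def fractions_match(ours, paper, tol=0.15):
    """True when two normalized rows agree within `tol`.
    The 15% tolerance is an arbitrary assumption; tune it to the
    observed run-to-run noise."""
    return all(abs(a - b) <= tol
               for a, b in zip(normalize_row(ours), normalize_row(paper)))
```

Applying the same normalization to both our numbers and the paper's means absolute machine-speed differences cancel out, and only the relative slowdowns/speedups are compared.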

Paper source: https://github.com/brownplt/insta-model

@CrypticNumbers8 (Contributor) commented
Hello Professor,
I ran all the scripts in the paper (except orig) on Richards, DeltaBlue, and Nbody; however, I could not get Fannkuch to run without errors.
I collected T-Max and T-Min running times for these three benchmarks over 10 iterations, and I normalized the values by dividing them by the first-column (T-Max SP JIT SF) values.

I will compare these normalized values with the values in the paper today and let you know the results.

@bennn (Member, Author) commented Aug 7, 2023

Sounds good, but please push everything to this repo! Benchmarks, plain data, normalized data.

What's the fannkuch error?

@CrypticNumbers8 (Contributor) commented
The Fannkuch error is:

Traceback (most recent call last):
  File "/vol/cinder/Tools/benchmarks/fannkuch_static.py", line 12, in <module>
    res = fannkuch(DEFAULT_ARG)
  File "/vol/cinder/Tools/benchmarks/fannkuch_static_lib.py", line 19, in fannkuch
    count: ArrayI64 = ArrayI64(range(1, nb + 1))
TypeError: an integer is required

I am working to get the orig results today on Richards, Deltablue, and Nbody.

@bennn (Member, Author) commented Aug 9, 2023

Ok. You'll want to delete the type alias ArrayI64 and instead use an array type directly.

See the version of fannkuch in the Benchmarks folder: https://github.com/utahplt/static-python-perf/blob/main/Benchmark/fannkuch/advanced/main.py

Same for nqueens!

@bennn (Member, Author) commented Aug 9, 2023

Some of the numbers are strange. For example, DeltaBlue T-Min says that the JIT alone is far better than the SP JIT SF combo:

Iteration,T-Min SP JIT SF,T-Min SP JIT,T-Min SP,T-Min JIT SF,T-Min JIT,T-Min
Iteration 1,8.663,9.058,10.496,7.142,1.557,8.722

How'd you get the numbers? Commit & push the script.

@CrypticNumbers8 (Contributor) commented
I just pushed the script inside the deltablue folder, professor. I will push the scripts for the other benchmarks soon.
Working on getting fannkuch and nqueens to run right now.

@CrypticNumbers8 (Contributor) commented
My apologies, I think I found my mistake: I wrote the commands for richards and just substituted the other benchmark names into them. I will get the exact commands for each benchmark from the paper, one by one.

@bennn (Member, Author) commented Aug 10, 2023

Write a script that runs everything, for any benchmark! We'll need to use it a LOT in the future.
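A generic runner along these lines could work. This is a hedged sketch: the config names and the `-X jit` flag are assumptions standing in for the real Cinder/Static Python invocations, which must be copied from the paper for each configuration (JIT on/off, static compiler, shadow frames).

```python
import subprocess
import sys
import time

# Placeholder configurations: the real flag combinations for each
# column (SP, JIT, SF, and their combos) come from the paper.
CONFIGS = {
    "default": [],
    "jit": ["-X", "jit"],  # assumed flag name; verify against the paper
}

def time_once(script, flags):
    """Time one subprocess run of `script` (wall-clock seconds)."""
    start = time.perf_counter()
    subprocess.run([sys.executable, *flags, script], check=True)
    return time.perf_counter() - start

def run_benchmark(script, iterations=10):
    """Return {config_name: (t_min, t_max)} over `iterations` runs each."""
    results = {}
    for name, flags in CONFIGS.items():
        times = [time_once(script, flags) for _ in range(iterations)]
        results[name] = (min(times), max(times))
    return results
```

Timing whole subprocess runs (rather than in-process loops) keeps the harness identical across all benchmarks, so adding a new benchmark is just a new script path.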

@CrypticNumbers8 (Contributor) commented
On it!

@CrypticNumbers8 (Contributor) commented
Hello professor,
I just completed the script and committed it. It ran on all 4 benchmarks (richards, deltablue, fannkuch, and nbody) without errors for a single iteration. I am now running it for 10 iterations, which will take considerably longer; I will push the tables as soon as they are done.
