Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug fixes for 1) over-counting instructions 2) broken functional sim #142

Open
wants to merge 2 commits into
base: dev
Choose a base branch
from

Conversation

SerinaTan
Copy link

@SerinaTan SerinaTan commented Aug 8, 2019

Apologies for accidentally merging two separate commits into a single PR. (I didn't know pushing a new commit to the same branch will update the existing PR...). Anyhow, here are the descriptions for the two bug fixes:

Fix No.1:
In case of a "vector load" instruction which has multiple register destinations (e.g. ld.global.v4.u32 {%r1903, %r1902, %r1905, %r1904}, [%rd384]), ldst_unit::L1_latency_queue_cycle() would call warp_inst_complete() multiple times and hence over-count the number of completed instructions. This behavior is inconsistent with other ldst_unit functions such as ldst_unit::writeback().

Fix: move the warp_inst_complete() call out of the for loop iterating output registers.

Fix No.2:
In gpgpu_cuda_ptx_sim_main_func, kernel.increment_cta_id() is NOT called when checkpoint option is disabled during functional simulation. This causes functionalCoreSim::initializeCTA to be called twice in a row with the same cta_id and eventually a seg fault.

Fix: move kernel.increment_cta_id() out of the else block so that it's always executed. I am assuming the intended behaviour of the if-block is to only allow 1) checkpoint off or 2) prior to perf sim resume to execute the cta instructions. However, whether this if-condition is true/false, we should always increment the kernel's cta id or else the while loop won't break.

@SerinaTan SerinaTan changed the title Bug fix: over counting completed instruction for vector load Bug fixes for 1) over-counting instructions 2) broken functional sim Aug 23, 2019


if(cp_op==0 || (cp_op==1 && cta_launched<cp_cta_resume && kernel.get_uid()==cp_kernel) || kernel.get_uid()< cp_kernel) // just fro testing
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, Serina - do you know what this if is testing for?
I know this is not your code - but your change effect code flows when this thing is true and I cannot tell from quick inspection what it is for.

Also, thanks for the fixes! :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the if-statement guards whether to perform functional simulation for each CTA. It is evaluated to true when any of the following is true: 1) not doing checkpoint cp_op==0 2) checkpoint is on and we have reached the checkpoint kernel boundary, but we have not reached the CTA boundary cp_op==1 && cta_launched<cp_cta_resume && kernel.get_uid()==cp_kernel 3) we have not reached the checkpoint kernel boundary kernel.get_uid() < cp_kernel. We need to call kernel.increment_cta_id() so that the functional simulation can progress with an updated cta id (returned by kernel.get_next_cta_id_single()).

If this if-statement is off, we have reached the checkpoint boundary and we should halt functional simulation and transition into performance simulation. However, we still need to break the while loop by incrementing the kernel cta id (kernel.increment_cta_id()).

That being said, I am actually not sure how this code EVER worked with checkpointing...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants