Bug fixes for 1) over-counting instructions 2) broken functional sim #142
Apologies for accidentally combining two separate commits into a single PR (I didn't know that pushing a new commit to the same branch would update the existing PR). Here are the descriptions of the two bug fixes:
Fix No.1:
In the case of a "vector load" instruction that has multiple destination registers, e.g.
ld.global.v4.u32 {%r1903, %r1902, %r1905, %r1904}, [%rd384]
ldst_unit::L1_latency_queue_cycle() would call warp_inst_complete() multiple times and hence over-count the number of completed instructions. This behavior is inconsistent with other ldst_unit functions such as ldst_unit::writeback().
Fix: move the warp_inst_complete() call out of the for loop that iterates over the output registers.
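To illustrate the over-count, here is a minimal standalone sketch. It is not the simulator code: InstCounter, count_buggy and count_fixed are hypothetical names, and the loop body only stands in for the per-register bookkeeping done in ldst_unit::L1_latency_queue_cycle(). It assumes every listed destination register is in use.

```cpp
#include <vector>

// Hypothetical stand-in for the simulator's completed-instruction statistic.
struct InstCounter {
    unsigned completed = 0;
    void warp_inst_complete() { ++completed; }
};

// Shape of the bug: completion is signalled inside the loop over
// destination registers, so a v4 load is counted four times.
unsigned count_buggy(const std::vector<int>& dst_regs) {
    InstCounter counter;
    for (int reg : dst_regs) {
        (void)reg;  // per-register bookkeeping would happen here
        counter.warp_inst_complete();
    }
    return counter.completed;
}

// Shape of the fix: the per-register work stays in the loop, and
// completion is signalled once per instruction after the loop.
unsigned count_fixed(const std::vector<int>& dst_regs) {
    InstCounter counter;
    for (int reg : dst_regs) {
        (void)reg;  // per-register bookkeeping would happen here
    }
    counter.warp_inst_complete();
    return counter.completed;
}
```

With the four destination registers of the ld.global.v4.u32 example, count_buggy reports 4 completed instructions while count_fixed reports 1.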
Fix No.2:
In gpgpu_cuda_ptx_sim_main_func, kernel.increment_cta_id() is NOT called when the checkpoint option is disabled during functional simulation. This causes functionalCoreSim::initializeCTA to be called twice in a row with the same cta_id, which eventually leads to a seg fault.
Fix: move kernel.increment_cta_id() out of the else block so that it is always executed. I am assuming the intended behaviour of the if-block is to execute the CTA instructions only when 1) the checkpoint option is off or 2) the CTA comes prior to the perf sim resume point. However, whether the if-condition is true or false, we should always increment the kernel's cta id, or else the while loop won't break.
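The intended control flow can be sketched as below. This is a toy model under stated assumptions, not the code in gpgpu_cuda_ptx_sim_main_func: ToyKernel, run_functional, checkpoint_enabled and resume_cta are hypothetical names, and the if-condition encodes my reading of the if-block described above.

```cpp
#include <vector>

// Hypothetical toy kernel: hands out CTA ids 0 .. num_ctas-1.
struct ToyKernel {
    unsigned next_cta = 0;
    unsigned num_ctas;
    explicit ToyKernel(unsigned n) : num_ctas(n) {}
    bool no_more_ctas_to_run() const { return next_cta >= num_ctas; }
    unsigned get_next_cta_id() const { return next_cta; }
    void increment_cta_id() { ++next_cta; }
};

// Returns the CTA ids that were functionally executed, in order.
std::vector<unsigned> run_functional(ToyKernel kernel,
                                     bool checkpoint_enabled,
                                     unsigned resume_cta) {
    std::vector<unsigned> executed;
    while (!kernel.no_more_ctas_to_run()) {
        unsigned cta = kernel.get_next_cta_id();
        // Assumed condition: execute when checkpointing is off, or when
        // this CTA comes before the perf sim resume point.
        if (!checkpoint_enabled || cta < resume_cta) {
            executed.push_back(cta);  // stands in for initializeCTA + execute
        }
        // Always advance, whichever branch was taken; otherwise the same
        // cta id is initialized again and the loop never terminates.
        kernel.increment_cta_id();
    }
    return executed;
}
```

With checkpointing disabled, each CTA id is visited exactly once and the loop ends after the last CTA, which is the behaviour the fix restores.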