Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix maxHitsInModule for SiPixelRecHitFromSoAAlpaka #46686

Merged
merged 1 commit into from
Nov 13, 2024

Conversation

AdrianoDee
Copy link
Contributor

@AdrianoDee AdrianoDee commented Nov 13, 2024

PR description:

Small fix for SiPixelRecHitFromSoAAlpaka that uses the obsolete pixelClustering::maxHitsInModule() to limit the number of hits in a module. It should be TrackerTraits::maxHitsInModule. The backport to 14_1_X will fix #46656.

PR validation:

*.402 wfs for HI conditions

@AdrianoDee
Copy link
Contributor Author

type bug-fix

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 13, 2024

cms-bot internal usage

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @AdrianoDee for master.

It involves the following packages:

  • RecoLocalTracker/SiPixelRecHits (reconstruction)

@cmsbuild, @jfernan2, @mandrenguyen can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @VinInn, @VourMa, @dkotlins, @felicepantaleo, @ferencek, @gpetruc, @missirol, @mmusich, @mroguljic, @mtosi, @rovere, @threus, @tsusa, @tvami this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@mmusich
Copy link
Contributor

mmusich commented Nov 13, 2024

enable gpu

@AdrianoDee
Copy link
Contributor Author

test parameters:

  • relval_gpu_opt = -w upgrade
  • workflow_gpu = 14949.402, 17434.402

@AdrianoDee
Copy link
Contributor Author

please test

@mmusich
Copy link
Contributor

mmusich commented Nov 13, 2024

urgent

@AdrianoDee
Copy link
Contributor Author

please abort

@AdrianoDee
Copy link
Contributor Author

test parameters:

  • relval_opts_gpu = -w upgrade
  • workflow_gpu = 14949.402, 17434.402

@AdrianoDee
Copy link
Contributor Author

please test

@cmsbuild
Copy link
Contributor

-1

Failed Tests: RelVals-GPU
Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-46ae81/42791/summary.html
COMMIT: 2c3f90d
CMSSW: CMSSW_14_2_X_2024-11-12-2300/el8_amd64_gcc12
Additional Tests: GPU
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week1/cms-sw/cmssw/46686/42791/install.sh to create a dev area with all the needed externals and cmssw changes.

RelVals-GPU

  • 14949.40214949.402_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka/step3_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka.log

Comparison Summary

Summary:

@AdrianoDee
Copy link
Contributor Author

test parameters:

  • relval_opts_gpu = -w upgrade
  • workflow_gpu = 14949.402

@AdrianoDee
Copy link
Contributor Author

AdrianoDee commented Nov 13, 2024

Ok the failure

src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::integral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [768,0,0] Assertion `static_cast<int>(foundClusters) < data.metadata().size()` failed.

needs to be investigated (I think it's related to #45887) but it is for sure unrelated to this PR.

@jfernan2
Copy link
Contributor

+1
Signing due to the urgency, @AdrianoDee please create another PR for the new failure at your earliest convenience

@cmsbuild
Copy link
Contributor

This pull request is fully signed and it will be integrated in one of the next master IBs (but tests are reportedly failing). This pull request will now be reviewed by the release team before it's merged. @sextonkennedy, @mandrenguyen, @rappoccio, @antoniovilela (and backports should be raised in the release meeting by the corresponding L2)

@mmusich
Copy link
Contributor

mmusich commented Nov 13, 2024

needs to be investigated (I think it's related to #45887) but it is for sure unrelated to this PR.

indeed, I tested in a plain CMSSW_14_2_X_2024-11-12-2300 (without anything checked out on top) with:

runTheMatrix.py --what gpu,upgrade -l 14949.402 -t 4

and I still see

$ more 14949.402_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka/step3_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka.log 
RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry file:step2.root
Step: RAW2DIGI Spec: ['RawToDigi_pixelOnly']
Step: RECO Spec: ['reconstruction_pixelTrackingOnly']
Step: VALIDATION Spec: ['@pixelTrackingOnlyValidation']
@pixelTrackingOnlyValidation in preparing validation
Step: DQM Spec: ['@pixelTrackingOnlyDQM']
customising the process with customiseAlpakaServiceMemoryFilling from HeterogeneousCore/AlpakaServices/customiseAlpakaServiceMemoryFilling
customising the process with setCrossingFrameOn from SimGeneral/MixingModule/fullMixCustomize_cff
Starting  cmsRun  step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i ThreadStreamSetup:  (NoModuleName) 13-Nov-2024 14:45:57 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
  - Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
%MSG
%MSG-i CUDAService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
CUDA runtime version 12.4, driver version 12.4, NVIDIA driver version 550.127.05
CUDA device 0: Tesla T4 (sm_75)
%MSG
%MSG-i AlpakaService:  (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceCudaAsync succesfully initialised.
Found 1 device:
  - Tesla T4
%MSG
13-Nov-2024 14:46:06 CET  Initiating request to open file file:step2.root
13-Nov-2024 14:46:11 CET  Successfully opened file file:step2.root
%MSG-w NonConsumedConditionalModules:  AfterModConstruction  13-Nov-2024 14:46:14 CET pre-events
The following modules were part of some ConditionalTask, but were not
consumed by any other module in any of the Paths to which the ConditionalTask
was associated. Perhaps they should be either removed from the
job, or moved to a Task to make it explicit they are unscheduled.

 siPixelDigis@cpu
 siPixelRecHitsPreSplitting@cpu
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:16.182 CET
Begin processing the 2nd record. Run 1, Event 6, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:16.211 CET
Begin processing the 3rd record. Run 1, Event 7, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 4th record. Run 1, Event 5, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 5th record. Run 1, Event 9, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:18.913 CET
Begin processing the 6th record. Run 1, Event 10, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:18.955 CET
Begin processing the 7th record. Run 1, Event 2, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:18.995 CET
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 9th record. Run 1, Event 4, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 10th record. Run 1, Event 3, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:20.056 CET
src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::in
tegral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParam
s<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [640,0,0] Asserti
on `static_cast<int>(foundClusters) < data.metadata().size()` failed.

@makortel
Copy link
Contributor

Should 14949.402 be added to the set of workflows being tested in IBs?

@mmusich
Copy link
Contributor

mmusich commented Nov 13, 2024

Should 14949.402 be added to the set of workflows being tested in IBs?

sounds like it should.

@mandrenguyen
Copy link
Contributor

ignore tests-rejected with ib-failure
See discussion above. Failures are unrelated to this PR.

@mandrenguyen
Copy link
Contributor

+1

@cmsbuild cmsbuild merged commit 4b5dede into cms-sw:master Nov 13, 2024
13 of 14 checks passed
@AdrianoDee AdrianoDee deleted the fix_hits_in_module_converter branch November 13, 2024 15:13
@mmusich
Copy link
Contributor

mmusich commented Nov 13, 2024

sounds like it should.

I opened #46693 to keep track.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment