
Paused jobs in Prompt Reco due to MaxPSS reached #46040

Open
malbouis opened this issue Sep 18, 2024 · 47 comments

@malbouis
Contributor

Dear all,

There are two jobs that failed due to MaxPSS reached in Tier0 processing:

  • for Muon0 PD, collision run number 385728;
  • for Muon1 PD, collision run number 385738.

The tar ball can be found at /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/job_569724

Would experts please investigate?

Thanks!

@cmsbuild
Contributor

cmsbuild commented Sep 18, 2024

cms-bot internal usage

@cmsbuild
Contributor

A new Issue was created by @malbouis.

@Dr15Jones, @antoniovilela, @makortel, @mandrenguyen, @rappoccio, @sextonkennedy, @smuzaffar can you please review it and eventually sign/assign? Thanks.

cms-bot commands are listed here

@makortel
Contributor

The tar ball can be found at /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/job_569724

There seems to be very little information there. E.g. I don't see CMSSW logs or the configuration. The wmagentJob.log shows

2024-09-18 03:42:42,498:INFO:PerformanceMonitor:PSS: 8215517; RSS: 8681824; PCPU: 212; PMEM: 3.2
<cut>
2024-09-18 06:13:09,492:INFO:PerformanceMonitor:PSS: 10095653; RSS: 10286620; PCPU: 756; PMEM: 3.9
<cut>
2024-09-18 09:03:30,419:INFO:PerformanceMonitor:PSS: 10273310; RSS: 10381460; PCPU: 766; PMEM: 3.9
<cut>
2024-09-18 09:23:35,351:INFO:PerformanceMonitor:PSS: 10693552; RSS: 10806524; PCPU: 767; PMEM: 4.1
<cut>
2024-09-18 09:38:36,992:INFO:PerformanceMonitor:PSS: 11046040; RSS: 11246436; PCPU: 767; PMEM: 4.2
2024-09-18 09:43:38,421:INFO:PerformanceMonitor:PSS: 11318562; RSS: 11471652; PCPU: 767; PMEM: 4.3
2024-09-18 09:48:38,893:INFO:PerformanceMonitor:PSS: 11658895; RSS: 11866360; PCPU: 767; PMEM: 4.5
2024-09-18 09:53:39,318:INFO:PerformanceMonitor:PSS: 11840042; RSS: 11950812; PCPU: 768; PMEM: 4.5
2024-09-18 09:58:39,852:INFO:PerformanceMonitor:PSS: 12126577; RSS: 12207752; PCPU: 768; PMEM: 4.6
2024-09-18 10:03:40,190:INFO:PerformanceMonitor:PSS: 12468206; RSS: 12574136; PCPU: 768; PMEM: 4.7
2024-09-18 10:08:40,511:INFO:PerformanceMonitor:PSS: 12553465; RSS: 12652744; PCPU: 768; PMEM: 4.8
2024-09-18 10:13:40,954:INFO:PerformanceMonitor:PSS: 13047006; RSS: 13150440; PCPU: 768; PMEM: 4.9
2024-09-18 10:18:41,530:INFO:PerformanceMonitor:PSS: 13397210; RSS: 13525900; PCPU: 768; PMEM: 5.1
2024-09-18 10:23:42,898:INFO:PerformanceMonitor:PSS: 13576974; RSS: 13752052; PCPU: 768; PMEM: 5.2
2024-09-18 10:28:44,000:INFO:PerformanceMonitor:PSS: 14084790; RSS: 14202436; PCPU: 768; PMEM: 5.3
2024-09-18 10:33:44,468:INFO:PerformanceMonitor:PSS: 14320801; RSS: 14423916; PCPU: 768; PMEM: 5.4
2024-09-18 10:38:45,646:INFO:PerformanceMonitor:PSS: 14525319; RSS: 14654568; PCPU: 768; PMEM: 5.5
2024-09-18 10:43:46,187:INFO:PerformanceMonitor:PSS: 14916861; RSS: 15010812; PCPU: 768; PMEM: 5.7
2024-09-18 10:48:46,523:INFO:PerformanceMonitor:PSS: 15372452; RSS: 15477132; PCPU: 769; PMEM: 5.8
2024-09-18 10:53:47,070:INFO:PerformanceMonitor:PSS: 15506350; RSS: 15730204; PCPU: 769; PMEM: 5.9
2024-09-18 10:58:47,469:INFO:PerformanceMonitor:PSS: 15627389; RSS: 15716988; PCPU: 769; PMEM: 5.9
2024-09-18 11:03:47,648:INFO:PerformanceMonitor:PSS: 15862479; RSS: 15958220; PCPU: 769; PMEM: 6.0
2024-09-18 11:08:47,876:INFO:PerformanceMonitor:PSS: 16160977; RSS: 16250008; PCPU: 769; PMEM: 6.1
2024-09-18 11:08:47,877:ERROR:PerformanceMonitor:Error in CMSSW step cmsRun1
Number of Cores: 8
Job has exceeded maxPSS: 16000 MB
Job has PSS: 16160 MB

The job stayed quite steadily under 11 GB for almost 6 hours, and then in the last 1.5 hours the memory usage increased by 5 GB.
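
For reference, a quick way to see this trend is to pull the PerformanceMonitor lines out of wmagentJob.log; a minimal sketch (the file name and line layout are assumed from the excerpt above):

import re
from datetime import datetime

# Collect (timestamp, PSS in kB) samples from lines of the assumed form
#   "YYYY-MM-DD HH:MM:SS,mmm:INFO:PerformanceMonitor:PSS: <kB>; RSS: ..."
pattern = re.compile(r"^(\S+ \S+):INFO:PerformanceMonitor:PSS: (\d+);")
samples = []
with open("wmagentJob.log") as f:
    for line in f:
        m = pattern.match(line)
        if m:
            samples.append((datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f"),
                            int(m.group(2))))

first, last = samples[0], samples[-1]
hours = (last[0] - first[0]).total_seconds() / 3600
print("PSS grew from %.1f GB to %.1f GB over %.1f h"
      % (first[1] / 1e6, last[1] / 1e6, hours))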

@makortel
Contributor

assign reconstruction, dqm

Just guessing that the high memory usage is caused by the application code

@cmsbuild
Contributor

New categories assigned: reconstruction,dqm

@jfernan2,@mandrenguyen,@rvenditti,@syuvivida,@tjavaid,@nothingface0,@antoniovagnerini you have been requested to review this Pull request/Issue and eventually sign? Thanks

@jeyserma

We have another paused job for the Muon PD that exceeded the memory limit. The tarball can be found here:

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1

I copied the RAW input file to the following location so that the issue can be reproduced anytime:

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1/8959d673-4a4c-487b-8e25-213767c3a788.root

Best,
Jan

@makortel
Contributor

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1

Looking at the cmsRun log by eye, the RSS seems to grow from 4.3 GB around the 9th event to 15.5 GB around the 204907th event. The growth seems to be gradual rather than moving up and down, and would correspond to about 58 kB/event of hoarding or leaking.
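
A rough cross-check of that number (assuming the log reports GiB):

growth_bytes = (15.5 - 4.3) * 1024**3   # RSS growth over the job
events = 204907 - 9                     # events over which it accumulated
print(growth_bytes / events)            # ~5.9e4 bytes/event, i.e. ~58 kB/event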

@makortel
Contributor

I plotted the RSS and VSIZE vs. timestamp (plotting vs. the event record number gives a similar picture) and got

[plot: RSS vs. timestamp]
[plot: VSIZE vs. timestamp]

which looks pretty much like what one would expect from hoarding or a leak.
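
Something along these lines can reproduce such plots from the cmsRun log; a minimal sketch (the log file name and the exact SimpleMemoryCheck line format are assumptions):

import re
import matplotlib.pyplot as plt

# Extract RSS/VSIZE (in MB) from SimpleMemoryCheck lines of the assumed form
#   MemoryCheck: event : VSIZE <vsize> <delta> RSS <rss> <delta>
# the sample index follows the event record order.
pattern = re.compile(r"MemoryCheck: event : VSIZE (\S+) \S+ RSS (\S+)")
vsize, rss = [], []
with open("cmsRun1-stdout.log") as f:   # file name is an assumption
    for line in f:
        m = pattern.search(line)
        if m:
            vsize.append(float(m.group(1)))
            rss.append(float(m.group(2)))

plt.plot(rss, label="RSS [MB]")
plt.plot(vsize, label="VSIZE [MB]")
plt.xlabel("event record number")
plt.ylabel("memory [MB]")
plt.legend()
plt.savefig("memory_vs_event.png")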

@davidlange6
Contributor

Confused... the log file in /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1/job/WMTaskSpace/cmsRun1

is from run 386037 - or so the framework thinks.

The job ran for 40 hrs and 200k events - e.g. 9 kHz into a PD. Seems like garbage data (certainly not good cosmics/circulating data).

@davidlange6
Contributor

28-Sep-2024 07:02:12 UTC Initiating request to open file root://eoscms.cern.ch//eos/cms/tier0/store/data/Run2024H/Cosmics/RAW/v1/000/386/037/00000/8959d673-4a4c-487b-8e25-213767c3a788.root?eos.app=cmst0

indeed, lumi section 100 of this run has very high rates.

@jeyserma

jeyserma commented Oct 1, 2024

Oh sorry, I mixed up the job tarballs with those of a different paused job. Please ignore my previous comment about the files. The one you analyzed (run 386037) was a cosmics run where DT dropped out of the global run at LS ~100.

I will try to get another example, hoping the files are still on disk.
Sorry for the inconvenience.

@jeyserma

jeyserma commented Oct 1, 2024

I found the correct tarball + RAW file and copied them over to the same location (removed the old files):

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1

The maxPSS error is visible in the wmagentJob.log file:

2024-09-29 09:33:54,759:INFO:PerformanceMonitor:PSS: 9373714; RSS: 9630908; PCPU: 678; PMEM: 3.6
2024-09-29 09:38:55,498:INFO:PerformanceMonitor:PSS: 9339344; RSS: 9563400; PCPU: 679; PMEM: 3.6
2024-09-29 09:43:56,208:INFO:PerformanceMonitor:PSS: 9326689; RSS: 9467828; PCPU: 680; PMEM: 3.5
2024-09-29 09:48:56,939:INFO:PerformanceMonitor:PSS: 9255149; RSS: 9470780; PCPU: 681; PMEM: 3.5
2024-09-29 09:53:57,710:INFO:PerformanceMonitor:PSS: 9468200; RSS: 9722240; PCPU: 682; PMEM: 3.6
2024-09-29 09:58:58,372:INFO:PerformanceMonitor:PSS: 10211382; RSS: 10400500; PCPU: 683; PMEM: 3.9
2024-09-29 10:03:59,080:INFO:PerformanceMonitor:PSS: 13383424; RSS: 13646516; PCPU: 684; PMEM: 5.1
2024-09-29 10:08:59,743:INFO:PerformanceMonitor:PSS: 16107196; RSS: 16698828; PCPU: 685; PMEM: 6.3
2024-09-29 10:08:59,743:ERROR:PerformanceMonitor:Error in CMSSW step cmsRun1
Number of Cores: 8
Job has exceeded maxPSS: 16000 MB
Job has PSS: 16107 MB

2024-09-29 10:08:59,745:ERROR:PerformanceMonitor:Attempting to kill step using SIGUSR2
2024-09-29 10:10:28,134:INFO:CMSSW:Step cmsRun1: Chirp_WMCore_cmsRun_ExitCode 0
2024-09-29 10:10:28,318:INFO:CMSSW:Step cmsRun1: Chirp_WMCore_cmsRun1_ExitCode 0

@makortel
Contributor

makortel commented Oct 8, 2024

A new look into

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1

shows RSS vs. event
[plot: RSS vs. event]

and VSIZE vs event
[plot: VSIZE vs. event]

While VSIZE grows somewhat gradually (even if in steps) after the first ~1000 events, the RSS shows rapid growth towards the end of the job, starting around the 7160th event (the job processed a total of 7382 events).

(btw, the earlier case could have been interesting to study as well to find the ~58 kB/event hoarding/leak, even if the physics content was garbage)

@germanfgv
Contributor

We have a new instance of this issue. I put the tarball and RAW input here in case it can help the investigation:

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386604_Muon0

@makortel
Contributor

makortel commented Oct 9, 2024

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386604_Muon0

Here are the RSS and VSIZE plots vs. event number. The RSS behavior in particular is interesting, showing two periods of steep rise
[plot: RSS vs. event]
[plot: VSIZE vs. event]

@makortel
Contributor

makortel commented Oct 9, 2024

I wonder if these "sudden growth" periods could be related to unrelated processes being terminated in the presence of high memory fragmentation, as brought up e.g. in #42387.
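
If someone can catch one of these jobs while it is still running, comparing RSS and PSS directly from /proc/<pid>/smaps_rollup could help separate sharing/fragmentation effects from a plain leak; a minimal sketch (recent kernels only, the PID is hypothetical):

def memory_summary(pid):
    # Parse the "Field:   <value> kB" lines of /proc/<pid>/smaps_rollup
    # (the first header line has a different shape and is skipped).
    fields = {}
    with open("/proc/%d/smaps_rollup" % pid) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3 and parts[2] == "kB":
                fields[parts[0].rstrip(":")] = int(parts[1])
    return fields

info = memory_summary(12345)   # hypothetical cmsRun PID
print("RSS %.1f GB, PSS %.1f GB" % (info["Rss"] / 1e6, info["Pss"] / 1e6))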

@makortel
Contributor

makortel commented Oct 9, 2024

@germanfgv Have you tested whether resubmission of the paused job makes any difference? I'm just wondering whether we already have any evidence on whether these rapid RSS/PSS growths are reproducible.

@makortel
Contributor

makortel commented Oct 10, 2024

Continuing on

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1

I ran the job on cmsdev42 (via an slc7 container on an el8 host), and got the following RSS and VSIZE (to be compared to #46040 (comment))

[plot: RSS vs. event]
[plot: VSIZE vs. event]

I think the difference in RSS behavior hints at the behavior being dependent on the overall system state (supporting the hypothesis of an "RSS storm in the presence of high fragmentation").

@jeyserma

Two more occurrences are reported here:

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386694_Muon0/

The RAW files are also copied over.

@makortel
Contributor

/eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386694_Muon0/

Here are memory plots from the logs
8678369f-4998-464d-aef9-ce8b1ece0259-111-0-logArchive.tar.gz
[memory plot for this log]

8678369f-4998-464d-aef9-ce8b1ece0259-148-0-logArchive.tar.gz
[memory plot for this log]

@makortel
Contributor

makortel commented Oct 14, 2024

@jeyserma Does Tier0 usually retry these jobs, or fail them after the first attempt?

@jeyserma

maxPSS paused jobs are automatically retried 3 times by our agent.

For this particular memory issue, we increased the memory limit to 17 GB (default 16 GB for 8 cores), and they ran fine.

@makortel
Contributor

Thanks @jeyserma. Are these failure reports posted only if the job has failed all 4 times (i.e. it continues to fail), or whenever a job failed once, even if a later retry succeeded?

@makortel
Contributor

makortel commented Oct 17, 2024

Under the assumption that we should reduce the memory footprint in general, I ran IgProf on the job from /eos/home-c/cmst0/public/PausedJobs/Run2024G/maxPSS/PromptReco_Run386319_Muon1 on slc7 with a single thread, inspecting the heap state after the 20th event. The full profile can be found here, and below is my summary.

Total amount of allocated memory: 4.22 GB, which divides roughly as follows (including only the largest or otherwise interesting contributors):

  • 1.11 GB in initialization (EventProcessor::init(), link)
    • 883 MB in EDModule constructors (link)
      • 390 MB in cut/expression parser via many modules (link) [Run3 PromptReco] 390 MB in cut/expression parser #46493
        • 103 MB via ObjectSelectorBase<SingleElementCollectionSelector<edm::View<reco::Muon>, ...> (link)
        • 93 MB via ObjectSelectorBase<SingleElementCollectionSelector<std::vector<pat::GenericParticle, ...> (link)
        • 90 MB via BXVectorSimpleFlatTableProducer<l1t::EGamma> (link)
        • 26 MB via SimpleFlatTableProducerBase<reco::BeamSpot, reco::BeamSpot> (link)
        • 17 MB via TopSingleLeptonDQM (link)
        • 12 MB via TopMonitor (link)
        • 11 MB via VersionedIdProducer<edm::Ptr<reco::Photon> (link)
        • 8 MB via RecoTauPiZeroProducer (link)
        • 5 MB via SimpleFlatTableProducer<pat::Jet> (link)
        • 5 MB via SimpleFlatTableProducer<pat::Electron> (link)
      • 140 MB in ONNX models (link)
        • 47 MB via BoostedJetONNXJetTagsProducer::initializeGlobalCache()
        • 32 MB via BaseMVAValueMapProducer<pat::Muon>::initializeGlobalCache()
        • 32 MB via pat::MuonMvaIDEstimator (via pat::PATMuonProducer)
        • 22 MB via UnifiedParticleTransformerAK4ONNXJetTagsProducer::initializeGlobalCache()
        • 7.6 MB via DeepFlavourONNXJetTagsProducer::initializeGlobalCache()
      • 88 MB in GBRForests (link)
        • 28 MB via LowPtGsfElectronSeedProducer (link)
        • 22 MB via MVAValueMapProducer<reco::GsfElectron> (link)
        • 14 MB via LowPtGsfElectronIDProducer (link)
        • 6 MB via MVAValueMapProducer<reco::Photon> (link)
        • 3.6 MB via PFElecTkProducer (link)
      • 15+35=50 MB in TensorFlow graphs (link) and sessions (link)
        • 11+23=34 MB via DeepTauId::initializeGlobalCache() (link)
        • 1.8+3.9=5.7 MB via TfGraphDefProducer::produce() (link) although this is really via EventSetup
        • 0.8+1.7=2.5 MB via DeepCoreSeedGenerator::initializeGlobalCache() (link)
      • 30 MB in CSCTriggerPrimitivesProducer (link) [Run3 PromptReco] CSCTriggerPrimitivesProducer constructor uses 30 MB / stream #46432
      • 13 MB in DeepTauId (link)
        • 12.8 MB via TensorFlow inference call? (link)
      • 9 MB in PoolOutputModule (link)
        • nearly all in product selection rules (link)
      • 5.1 MB in CSCMonitorModule (link)
      • 4.3 MB in MuonIdProducer (link)
        • nearly all of that in reading something from a TFile (link)
    • 96 MB in PSet registry (link)
    • 87 MB in Cling (link)
    • 81 MB in product registry (link)
      • 37 MB in ROOT dictionary code (link)
      • 37 MB in more ROOT dictionary code (link)
    • 28 MB in PoolSource (link)
  • 3.09 GB in data processing (EventProcessor::runToCompletion(), link)
    • EventSetup 1018 MB (link) after subtracting the contribution of edm::one EDModules
      • 210 MB in SiPixelTemplateStoreESProducer::produce() (link)
      • 119 MB in SiPixelGainCalibrationOffline via CondDB (link)
      • 79 MB in magneticfield::DD4hep_VolumeBasedMagneticFieldESProducerFromDB::produce() (link)
      • 68 MB in PixelCPEClusterRepairESProducer::produce() (link)
      • 66 MB in SiPixel2DTemplateDBObject via CondDB (link)
      • 49 MB in GBRForestD via CondDB (link)
      • 42 MB in CaloGeometryDBEP<EcalPreshowerGeometry, CaloGeometryDBReader>::produceAligned() (link)
      • 42 MB in EcalCondObjectContainer<EcalPulseCovariance> via CondDB (link)
    • beginRun
      • 470 MB in DQM
        • 415 MB as edm::stream (link)
          • 70 MB via SiPixelPhase1Base::bookHistograms()
          • 60 MB via JetMonitor::bookHistograms()
          • 36 MB via TopMonitor::bookHistograms()
          • orthogonally, 55 MB of the 415 MB comes via GenericTriggerEventFlag::initRun() (link)
            • 19 MB via JetMonitor::bookHistograms()
            • 15 MB via METMonitor::bookHistograms()
            • 13 MB via BPHMonitor::bookHistograms()
        • 55 MB as edm::one (link)
          • 29 MB of this is actually HLTConfigProvider::init() (link)
      • 11 MB in L1TMuonOverlapPhase1TrackProducer::beginRun() (link)
    • after Event transition
      • 1.1 GB in PoolOutputModule::write() (link)
      • 72 MB in tensorflow::run() (link), after subtracting the component from DeepTauId constructor
        • 49 MB via DeepTauId::produce() (link)
        • 7.6 MB via DeepMETProducer::produce() (link)
        • 5.3 MB via TrackMVAClassifierBase::produce() (link)
      • 48 MB in cms::Ort::ONNXRuntime::run() (link)
        • 29 MB via DeepFlavourONNXJetTagsProducer::produce()
        • 17 MB via BoostedJetONNXJetTagsProducer::produce()
      • 48 MB in L1TMuonEndCapTrackProducer::produce() (link) L1TMuonEndCapTrackProducer::produce() takes 96 MB memory per stream #42526
        • 47 MB in PtAssignmentEngine::load() (link)
      • 20 MB in SiStripRecHitConverter::produce() (link)
        • 12 MB via produced edmNew::DetSetVector<SiStripRecHit2D>
        • 7.8 MB via something in the produce() body
      • 19 MB in SeedCreatorFromRegionHitsEDProducerT<SeedFromConsecutiveHitsCreator>::produce() (link)
      • 15 MB in pat::PATPackedCandidateProducer::produce() (link)
        • 14 MB via loading covariance parametrization from TFile (link)
      • 11 MB in SeedCreatorFromRegionHitsEDProducerT<SeedFromConsecutiveHitsTripletOnlyCreator>::produce() (link)
      • 11 MB in TrackCollectionMerger::produce() (link)
      • 10 MB in TrackProducer::produce() (link)
      • ...
      • 1.5 MB in AlcaBeamMonitor::analyze() (link) AlcaBeamMonitor hoards memory #42995
        • This looks like a good candidate for hoarding data per event
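
For the record, this kind of heap profile can typically be collected with something like the commands below; the exact options are a generic recipe rather than the ones used here, PSet.py stands for the job configuration, and the per-event heap dump would be configured via CMSSW's IgProfService:

igprof -d -mp -z -o igprof.mp.gz cmsRun PSet.py
igprof-analyse -d -v -g -r MEM_LIVE igprof.mp.gz > igreport_mem_live.txt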

@makortel
Contributor

This one already had an open issue, #42995

@makortel
Contributor

  • 1440 M in CaloSubdetectorGeometry::cellGeomPtr() (link, already noted above)

Spun off to #46433

@makortel
Contributor

  • 70 to 140 MB (70 MB increase) in SiPixelPhase1Base::bookHistograms() (4 streams link)
    • 10 to 40 MB (30 MB increase) in a std::map<std::vector<std::pair<int, double>>, AbstractHistogram>
    • 9.9 to 40 MB (30 MB increase) in 3 member functions of GeometryInterface

Spun off to #46446

@makortel
Contributor

  • 55 to 218 MB (164 MB increase) in GenericTriggerEventFlag::initRun() (4 streams link)

Spun off to #46448

@makortel
Contributor

makortel commented Oct 18, 2024

  • 7.8 to 31 MB (23 MB increase) in BaseMVAValueMapProducer<pat::Muon> (4 streams link)

Spun off to #46449

@makortel
Contributor

Spun off to #46450

@jeyserma

Hi @makortel. What I claimed previously was wrong: a PromptReco job that exceeds memory is never retried automatically (that is only true for Express, hence my confusion).

We had a few extra paused jobs last week due to this memory issue, and I decided to retry them without increasing the memory; they all finished successfully. But that is probably not true for all paused maxMemory jobs, though we can't check anymore.

Nevertheless, the Muon PD is probably on average closer to the maxMemory limit, which is why we see such an increase in paused jobs.

@makortel
Contributor

Thanks @jeyserma!

We had a few extra paused jobs last week due to this memory issue, and I decided to retry them without increasing the memory; they all finished successfully. But that is probably not true for all paused maxMemory jobs, though we can't check anymore.

Would the logs of those failed jobs still be available? I'd like to collect more statistics on the "failing behavior".

Given that the evidence so far suggests the operating system's dynamic behavior plays a role, my suggestion would be to retry a job paused because of reaching MaxPSS once (or maybe twice), either with the same or a slightly larger limit.

@makortel
Contributor

  • 4.8 MB to 19 MB (14 MB increase) in HLTPrescaleProvider::init()

Spun off to #46466

@makortel
Contributor

  • 0.686 G (2.89 %) in CAHitNtupletEDProducerT<CAHitQuadrupletGenerator>::produce() (link)
  • 0.680 G (2.86 %) in CAHitNtupletEDProducerT<CAHitTripletGenerator>::produce() (link)

These are already discussed in #37698

@makortel
Contributor

Just out of curiosity, I collected the total cost of ML algorithms from the aforementioned IgProf memory profiles (the numbers reflect the state of a long-running 8-thread/stream job; not counting the PtAssignmentEngine from L1TMuonEndCapTrackProducer, as that should be reworked in several ways)

  • Models
    • ONNX: 140 MB
    • GBRForest: 137 MB
    • TMVA: 87 MB (11 MB / stream)
    • TensorFlow: 57 MB (49 MB + 1 MB / stream)
    • XGBoost: 14 MB
  • Inference
    • TensorFlow: 117 MB (number for 1 stream though)
    • ONNX: 72 MB
    • XGBoost: ~ 60 kB (or less)
    • TMVA: ~ 30 kB
    • GBRForest: ~ 0 ? (I guess the inputs need some temporary allocations, but nothing from the inference gets held by the modules)

So a total of about 620 MB. FYI @cms-sw/ml-l2
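
As a sanity check of that total (values copied from the lists above):

models    = 140 + 137 + 87 + 57 + 14   # ONNX + GBRForest + TMVA + TensorFlow + XGBoost
inference = 117 + 72                   # TensorFlow + ONNX (the others are negligible)
print(models + inference)              # ~624 MB, consistent with the ~620 MB quoted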

@makortel
Contributor

  • 390 MB in cut/expression parser via many modules (link)

Spun off to #46493

@makortel
Contributor

Spun off to #46494

@makortel
Contributor

  • 26 M from SiStripFolderOrganizer::getSubDetFolderAndTag()

Spun off to #46498

@makortel
Contributor

@cms-sw/tracking-pog-l2
While my feeling is that significant improvements during the remaining Run 3 would not be feasible, I think it is nevertheless worth noting that the tracking makes (at least) 10.2 million memory allocations per event on average (this corresponds to about 43% of all memory allocations done during event processing). (For more details see above.)

@slava77
Contributor

slava77 commented Oct 23, 2024

@cms-sw/tracking-pog-l2 While my feeling is that significant improvements during the remaining Run 3 would not be feasible, I think it is nevertheless worth noting that the tracking makes (at least) 10.2 million memory allocations per event on average (this corresponds to about 43% of all memory allocations done during event processing). (For more details see above.)

Seems like a large fraction is the cost of recomputing a hit depending on the track parameters.

Unfortunately that's apparently inlined and not clearly visible in the IgProf profile:

std::unique_ptr<SiStripRecHit2D> TkClonerImpl::operator()(SiStripRecHit2D const& hit,
                                                          TrajectoryStateOnSurface const& tsos) const {
  /// FIXME: this only uses the first cluster and ignores the others
  const SiStripCluster& clust = hit.stripCluster();
  StripClusterParameterEstimator::LocalValues lv = stripCPE->localParameters(clust, *hit.detUnit(), tsos);
  return std::make_unique<SiStripRecHit2D>(lv.first, lv.second, *hit.det(), hit.omniCluster());
}

vs
https://mkortela.web.cern.ch/mkortela/cgi-bin/navigator/issue46040/test_17_total/304

The 1 M allocations in MultiHitFromChi2EDProducer apparently come from perhaps 25K seeds per event (e.g. pixelLess in that run has 16k/ev), i.e. about 40 allocations per final seed, while the number of considered permutations is likely larger. So it doesn't look too unreasonable.

Does it matter?

@makortel
Contributor

Does it matter?

The memory churn from O(1 MHz) of memory allocations may contribute significantly to memory becoming more fragmented, which could then lead the OS to use (much) more RSS in some situations (this sounds plausible, but there is little direct evidence beyond what was presented in #42387).

Or in other words, the main practical impact of memory churn is some slowdown, until it gets so bad that everything breaks.

@makortel
Contributor

makortel commented Nov 7, 2024

  • 335 M in HLTPrescaleProvider::prescaleValuesInDetail() (link)

This one seems likely to be improved by #46628
