Status for automatic splitting should report more clearly when probe jobs fail #5142

Open
belforte opened this issue Dec 21, 2021 · 0 comments

From https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/6264/1.html
Also, the PreDAG log should avoid raising a divide-by-zero exception when the
number of processed events is 0!
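A minimal sketch of the missing guard, assuming the PreDAG step estimates time per event by dividing the total probe run time by the number of processed events. The names `time_per_event` and `summarize_probes` and the `(cpu_seconds, events)` report format are hypothetical and not taken from the CRAB code base; the point is only to return an explicit "all probes failed" result instead of letting the division raise `ZeroDivisionError`.

```python
from typing import List, Optional, Tuple


def time_per_event(total_seconds: float, processed_events: int) -> Optional[float]:
    """Return seconds per event, or None when no events were processed.

    Hypothetical helper: the real PreDAG computation may differ.
    """
    if processed_events <= 0:
        # Every probe failed (or processed nothing): there is no rate to estimate,
        # so report that instead of dividing by zero.
        return None
    return total_seconds / processed_events


def summarize_probes(reports: List[Tuple[float, int]]) -> str:
    """reports: one (cpu_seconds, events) tuple per probe job (hypothetical format)."""
    total_time = sum(seconds for seconds, _ in reports)
    total_events = sum(events for _, events in reports)
    rate = time_per_event(total_time, total_events)
    if rate is None:
        return "all probe jobs failed: no events processed, cannot estimate splitting"
    return f"estimated {rate:.3f} s/event from {total_events} events"
```

For example, five probes that each ran for 900 s but processed no events give `summarize_probes([(900.0, 0)] * 5)`, which returns the "all probe jobs failed" message rather than raising.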

*** Discussion title: Computing Tools

Thanks Tamas,
well, if you type 'crab status --long' things are clearer [1]:
all probe jobs failed because they ran out of memory.
I do not think we can venture into improving the automatic splitting
algorithm to better estimate the needed memory: some work on that
side was done originally, and it has been supported in an
'as is' way since its developer left.
But it is indeed a long-standing shortcoming that in this case the plain
'crab status' output just says "refer to this FAQ for ...." instead of
"all probe jobs failed".

I will add this to the (alas, long) list of minor improvements to do.
Stefano

[1]
Status on the scheduler:        FAILED

The job splitting of this task is 'Automatic', please refer to this FAQ for a description of the jobs status summary:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/CRAB3FAQ#What_is_the_Automatic_splitting
Probe stage log:                https://cmsweb.cern.ch:8443/scheddmon/0144/cms578/211220_225709:tvami_crab_Analysis_SingleMuon_UL2017CEra_wProbQ_newMethod_v1/AutomaticSplitting_Log0.txt
More details of Automatic Splitting process for this task (including possible failures) are in the Dagman Log files in:
https://cmsweb.cern.ch:8443/scheddmon/0144/cms578/211220_225709:tvami_crab_Analysis_SingleMuon_UL2017CEra_wProbQ_newMethod_v1/AutomaticSplitting/

Probe jobs status:              no output               100.0% (5/5)

No publication information available yet

Error Summary: (use crab status --verboseErrors for details about the errors)

5 jobs failed with exit code 50660

Have a look at https://twiki.cern.ch/twiki/bin/viewauth/CMSPublic/JobExitCodes for a description of the exit codes.

Extended Job Status Table:

  Job State        Most Recent Site        Runtime   Mem (MB)      CPU %    Retries   Restarts      Waste       Exit Code
  0-1 no output    T2_DE_DESY              0:15:11       2122         13          0          0    0:02:46           50660
  0-2 no output    T2_DE_DESY              0:15:12       2118         17          0          0    0:02:51           50660
  0-3 no output    T2_DE_DESY              0:15:11       2155         12          0          0    0:02:55           50660
  0-4 no output    T2_DE_DESY              0:10:11       2009          7          0          0    0:02:54           50660
  0-5 no output    T2_DE_DESY              0:15:12       2142         21          0          0    0:02:52           50660
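The one-line summary that plain 'crab status' could print can be sketched from the per-probe exit codes in a table like the one above. This is a minimal sketch, not CRAB's actual status code: `probe_status_message`, `MEMORY_EXIT_CODES` and the `(job_id, exit_code)` input format are hypothetical. It keys on the exit code rather than the job state, since the 'no output' state shown here does not by itself say that the probes failed; mapping 50660 to "out of memory" follows the diagnosis in this thread.

```python
from typing import List, Tuple

# Exit code reported by all five probes in this task; per the discussion,
# these probes failed because they ran out of memory.
MEMORY_EXIT_CODES = {50660}


def probe_status_message(probes: List[Tuple[str, int]]) -> str:
    """Build a one-line probe-stage summary.

    probes: (job_id, exit_code) for each probe job (hypothetical input format).
    """
    if not probes:
        return "No probe jobs found"
    failed = [(job, code) for job, code in probes if code != 0]
    if len(failed) < len(probes):
        return f"{len(probes) - len(failed)}/{len(probes)} probe jobs succeeded"
    codes = {code for _, code in failed}
    detail = ", ".join(str(code) for code in sorted(codes))
    message = f"All probe jobs failed ({len(failed)}/{len(probes)}), exit code(s): {detail}"
    if codes <= MEMORY_EXIT_CODES:
        # Every failure is a known out-of-memory code.
        message += " (out of memory)"
    return message
```

Fed the five rows of the table above, `probe_status_message([(f"0-{i}", 50660) for i in range(1, 6)])` yields "All probe jobs failed (5/5), exit code(s): 50660 (out of memory)".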

On 21/12/2021 01:00, Tamas Vami wrote:
>
> *** Discussion title: Computing Tools
>
> Hi Stefano,
>
> My automatic splitting failed on the scheduler; here is the HELP link
> https://cmsweb.cern.ch/crabserver/ui/task/211220_225709%3Atvami_crab_Analysis_SingleMuon_UL2017CEra_wProbQ_newMethod_v1
> and here is the Probe stage log:
> https://cmsweb.cern.ch:8443/scheddmon/0144/cms578/211220_225709:tvami_crab_Analysis_SingleMuon_UL2017CEra_wProbQ_newMethod_v1/AutomaticSplitting_Log0.txt
>
> The error it reports is "ZeroDivisionError: division by zero".
> The Status on the CRAB server still says "SUBMITTED".
>
> It seems to be working fine if I change to a lumi based splitting.
>
> Can you please have a look at this error?
> Cheers,
> Tamas
>
> -------------------------------------------------------------
> Visit this CMS message (to reply or unsubscribe) at:
> https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/6264.html
>

-------------------------------------------------------------
Visit this CMS message (to reply or unsubscribe) at: 
https://hypernews.cern.ch/HyperNews/CMS/get/computing-tools/6264/1.html

