slurm scheduler not working with slurm-wlm-torque qstat #280
Comments
Interesting that you are the first to run into this (or at least the first to report it). I guess it must be unusual for SLURM clusters not to have qstat installed. The implementation that does not require the XML output option is still supported, so I think we should be able to convince the SLURM executor to use it. As you say, it'd be nice if we could make something more efficient, but as a fallback it should work. I don't have a good test method for SLURM right now. I will see if I can set up something that uses AWS ParallelCluster so I can get this working properly. Sorry for the problem; I will look into what to do.
@nemartins I have just put in a commit that I think should fix the SLURM issue, at least in a basic sense. I was able to create a test cluster on AWS and confirmed that it seemed to work. If you are able to build from master and try it out, that would be great. Otherwise, let me know and I can provide you with a build, or you can test it with the next release. Thanks for reporting this issue!
Thank you for looking into this! I will try to build it from master and run it on the cluster early next week. Thanks again.
That's a very clever way to try and solve it. It would be interesting to see it if you are willing to share. It could allow us to use the pooled status monitor with SLURM, which would be a better solution. The current solution causes an individual job status command to be issued for every active job every minute or so, which is not very scalable; that is why the pooled status monitor, which queries multiple jobs at a time, was introduced. Let me know how it goes!
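To illustrate the difference between per-job polling and a pooled query, here is a minimal sketch that handles the output of a single `squeue` call covering many jobs. It is not bpipe's implementation. The function names `slurm_state_to_code` and `pooled_status`, the job IDs in the usage comment, and the mapping from SLURM state names to Torque-style one-letter codes are all assumptions made for illustration.

```sh
# Map a SLURM state name to a Torque-style one-letter code.
# The mapping is an assumption, not taken from bpipe.
slurm_state_to_code() {
  case "$1" in
    PENDING|CONFIGURING) echo Q ;;  # waiting to start
    RUNNING)             echo R ;;
    SUSPENDED)           echo S ;;
    COMPLETING)          echo E ;;  # finishing up
    *)                   echo C ;;  # COMPLETED, FAILED, CANCELLED, TIMEOUT, ...
  esac
}

# Read "<jobid> <STATE>" lines on stdin and print "<jobid> <code>" lines.
pooled_status() {
  while read -r id state; do
    [ -n "$id" ] || continue        # skip blank lines
    echo "$id $(slurm_state_to_code "$state")"
  done
}

# One scheduler call for all active jobs (hypothetical job IDs):
#   squeue --noheader --jobs=101,102,103 --format='%i %T' | pooled_status
```

With this shape, the number of scheduler calls stays at one per polling interval regardless of how many jobs are active. Jobs that have already left the queue will not appear in `squeue` output, so a real monitor would need another source, such as `sacct`, for those.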
I've run bpipe from master, and it works well. Thank you for the quick solution! Here's the script I've come up with. It could probably be much simpler and more elegant, but it was a rush job, and it was the first time I've used jq/yq:
export params="${@:2:1000}"
qsub $params |\
sed -e 's|Job id|JobID|g' -e 's|Time Use|TimeUse|g' |\
csvtk space2tab --comment-char '-' |\
csvtk csv2json -t |\
jq -c .[] |\
jq -n 'reduce inputs as $line ({};. + { ("DataZ"+$line.JobID) : { "Job": {"Job_Id": ($line.JobID),"job_state": ($line.S)}} })' |\
yq -o xml |\
sed -e 's|DataZ.*>|Data>|g' |\
tr -d "\n" | tr -d "\ " |\
awk -v RS='</Data>' -v ORS='</Data>\n' ' {print}'
Best
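As a possible simplification of the pipeline above, the same kind of output (one `<Data>…</Data>` record per job, carrying `Job_Id` and `job_state`) can be sketched with a single `awk` call. This assumes the tabular status listing has a header line starting with `Job id`, a separator line of dashes, and the state code in the second-to-last column. That layout is inferred from the `sed` substitutions in the script and is not verified against every version of the wrapper. The function name `qstat_to_xml` is hypothetical.

```sh
# Convert a tabular job status listing on stdin into one XML record per job.
# Assumed layout: "Job id ... Time Use S Queue" header, a dashed separator,
# then one row per job with the state code in the second-to-last column.
qstat_to_xml() {
  awk '
    # Skip the header, the dashed separator and anything too short to be a job row.
    /^Job id/ || /^-+/ || NF < 3 { next }
    {
      printf "<Data><Job><Job_Id>%s</Job_Id><job_state>%s</job_state></Job></Data>\n", $1, $(NF-1)
    }
  '
}

# Usage (assuming the listing comes from the Torque wrapper):
#   qstat | qstat_to_xml
```

No XML escaping is done here, which is only safe while job IDs and state codes contain no characters such as `<` or `&`.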
I'm trying to run bpipe on a slurm cluster.
This cluster does not have qstat installed, so the pipeline never progresses to the next step.
I've tried to use qstat from the slurm-wlm-torque package, but it has no XML output option.
Is it possible to create a SlurmStatusMonitor that removes the dependency on the qstat XML output, or that uses the native SLURM tools (sstat, scontrol)?
Thanks in advance,
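For illustration, a status check built on native SLURM tools could read the `JobState=` field that `scontrol show job` prints. The sketch below only shows the parsing step and is not what bpipe does; the function name `job_state_from_scontrol` and the job ID in the usage comment are hypothetical.

```sh
# Extract the JobState value from `scontrol show job <id>` output on stdin.
# Prints the first state found, or nothing if no JobState field is present.
job_state_from_scontrol() {
  sed -n 's/.*JobState=\([A-Z_]*\).*/\1/p' | head -n 1
}

# Usage (hypothetical job ID):
#   scontrol show job 12345 | job_state_from_scontrol
```

An empty result would need to be treated as "state unknown" by the caller, for example when the job is no longer known to the controller.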