call jf from container #196

Open
yw-fang opened this issue Oct 16, 2024 · 4 comments

Comments

@yw-fang

yw-fang commented Oct 16, 2024

Hi, all

Due to the requirements of the HPC system I am using, I had to install jobflow-remote in a container. In an interactive shell, I can run the jf command this way:
singularity exec ~/jobflowenv.sif jf
and it works as expected.

However, it did not work when I tried to execute it in the Slurm job script.
Here is my job script, submit.sh:

#!/bin/bash

#SBATCH --partition=debug
#SBATCH --job-name=relax_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --mem=30gb
#SBATCH --time=00:10:00
#SBATCH --output=/scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1/queue.out
#SBATCH --error=/scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1/queue.err
cd /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1/
module purge
module load QuantumESPRESSO
alias jf="singularity exec ~/jobflowenv.sif jf"
alias python="singularity exec ~/jobflowenv.sif python"
singularity exec ~/jobflowenv.sif python --version
singularity exec ~/jobflowenv.sif jf --help

jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1

Examining queue.out, I found that the lines beginning with 'singularity' in submit.sh worked: they printed the Python version and the jobflow-remote help text. However, the last line, jf -fe execution run /scratch/qejobflow/87/50/55/875055c1-7fa4-4a40-bee5-6ddb63be599a_1, failed and raised the error "/var/spool/slurmd/job8184623/slurm_script: line 24: jf: command not found" in the queue.err file.

I am wondering if anyone has had a similar experience. I would greatly appreciate any comments!
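
A likely explanation for the "jf: command not found" error: bash does not expand aliases in non-interactive shells such as batch scripts unless `shopt -s expand_aliases` is set, so the `alias jf=...` line has no effect by the time the script reaches `jf`. A shell function is not subject to that restriction. The sketch below wraps the container call in a function; `JF_CONTAINER_CMD` is a hypothetical variable introduced here only so the wrapper can be dry-run without Singularity, and the image path is the one from the script above.

```shell
#!/bin/bash
# Hypothetical override hook (not part of jobflow-remote): set
# JF_CONTAINER_CMD="echo" to dry-run the wrapper without Singularity.
JF_CONTAINER_CMD=${JF_CONTAINER_CMD:-"singularity exec $HOME/jobflowenv.sif"}

# Unlike an alias, a function is usable in a non-interactive batch script.
jf() {
    # The prefix is left unquoted on purpose: it holds a command plus its
    # arguments and must be split into separate words.
    $JF_CONTAINER_CMD jf "$@"
}
```

Placed where the alias lines currently sit, this should let the final `jf -fe execution run ...` line resolve to the container call. This is a sketch under the assumptions above, not a tested jobflow-remote setup.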

@gpetretto
Contributor

Hi @yw-fang,
unfortunately I don't have much experience running with containers on HPC systems. I see that you already tried to set an alias for the jf command, but it seems it did not work.
In general, I think that if this is a real use case, it would be fine to add an option to the jobflow-remote configuration to customize the jf -fe execution run command. That way you could replace it directly with singularity exec ~/jobflowenv.sif jf -fe execution run.

Before proceeding, can you check whether this would actually solve your problem? In principle you could manually edit the submission script and replace the last line with one that calls jf through the container. Unless you stop the runner immediately after the job reaches the SUBMITTED state, jobflow-remote will still see the job as failed, but at least you can check whether the job is executed correctly. (Based on the submission script I suppose this is a Quantum ESPRESSO simulation, so you should still be able to see whether it completed properly.)
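
The manual edit described above could be scripted roughly as follows. This is a minimal sketch: `patch_submit_script` is a hypothetical helper, not something jobflow-remote provides, and the default prefix is the container call used earlier in this thread. The prefix is pasted into a sed replacement, so it must not contain `|` or `&`.

```shell
#!/bin/bash
# Hypothetical helper: prefix the generated "jf -fe execution run" line
# of a submission script with a container call.
patch_submit_script() {
    local script=$1
    # The tilde stays literal here and is expanded when the script runs.
    local prefix=${2:-"singularity exec ~/jobflowenv.sif"}
    local tmp
    tmp=$(mktemp) || return 1
    # Only lines that start with the bare jf call are rewritten.
    if ! sed "s|^jf -fe execution run|$prefix jf -fe execution run|" "$script" > "$tmp"; then
        rm -f "$tmp"
        return 1
    fi
    # Write back through cat so the script keeps its original permissions.
    cat "$tmp" > "$script"
    rm -f "$tmp"
}
```

A possible use is `patch_submit_script submit.sh && sbatch submit.sh`, run from the job directory on the cluster.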

@yw-fang
Author

yw-fang commented Oct 17, 2024

Hi @gpetretto, thank you very much for your response! It works if I manually replace "jf" with "singularity exec ~/jobflowenv.sif jf", and the subsequent espresso calculation then completes.
How can I change the jobflow-remote configuration so that 'jf' is automatically replaced by "singularity exec ~/jobflowenv.sif jf"? This is what I was looking for, but I could not find how to do it.

@gpetretto
Contributor

Good to know that it works in that case. Unfortunately the option is not there at the moment, but I can probably implement it in the next few days.
In the meantime, if you wish, you can manually edit this line in the source code:

script_commands = [f"jf -fe execution run {remote_path}"]

so that it adds the call to the container in your submission script.

@yw-fang
Author

yw-fang commented Oct 17, 2024

@gpetretto It works. Many thanks. If you don't mind, I'd like to keep this issue open until there is an implementation that spares users from editing the source code directly.
