-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add script to run Jupyter Lab HPC natively #59
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you also add tapisjob_run.sh file to the repo/PR.
{ | ||
"name": "launchParameters", | ||
"arg": "echo 'Job is done.'", | ||
"inputMode": "FIXED", | ||
"notes": { | ||
"isHidden": true | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed with zip runtime.
{ | |
"name": "launchParameters", | |
"arg": "echo 'Job is done.'", | |
"inputMode": "FIXED", | |
"notes": { | |
"isHidden": true | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. - added couple comments that need to be addressed before merge
echo "TACC: ERROR - ssh tunnels failed to launch" | ||
exit 1 | ||
fi | ||
echo "TACC: created reverse ports on Frontera logins" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove system specific language
echo "TACC: created reverse ports on Frontera logins" | |
echo "TACC: created reverse ports on system logins" |
# verify jupyter is up. if not, give one more try, then bail | ||
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | ||
echo "TACC: first jupyter launch failed. Retrying..." | ||
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} & | ||
fi | ||
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | ||
echo "TACC: ERROR - jupyter failed to launch" | ||
echo "TACC: ERROR - this is often due to an issue in your python or conda environment" | ||
echo "TACC: ERROR - jupyter logfile contents:" | ||
cat ${JUPYTER_LOGFILE} | ||
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)" | ||
exit 1 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified to the below, and moved to after curl
on line 153
# verify jupyter is up. if not, give one more try, then bail | |
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | |
echo "TACC: first jupyter launch failed. Retrying..." | |
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} & | |
fi | |
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | |
echo "TACC: ERROR - jupyter failed to launch" | |
echo "TACC: ERROR - this is often due to an issue in your python or conda environment" | |
echo "TACC: ERROR - jupyter logfile contents:" | |
cat ${JUPYTER_LOGFILE} | |
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)" | |
exit 1 | |
fi | |
${JUPYTER_BIN} ${JUPYTER_ARGS} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to hang on the jupyter process after everything is setup, then do cleanup after
# verify jupyter is up. if not, give one more try, then bail | ||
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | ||
echo "TACC: first jupyter launch failed. Retrying..." | ||
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} & | ||
fi | ||
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then | ||
echo "TACC: ERROR - jupyter failed to launch" | ||
echo "TACC: ERROR - this is often due to an issue in your python or conda environment" | ||
echo "TACC: ERROR - jupyter logfile contents:" | ||
cat ${JUPYTER_LOGFILE} | ||
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)" | ||
exit 1 | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We want to hang on the jupyter process after everything is setup, then do cleanup after
mkdir -p ${NB_SERVERDIR} | ||
mkdir -p ${HOME}/.tap | ||
|
||
TAP_LOCKFILE=${HOME}/.tap/.${SLURM_JOB_ID}.lock |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chandra-tacc what is TAP_LOCKFILE for?
export JUPYTERLAB_SETTINGS_DIR="${HOME}/.jupyterlab" | ||
|
||
# launch jupyter-lab | ||
JUPYTER_LOGFILE=${NB_SERVERDIR}/${NODE_HOSTNAME_PREFIX}.log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for JUPYTER_LOGFILE, logs will be logged to tapisjob.out automatically
c = get_config() | ||
c.IPKernelApp.pylab = "inline" | ||
c.ServerApp.ip = "0.0.0.0" | ||
c.ServerApp.port = 5902 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c.ServerApp.port = 5902 | |
c.ServerApp.port = ${LOCAL_PORT} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To keep it consistent
[DesignSafe] Enable MPI in Jupyter Lab HPC (CPU)
Created app.json that uses a ZIP runtime to run an interactive JupyterLab session. It is not containerized, allowing users more flexibility.