Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add script to run Jupyter Lab HPC natively #59

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

van-go
Copy link
Contributor

@van-go van-go commented Oct 23, 2024

[DesignSafe] Enable MPI in Jupyter Lab HPC (CPU)

Created app.json that uses a ZIP runtime to run an interactive JupyterLab session. It is not containerized, allowing users more flexibility.

@van-go van-go changed the title feat: Add script to run Jupyter Lab HPC natively Add script to run Jupyter Lab HPC natively Oct 30, 2024
Copy link
Contributor

@chandra-tacc chandra-tacc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you also add tapisjob_run.sh file to the repo/PR.

Comment on lines 27 to 34
{
"name": "launchParameters",
"arg": "echo 'Job is done.'",
"inputMode": "FIXED",
"notes": {
"isHidden": true
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed with zip runtime.

Suggested change
{
"name": "launchParameters",
"arg": "echo 'Job is done.'",
"inputMode": "FIXED",
"notes": {
"isHidden": true
}
}

Copy link
Contributor

@chandra-tacc chandra-tacc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. - added couple comments that need to be addressed before merge

applications/jupyter-lab-hpc-native/tapisjob_app.sh Outdated Show resolved Hide resolved
echo "TACC: ERROR - ssh tunnels failed to launch"
exit 1
fi
echo "TACC: created reverse ports on Frontera logins"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove system specific language

Suggested change
echo "TACC: created reverse ports on Frontera logins"
echo "TACC: created reverse ports on system logins"

Comment on lines +124 to +136
# verify jupyter is up. if not, give one more try, then bail
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: first jupyter launch failed. Retrying..."
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} &
fi
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: ERROR - jupyter failed to launch"
echo "TACC: ERROR - this is often due to an issue in your python or conda environment"
echo "TACC: ERROR - jupyter logfile contents:"
cat ${JUPYTER_LOGFILE}
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)"
exit 1
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be simplified to the below, and moved to after curl on line 153

Suggested change
# verify jupyter is up. if not, give one more try, then bail
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: first jupyter launch failed. Retrying..."
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} &
fi
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: ERROR - jupyter failed to launch"
echo "TACC: ERROR - this is often due to an issue in your python or conda environment"
echo "TACC: ERROR - jupyter logfile contents:"
cat ${JUPYTER_LOGFILE}
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)"
exit 1
fi
${JUPYTER_BIN} ${JUPYTER_ARGS}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to hang on the jupyter process after everything is setup, then do cleanup after

Comment on lines +124 to +136
# verify jupyter is up. if not, give one more try, then bail
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: first jupyter launch failed. Retrying..."
nohup ${JUPYTER_BIN} ${JUPYTER_ARGS} &> ${JUPYTER_LOGFILE} && rm ${TAP_LOCKFILE} &
fi
if ! $(ps -fu ${USER} | grep ${JUPYTER_BIN} | grep -qv grep) ; then
echo "TACC: ERROR - jupyter failed to launch"
echo "TACC: ERROR - this is often due to an issue in your python or conda environment"
echo "TACC: ERROR - jupyter logfile contents:"
cat ${JUPYTER_LOGFILE}
echo "TACC: job ${SLURM_JOB_ID} execution finished at: $(date)"
exit 1
fi
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to hang on the jupyter process after everything is setup, then do cleanup after

mkdir -p ${NB_SERVERDIR}
mkdir -p ${HOME}/.tap

TAP_LOCKFILE=${HOME}/.tap/.${SLURM_JOB_ID}.lock
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chandra-tacc what is TAP_LOCKFILE for?

export JUPYTERLAB_SETTINGS_DIR="${HOME}/.jupyterlab"

# launch jupyter-lab
JUPYTER_LOGFILE=${NB_SERVERDIR}/${NODE_HOSTNAME_PREFIX}.log
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for JUPYTER_LOGFILE, logs will be logged to tapisjob.out automatically

c = get_config()
c.IPKernelApp.pylab = "inline"
c.ServerApp.ip = "0.0.0.0"
c.ServerApp.port = 5902
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
c.ServerApp.port = 5902
c.ServerApp.port = ${LOCAL_PORT}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep it consistent

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants