Documentation for Warewulf 4 on Rocky 9.4 with Slurm on x86_64 #2048

Merged
merged 1 commit into from
Oct 31, 2024
7 changes: 7 additions & 0 deletions components/admin/docs/SPECS/docs.spec
Original file line number Diff line number Diff line change
Expand Up @@ -94,6 +94,9 @@ from the OpenHPC software stack.
#pushd docs/recipes/install/centos8/x86_64/warewulf/slurm
#make ; %{parser} steps.tex > recipe.sh ; popd

pushd docs/recipes/install/rocky9/x86_64/warewulf4/slurm
make ; %{parser} steps.tex > recipe.sh ; popd

pushd docs/recipes/install/rocky9/x86_64/warewulf/slurm
make ; %{parser} steps.tex > recipe.sh ; popd

Expand Down Expand Up @@ -170,6 +173,10 @@ install -m 0644 -p docs/Release_Notes.txt %{buildroot}/%{OHPC_PUB}/doc/Release_N

# x86_64 guides

%define lpath rocky9/x86_64/warewulf4/slurm
install -m 0644 -p -D docs/recipes/install/%{lpath}/steps.pdf %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/Install_guide.pdf
install -m 0755 -p -D docs/recipes/install/%{lpath}/recipe.sh %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/recipe.sh

%define lpath rocky9/x86_64/warewulf/slurm
install -m 0644 -p -D docs/recipes/install/%{lpath}/steps.pdf %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/Install_guide.pdf
install -m 0755 -p -D docs/recipes/install/%{lpath}/recipe.sh %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/recipe.sh
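The `install -D` calls above copy each generated file to its final name and create any missing parent directories along the way. A minimal sketch of that behavior, assuming GNU coreutils `install`; the temporary directory and placeholder file are hypothetical stand-ins for the build tree and `%{buildroot}`:

```shell
# Hypothetical scratch area standing in for the source tree and buildroot.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

mkdir -p "$workdir/src"
echo "placeholder" > "$workdir/src/steps.pdf"

# Same relative path the spec file assigns to lpath.
lpath=rocky9/x86_64/warewulf4/slurm
dest="$workdir/buildroot/doc/recipes/$lpath/Install_guide.pdf"

# -D creates the missing parent directories of the destination,
# -m sets the mode, and -p preserves the source timestamps.
install -m 0644 -p -D "$workdir/src/steps.pdf" "$dest"

ls -l "$dest"
```

The destination directory tree does not need to exist beforehand, which is why the spec file needs no separate `mkdir -p` for each guide.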
Expand Down
1 change: 1 addition & 0 deletions docs/recipes/install/.gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -8,3 +8,4 @@ steps.pdf
vc.tex
pkg-ohpc.chglog*

steps.synctex.gz
29 changes: 29 additions & 0 deletions docs/recipes/install/common/add_ww4_hosts_finalize.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
\iftoggleverb{isx86}
% ohpc_validation_newline
% ohpc_validation_comment Optionally, define IPoIB network settings (required if planning to mount Lustre over IB)
% ohpc_command if [[ ${enable_ipoib} -eq 1 ]];then
% ohpc_indent 5
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily]
# Optionally define IPoIB network settings (required if planning to mount Lustre/BeeGFS over IB)
[sms](*\#*) for ((i=0; i<$num_computes; i++)) ; do
wwctl node set --yes ${c_name[$i]} --netdev=ib0 --ipaddr=${c_ipoib[$i]} --netmask=${ipoib_netmask}
done
\end{lstlisting}
% ohpc_indent 0
% ohpc_command fi
% ohpc_validation_newline
% end_ohpc_run
\fi

Finally, we rebuild the overlays and update the Warewulf configuration.
It is necessary to rebuild the overlays whenever an overlay is modified.

% begin_ohpc_run
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
# build the overlays for all the nodes
[sms](*\#*) wwctl overlay build

# Update the Warewulf configuration
[sms](*\#*) wwctl configure --all
\end{lstlisting}
% end_ohpc_run
10 changes: 10 additions & 0 deletions docs/recipes/install/common/add_ww4_hosts_intro.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,10 @@
%\iftoggle{isx86}{\clearpage}
% begin_ohpc_run
% ohpc_validation_comment Add hosts to cluster
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
[sms](*\#*) for ((i=0; i<$num_computes; i++)) ; do
wwctl node add --container=rocky-9.4 \
--ipaddr=${c_ip[$i]} --hwaddr=${c_mac[$i]} --netmask=${internal_netmask} ${c_name[$i]}
done
\end{lstlisting}
% end_ohpc_run
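The registration loop above walks three parallel arrays (`c_name`, `c_ip`, `c_mac`) by index. Before running it against a live Warewulf server, it may help to print the commands it would issue. The following is a dry-run sketch only; the array contents, netmask, and the helper name `print_node_add_cmds` are hypothetical placeholders, not values from the recipe:

```shell
#!/usr/bin/env bash
# Dry run: print the wwctl commands instead of executing them.
# All values below are hypothetical placeholders, not site defaults.
num_computes=2
internal_netmask=255.255.255.0
c_name=(c1 c2)
c_ip=(172.16.1.1 172.16.1.2)
c_mac=(00:1a:2b:3c:4f:56 00:1a:2b:3c:4f:57)

print_node_add_cmds() {
    local i
    for ((i=0; i<num_computes; i++)); do
        # Entry i of each array describes the same compute node.
        echo "wwctl node add --container=rocky-9.4" \
             "--ipaddr=${c_ip[$i]} --hwaddr=${c_mac[$i]}" \
             "--netmask=${internal_netmask} ${c_name[$i]}"
    done
}

print_node_add_cmds
```

If the three arrays differ in length, some nodes would be printed with empty fields, so reviewing this output is a cheap way to catch a mismatched input file.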
11 changes: 11 additions & 0 deletions docs/recipes/install/common/add_ww4_hosts_slurm.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
Now that the nodes are defined, we can start munge and Slurm. This must be done
after the nodes are defined and the Warewulf configuration is updated.

% begin_ohpc_run
% ohpc_validation_comment Enable and start munge and slurmctld (Cont.)
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
# Enable and start munge and slurmctld
[sms](*\#*) systemctl enable --now munge
[sms](*\#*) systemctl enable --now slurmctld
\end{lstlisting}
% end_ohpc_run
2 changes: 2 additions & 0 deletions docs/recipes/install/common/bos.tex
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,7 @@
master} host. Alternatively, if choosing to use a pre-installed server, please
verify that it is provisioned with the required \baseOS{} distribution. \\

\ifnottoggleverb{isWarewulf4}
Prior to beginning the installation process of \OHPC{} components, several additional
considerations are noted here for the SMS host configuration. First,
the installation recipe herein assumes that
Expand All @@ -15,6 +16,7 @@
\begin{lstlisting}[language=bash,keywords={}]
[sms](*\#*) echo ${sms_ip} ${sms_name} >> /etc/hosts
\end{lstlisting}
\fi

While it is theoretically possible to enable SELinux on a cluster provisioned
with \provisioner{},
Expand Down
24 changes: 24 additions & 0 deletions docs/recipes/install/common/finalize_warewulf4_provisioning.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
\subsection{Finalizing provisioning configuration} \label{sec:assemble_bootstrap}

\Warewulf{} provisions a node with an image, then customizes it with overlays.
This section highlights creation of the node image and overlays, followed by the
registration of desired compute nodes.

\subsubsection{Build container image and overlays}

The bootstrap image includes the runtime kernel and associated modules, as well
as some simple scripts to complete the provisioning process.

% begin_ohpc_run
% ohpc_comment_header Assemble bootstrap image \ref{sec:assemble_bootstrap}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,literate={BOSVER}{\baseos{}}1]
# Build image
[sms](*\#*) wwctl container build BOSVER
[sms](*\#*) wwctl overlay build
\end{lstlisting}
% end_ohpc_run

\subsubsection{Register nodes for provisioning}

Nodes can be registered for provisioning using the following syntax.

17 changes: 17 additions & 0 deletions docs/recipes/install/common/import_ww4_files.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
The \Warewulf{} system includes functionality to import arbitrary files from
the provisioning server for distribution to managed hosts through a system
called ``overlays''. Some files, like \texttt{/etc/passwd} and \texttt{/etc/hosts},
are handled in this way by default. Here we add directories and files to the
\texttt{generic} overlay that is applied to all nodes.

% begin_ohpc_run
% ohpc_comment_header Import files \ref{sec:file_import}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
# Add the following to support unprivileged user namespaces for tools like Apptainer
[sms](*\#*) wwctl overlay import generic /etc/subuid
[sms](*\#*) wwctl overlay import generic /etc/subgid

# Identify master host as local NTP server
[sms](*\#*) echo "server ${sms_ip} iburst" | wwctl overlay import generic <(cat) /etc/chrony.conf
\end{lstlisting}
% \end_ohpc_run
19 changes: 19 additions & 0 deletions docs/recipes/install/common/import_ww4_files_ib_centos.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,19 @@
%\iftoggle{isCentOS}{\clearpage}

\noindent Finally, to add {\em optional} support for controlling IPoIB
interfaces (see \S\ref{sec:add_ofed}), \OHPC{} includes a
template file for \Warewulf{} that can optionally be imported and used later to provision
\texttt{ib0} network settings.

% begin_ohpc_run
% ohpc_validation_newline
% ohpc_command if [[ ${enable_ipoib} -eq 1 ]];then
% ohpc_indent 5
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) wwctl overlay mkdir generic /etc/sysconfig/network-scripts/
[sms](*\#*) wwctl overlay import generic /opt/ohpc/pub/examples/network/centos/ifcfg-ib0.ww \
/etc/sysconfig/network-scripts/ifcfg-ib0.ww
\end{lstlisting}
% ohpc_indent 0
% ohpc_command fi
% \end_ohpc_run
17 changes: 17 additions & 0 deletions docs/recipes/install/common/import_ww4_files_slurm.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
\noindent Similarly, to configure Slurm and import the cryptographic
key that the {\em munge} authentication library requires on every host
in the resource management pool, issue the following:

% begin_ohpc_run
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
# Configure Slurm server in the overlay (using "configless" option)
[sms](*\#*) wwctl overlay mkdir generic /etc/sysconfig/
[sms](*\#*) wwctl overlay import generic <(echo SLURMD_OPTIONS="--conf-server ${sms_ip}") /etc/sysconfig/slurmd

# Configure munge
[sms](*\#*) wwctl overlay mkdir generic --mode 0700 /etc/munge
[sms](*\#*) wwctl overlay import generic /etc/munge/munge.key
[sms](*\#*) wwctl overlay chown generic /etc/munge/munge.key $(id -u munge) $(id -g munge)
[sms](*\#*) wwctl overlay chown generic /etc/munge $(id -u munge) $(id -g munge)
\end{lstlisting}
% \end_ohpc_run
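One detail of the `SLURMD_OPTIONS` line above is easy to miss: the double quotes are consumed by the shell, so the file written into the overlay holds an unquoted value. A minimal sketch of what `echo` emits, using a hypothetical `sms_ip` and a hypothetical helper name:

```shell
# Hypothetical SMS address, for illustration only.
sms_ip=192.168.1.1

# Mirrors the echo used inside the process substitution above.
render_slurmd_sysconfig() {
    echo SLURMD_OPTIONS="--conf-server ${sms_ip}"
}

# The shell removes the double quotes before echo sees the argument.
render_slurmd_sysconfig
```

If the consumer of `/etc/sysconfig/slurmd` needs the quotes preserved, the argument would have to be wrapped in an additional layer of single quotes. Whether that is needed depends on how the file is read on the compute nodes.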
1 change: 1 addition & 0 deletions docs/recipes/install/common/inputs.tex
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,7 @@ \subsection{Inputs} \label{sec:inputs}
\iftoggleverb{isWarewulf}
& \texttt{\$\{eth\_provision\}} & {\small \# Provisioning interface for computes} \\
\fi
& \texttt{\$\{internal\_network\}} & {\small \# Subnet network address for internal network} \\
& \texttt{\$\{internal\_netmask\}} & {\small \# Subnet netmask for internal network} \\
& \texttt{\$\{ntp\_server\}} & {\small \# Local ntp server for time synchronization} \\
& \texttt{\$\{bmc\_username\}} & {\small \# BMC username for use by IPMI} \\
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
With the \OHPC{} repository enabled, we can now begin adding desired components onto the
{\em master} server. This repository provides a number of aliases that group
logical components together to aid in this process. For
reference, a complete list of available group aliases and RPM packages available
via \OHPC{} is provided in Appendix~\ref{appendix:manifest}. To add
support for provisioning services, the following command installs a common base
package along with the Warewulf provisioning system. The main
Warewulf configuration file is then edited to reflect the environment.

%\nottoggle{isCentOS}{\clearpage}

% begin_ohpc_run
% ohpc_comment_header Add baseline OpenHPC and provisioning services \ref{sec:add_provisioning}
\begin{lstlisting}[language=bash,keywords={}]
# Install base packages
[sms](*\#*) (*\install*) ohpc-base warewulf-ohpc hwloc-ohpc netmask
\end{lstlisting}
% end_ohpc_run


8 changes: 8 additions & 0 deletions docs/recipes/install/common/ohpc-doc.sty
Original file line number Diff line number Diff line change
Expand Up @@ -57,6 +57,8 @@
\newtoggle{isaarch}
\newtoggle{ispbs}
\newtoggle{isWarewulf}
\newtoggle{isWarewulf3}
\newtoggle{isWarewulf4}
\newtoggle{isSLURM}
\newtoggle{isxCAT}
\newtoggle{isxCATstateful}
Expand All @@ -76,6 +78,12 @@
{\csname etb@tgl@#1\endcsname\iftrue\iffalse}
{\etb@noglobal\etb@err@notoggle{#1}\iffalse}%
}
% inverse of above
\newcommand{\ifnottoggleverb}[1]{%
\ifcsdef{etb@tgl@#1}
{\csname etb@tgl@#1\endcsname\iffalse\iftrue}
{\etb@noglobal\etb@err@notoggle{#1}\iftrue}%
}

\pagestyle{fancy}
\setlength\headheight{59pt}
Expand Down
2 changes: 1 addition & 1 deletion docs/recipes/install/common/reset_computes.tex
Original file line number Diff line number Diff line change
Expand Up @@ -32,7 +32,7 @@
c4 05:03am up 0:02, 0 users, load average: 0.15, 0.12, 0.05
\end{lstlisting}

\iftoggleverb{isWarewulf}
\iftoggleverb{isWarewulf3}
\begin{center}
\begin{tcolorbox}[]
\small While the \texttt{pxelinux.0} and \texttt{lpxelinux.0} files that ship
Expand Down
2 changes: 1 addition & 1 deletion docs/recipes/install/common/rocky_repos.tex
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,6 @@
disabled in a standard install, but can be enabled from EPEL as follows:

\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) dnf install dnf-plugins-core
[sms](*\#*) dnf -y install dnf-plugins-core
[sms](*\#*) dnf config-manager --set-enabled crb
\end{lstlisting}
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
The process used in the previous step is designed to
provide a minimal \baseOS{} configuration. Next, we add additional components
to include resource management client services, NTP support, and
other additional packages to support the default \OHPC{} environment. This
process modifies the base provisioning image and will access the BOS and \OHPC{}
repositories to resolve package install requests. We begin by installing a few
common base packages:

% begin_ohpc_run
% ohpc_comment_header Add OpenHPC base components to compute image \ref{sec:add_components}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,literate={BOSVER}{\baseos{}}1]
# Install compute node base meta-package
[sms](*\#*) wwctl container exec rocky-9.4 /bin/bash <<- EOF
dnf -y install ohpc-base-compute
EOF
\end{lstlisting}
% end_ohpc_run

\noindent Now, we can add further required components to the compute
instance, including the resource manager client, NTP, and development environment modules support.

Adding packages can be done by entering the image with \texttt{wwctl container shell},
\texttt{wwctl container exec}, or using a CHROOT.

12 changes: 12 additions & 0 deletions docs/recipes/install/common/warewulf4_kargs_post.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
\noindent If any components have added boot-time kernel command-line arguments for the compute nodes,
the following command is required to store the configuration in Warewulf:
% begin_ohpc_run
% ohpc_validation_newline
% ohpc_validation_comment Optionally, add arguments to bootstrap kernel
% ohpc_command if [[ ${enable_kargs} -eq 1 ]]; then
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily]
# Set optional compute node kernel command line arguments.
[sms](*\#*) wwctl node set --yes --kernelargs="${kargs}" "${compute_regex}"
\end{lstlisting}
% ohpc_command fi
% end_ohpc_run
34 changes: 34 additions & 0 deletions docs/recipes/install/common/warewulf4_mkchroot_rocky.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,34 @@
With the provisioning services enabled, the next step is to define and
customize a system image that can subsequently be used to provision one or more
{\em compute} nodes. The following subsections highlight this process.

\subsubsection{Build initial BOS image} \label{sec:assemble_bos}
\Warewulf{} 4 supports using container images as the base file system for
provisioning, and it can import these images directly from an OCI registry like
Docker Hub. Container images must be created especially for use with \Warewulf{}
since they need to include things like a kernel and an init system. In this
example we will import our base image from a set maintained by the \Warewulf{}
community on the GitHub container registry.

The \texttt{wwctl container exec} command runs the commands that follow it; these commands
can also be run interactively one at a time with the command \texttt{wwctl container
shell \baseos{}}. You can add \texttt{/bin/false} as the last command to prevent
the image from rebuilding (it will show an error) and rebuild later with the
\texttt{wwctl container build} command.

% begin_ohpc_run
% ohpc_comment_header Create compute image for Warewulf \ref{sec:assemble_bos}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,keepspaces,literate={BOSVER}{\baseos{}}1]
# Import the base image from ghcr
[sms](*\#*) wwctl container import docker://ghcr.io/warewulf/warewulf-rockylinux:9 BOSVER --syncuser

# Enable OpenHPC inside image and update container
[sms](*\#*) wwctl container exec rocky-9.4 /bin/bash <<- EOF
dnf -y install http://repos.openhpc.community/OpenHPC/3/EL_9/x86_64/ohpc-release-3-1.el9.x86_64.rpm
dnf -y update
EOF

# Define chroot location
[sms](*\#*) export CHROOT=/srv/warewulf/chroots/BOSVER/rootfs
\end{lstlisting}
% end_ohpc_run
5 changes: 5 additions & 0 deletions docs/recipes/install/common/warewulf4_setup.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
At this point, all of the packages necessary to use \Warewulf{} on the {\em
master} host should be installed. Next, we need to update the configuration
to allow \Warewulf{} to work with \baseOS{}, update the hosts file, and
support local provisioning using a second private interface (refer to
Figure~\ref{fig:physical_arch}).
41 changes: 41 additions & 0 deletions docs/recipes/install/common/warewulf4_setup_centos.tex
Original file line number Diff line number Diff line change
@@ -0,0 +1,41 @@
% begin_ohpc_run
% ohpc_comment_header Complete basic Warewulf setup for master node \ref{sec:setup_ww}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,keepspaces]
# Enable internal interface for provisioning
[sms](*\#*) ip link set dev ${sms_eth_internal} up
[sms](*\#*) ip address add ${sms_ip}/${internal_netmask} broadcast + dev ${sms_eth_internal}

# Compute the network address for the internal network
[sms](*\#*) internal_cidr=$(netmask ${sms_ip}/${internal_netmask})
[sms](*\#*) internal_network=${internal_cidr%/*}

# Edit the warewulf.conf file to use appropriate interface and settings
[sms](*\#*) perl -pi -e "s/ipaddr:.*/ipaddr: ${sms_ip}/" /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e "s/netmask:.*/netmask: ${internal_netmask}/" /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e "s/network:.*/network: ${internal_network}/" /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e 's/template:.*/template: static/' /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e "s/range start:.*/range start: ${c_ip[0]}/" /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e "s/range end:.*/range end: ${c_ip[$((num_computes-1))]}/" /etc/warewulf/warewulf.conf
[sms](*\#*) perl -pi -e "s/mount: false/mount: true/" /etc/warewulf/warewulf.conf

# Configure the /etc/hosts templates for the master and compute nodes
[sms](*\#*) perl -pi -e "s/warewulf/${sms_name}/" /srv/warewulf/overlays/host/rootfs/etc/hosts.ww
[sms](*\#*) perl -pi -e "s/warewulf/${sms_name}/" /srv/warewulf/overlays/generic/rootfs/etc/hosts.ww

# Bugfix: dhcpd.template does not set next-server
[sms](*\#*) echo "next-server ${sms_ip};" >> /srv/warewulf/overlays/host/rootfs/etc/dhcpd.conf.ww

# Configuring Warewulf will restart/enable relevant services to support provisioning
[sms](*\#*) systemctl enable --now warewulfd
[sms](*\#*) wwctl configure --all

# Generate ssh keys (usually generated on login)
[sms](*\#*) bash /etc/profile.d/ssh_setup.sh
\end{lstlisting}
% end_ohpc_run

% begin_ohpc_run
% ohpc_validation_newline
% ohpc_validation_comment Update /etc/hosts template to have ${hostname}.localdomain as the first host entry
% ohpc_command sed -e 's_\({{$node.Id.Get}}{{end}}\)_{{$node.Id.Get}}.localdomain \1_g' -i /srv/warewulf/overlays/host/rootfs/etc/hosts.ww
% end_ohpc_run
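The setup listing above derives `internal_network` by running `netmask` and stripping the prefix length with `${internal_cidr%/*}`. The same network address can be cross-checked in plain shell by AND-ing each octet of the address with the netmask. This is a sketch under the assumption of dotted-quad IPv4 input; `network_address` is a hypothetical helper and the addresses are placeholders:

```shell
# Hypothetical helper: AND each octet of an IPv4 address with its netmask.
network_address() {
    ip=$1 mask=$2
    old_ifs=$IFS
    IFS=.
    # Split both dotted quads into eight positional parameters.
    set -- $ip $mask
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

# Placeholder values standing in for ${sms_ip} and ${internal_netmask}.
network_address 172.16.5.9 255.255.0.0

# The %/* expansion drops the shortest trailing match of "/<anything>",
# i.e. the prefix length of a CIDR string.
internal_cidr=172.16.0.0/16
echo "${internal_cidr%/*}"
```

Both commands should print the same network address for matching inputs, which gives a quick sanity check on the value written to `warewulf.conf`.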