Basic Warewulf4 documentation
This adds basic documentation for building a Warewulf 4 Slurm cluster
running Rocky 9.4 on x86_64.  This is based upon the work of David
Godlove (https://github.com/GodloveD/ohpc/tree/warewulf4_doc_update)
and Timothy Middelkoop (https://github.com/MiddelkoopT/ohpc-jetstream2).

Signed-off-by: Timothy Middelkoop <[email protected]>
MiddelkoopT committed Oct 21, 2024
1 parent ad05f74 commit cef317c
Showing 19 changed files with 162 additions and 245 deletions.
7 changes: 7 additions & 0 deletions components/admin/docs/SPECS/docs.spec
@@ -94,6 +94,9 @@ from the OpenHPC software stack.
#pushd docs/recipes/install/centos8/x86_64/warewulf/slurm
#make ; %{parser} steps.tex > recipe.sh ; popd

pushd docs/recipes/install/rocky9/x86_64/warewulf4/slurm
make ; %{parser} steps.tex > recipe.sh ; popd

pushd docs/recipes/install/rocky9/x86_64/warewulf/slurm
make ; %{parser} steps.tex > recipe.sh ; popd

@@ -167,6 +170,10 @@ install -m 0644 -p docs/Release_Notes.txt %{buildroot}/%{OHPC_PUB}/doc/Release_N

# x86_64 guides

%define lpath rocky9/x86_64/warewulf4/slurm
install -m 0644 -p -D docs/recipes/install/%{lpath}/steps.pdf %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/Install_guide.pdf
install -m 0755 -p -D docs/recipes/install/%{lpath}/recipe.sh %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/recipe.sh

%define lpath rocky9/x86_64/warewulf/slurm
install -m 0644 -p -D docs/recipes/install/%{lpath}/steps.pdf %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/Install_guide.pdf
install -m 0755 -p -D docs/recipes/install/%{lpath}/recipe.sh %{buildroot}/%{OHPC_PUB}/doc/recipes/%{lpath}/recipe.sh
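The `install` lines in the spec can be tried outside of an RPM build. The following is a minimal sketch, assuming GNU coreutils `install` (for `-D`); the temporary directory layout and the stand-in `recipe.sh` are hypothetical and only mirror the shape of the spec's paths.

```shell
#!/usr/bin/env bash
# Sketch only: mimic the spec's install step in a throwaway directory.
# Assumes GNU coreutils `install`; every path below is a hypothetical stand-in.

lpath="rocky9/x86_64/warewulf4/slurm"
work="$(mktemp -d)"
src="${work}/docs/recipes/install/${lpath}"
dest="${work}/buildroot/doc/recipes/${lpath}"

# Stand-in for the generated recipe; the real one is parsed from steps.tex.
mkdir -p "${src}"
printf '#!/bin/bash\necho recipe\n' > "${src}/recipe.sh"

# -D creates missing parent directories of the destination,
# -p preserves timestamps, -m sets the mode of the installed copy.
install -m 0755 -p -D "${src}/recipe.sh" "${dest}/recipe.sh"

ls -l "${dest}/recipe.sh"
```

Because `-D` creates the destination's parent directories, the spec does not need a separate `mkdir -p` for each new `lpath`.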
1 change: 1 addition & 0 deletions docs/recipes/install/.gitignore
@@ -8,3 +8,4 @@ steps.pdf
vc.tex
pkg-ohpc.chglog*

steps.synctex.gz
13 changes: 8 additions & 5 deletions docs/recipes/install/common/add_ww4_hosts_finalize.tex
@@ -16,12 +16,15 @@
% end_ohpc_run
\fi

\input{common/wwnodescan}
Finally, we rebuild the overlays and update the Warewulf configuration.
The overlays must be rebuilt whenever an overlay is modified.

% begin_ohpc_run
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily]
# Restart dhcp / update PXE
[sms](*\#*) systemctl restart dhcpd
[sms](*\#*) wwsh pxe update
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
# build the overlays for all the nodes
[sms](*\#*) wwctl overlay build

# Update the Warewulf configuration
[sms](*\#*) wwctl configure --all
\end{lstlisting}
% end_ohpc_run
27 changes: 5 additions & 22 deletions docs/recipes/install/common/add_ww4_hosts_intro.tex
@@ -1,28 +1,11 @@
%\iftoggle{isx86}{\clearpage}
% begin_ohpc_run
% ohpc_validation_comment Add hosts to cluster

\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,]
# Set provisioning interface as the default networking device
[sms](*\#*) echo "GATEWAYDEV=${eth_provision}" > /tmp/network.$$
[sms](*\#*) wwsh -y file import /tmp/network.$$ --name network
[sms](*\#*) wwsh -y file set network --path /etc/sysconfig/network --mode=0644 --uid=0

# Add nodes to Warewulf data store
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
# Add the nodes with --discoverable so Warewulf will identify new MAC addresses
[sms](*\#*) for ((i=0; i<$num_computes; i++)) ; do
wwsh -y node new ${c_name[i]} --ipaddr=${c_ip[i]} --hwaddr=${c_mac[i]} -D ${eth_provision}
done
wwctl node add --discoverable=yes --container=BOSVER \
--ipaddr=${c_ip[$i]} --netmask=${internal_netmask} ${compute_prefix}$i
done
\end{lstlisting}
% end_ohpc_run

%\iftoggle{isCentOS_ww_pbs_x86}{\clearpage}

%\iftoggle{isSLES_ww_slurm_x86}{\clearpage}

\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily]
# Additional step required if desiring to use predictable network interface
# naming schemes (e.g. en4s0f0). Skip if using eth# style names.
[sms](*\#*) export kargs="${kargs} net.ifnames=1,biosdevname=1"
[sms](*\#*) wwsh provision set --postnetdown=1 "${compute_regex}"
\end{lstlisting}
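The `wwctl node add` loop in this file can be previewed before it touches the Warewulf node database. The sketch below is a dry run that only prints the commands; the variable values (node count, prefix, addresses, image name) are hypothetical placeholders, and `image` stands in for the `BOSVER` literate substitution used in the listing.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the wwctl commands instead of executing them.
# All values below are hypothetical placeholders, not OpenHPC defaults.
num_computes=2
compute_prefix="c"
internal_netmask="255.255.255.0"
c_ip=("10.0.0.11" "10.0.0.12")
image="rocky-9.4"   # stands in for the BOSVER substitution in the listing

print_node_add_commands() {
  local i
  for ((i=0; i<num_computes; i++)); do
    # Same flags as the recipe's loop; echo keeps this side-effect free.
    echo "wwctl node add --discoverable=yes --container=${image}" \
         "--ipaddr=${c_ip[$i]} --netmask=${internal_netmask} ${compute_prefix}${i}"
  done
}

print_node_add_commands
```

Note that the loop index starts at 0, so with these placeholder values the nodes are named `c0` and `c1`.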

14 changes: 8 additions & 6 deletions docs/recipes/install/common/add_ww4_hosts_slurm.tex
@@ -1,9 +1,11 @@
Now that the nodes are defined, we can start munge and Slurm. This must be done
after the nodes are defined and the Warewulf configuration is updated.

% begin_ohpc_run
% ohpc_validation_comment Add hosts to cluster (Cont.)
% ohpc_validation_comment Enable and start munge and slurmctld (Cont.)
\begin{lstlisting}[language=bash,keywords={},upquote=true,basicstyle=\footnotesize\ttfamily,literate={BOSVER}{\baseos{}}1]
# Define provisioning image for hosts
[sms](*\#*) wwsh -y provision set "${compute_regex}" --vnfs=BOSVER --bootstrap=`uname -r` \
--files=dynamic_hosts,passwd,group,shadow,munge.key,network
# Enable and start munge and slurmctld
[sms](*\#*) systemctl enable --now munge
[sms](*\#*) systemctl enable --now slurmctld
\end{lstlisting}


% end_ohpc_run
2 changes: 2 additions & 0 deletions docs/recipes/install/common/bos.tex
@@ -5,6 +5,7 @@
master} host. Alternatively, if choosing to use a pre-installed server, please
verify that it is provisioned with the required \baseOS{} distribution. \\

\ifnottoggleverb{isWarewulf4}
Prior to beginning the installation process of \OHPC{} components, several additional
considerations are noted here for the SMS host configuration. First,
the installation recipe herein assumes that
@@ -15,6 +16,7 @@
\begin{lstlisting}[language=bash,keywords={}]
[sms](*\#*) echo ${sms_ip} ${sms_name} >> /etc/hosts
\end{lstlisting}
\fi

While it is theoretically possible to enable SELinux on a cluster provisioned
with \provisioner{},
70 changes: 10 additions & 60 deletions docs/recipes/install/common/finalize_warewulf4_provisioning.tex
@@ -1,74 +1,24 @@
\subsection{Finalizing provisioning configuration} \label{sec:assemble_bootstrap}

\Warewulf{} employs a two-stage boot process for provisioning nodes via
creation of a bootstrap image that is used to initialize the process, and a virtual node
file system capsule containing the full system image. This section highlights
creation of the necessary provisioning images, followed by the registration of
desired compute nodes.
\Warewulf{} provisions a node with an image, then customizes it with overlays.
This section highlights creation of the node image and overlays, followed by the
registration of desired compute nodes.

\subsubsection{Assemble bootstrap image}
\subsubsection{Build container image and overlays}

The bootstrap image includes the runtime kernel and associated modules, as well
as some simple scripts to complete the provisioning process. The
following commands highlight the inclusion of additional drivers and creation
of the bootstrap image based on the running kernel.

%\iftoggle{isCentOS_ww_slurm_aarch}{\clearpage}
as some simple scripts to complete the provisioning process.

% begin_ohpc_run
% ohpc_comment_header Assemble bootstrap image \ref{sec:assemble_bootstrap}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
# (Optional) Include drivers from kernel updates; needed if enabling additional kernel modules on computes
[sms](*\#*) export WW_CONF=/etc/warewulf/bootstrap.conf
[sms](*\#*) echo "drivers += updates/kernel/" >> $WW_CONF

# Build bootstrap image
[sms](*\#*) wwbootstrap `uname -r`
\end{lstlisting}
% end_ohpc_run

\subsubsection{Assemble Virtual Node File System (VNFS) image}

With the local site customizations in place, the following step uses the
\texttt{wwvnfs} command to assemble a VNFS capsule from the chroot environment
defined for the {\em compute} instance.

% begin_ohpc_run
% ohpc_validation_comment Assemble VNFS
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) wwvnfs --chroot $CHROOT
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,literate={BOSVER}{\baseos{}}1]
# Build image
[sms](*\#*) wwctl container build BOSVER
[sms](*\#*) wwctl overlay build
\end{lstlisting}
% end_ohpc_run

\iftoggle{isCentOS_ww_slurm_aarch}{\vspace*{0.4cm}}

\iftoggle{isSLES_ww_slurm_aarch}{\vspace*{-0.1cm}}

\subsubsection{Register nodes for provisioning}

In preparation for provisioning, we can now define the desired network settings
for four example compute nodes with the underlying provisioning system and
restart the \texttt{dhcp} service. Note the use of variable names for the
desired compute hostnames, node IPs, and MAC addresses which should be modified
to accommodate local settings and hardware. By default, \Warewulf{} uses
network interface names of the \texttt{eth\#} variety and adds kernel boot
arguments to maintain this scheme on newer kernels. Consequently, when specifying
the desired provisioning interface via the \texttt{\$eth\_provision} variable,
it should follow this convention. Alternatively, if you prefer to use the
predictable network interface naming scheme (e.g. names like \texttt{en4s0f0}),
additional steps are included to alter the default kernel boot arguments and take
the \texttt{eth\#} named interface down after bootstrapping so the normal init
process can bring it up again using the desired name.

\iftoggleverb{isx86}
Also included in these steps are commands
to enable \Warewulf{} to manage IPoIB settings and corresponding definitions of
IPoIB addresses for the compute nodes. This is typically optional unless you
are planning to include a \Lustre{} client mount over \InfiniBand{}.
\fi
The final step
in this process associates the VNFS image assembled in previous steps with the
newly defined compute nodes, utilizing the user credential files and munge key
that were imported in \S\ref{sec:file_import}.

Nodes can be registered for provisioning using the following syntax.

15 changes: 9 additions & 6 deletions docs/recipes/install/common/import_ww4_files.tex
@@ -1,14 +1,17 @@
The \Warewulf{} system includes functionality to import arbitrary files from
the provisioning server for distribution to managed hosts. This is one way to
distribute user credentials to {\em compute} nodes. To import local file-based
credentials, issue the following:
the provisioning server for distribution to managed hosts through a system
called ``overlays''. Some files, such as \texttt{/etc/passwd} and \texttt{/etc/hosts},
are handled this way by default. Here we add directories and files to the
\texttt{generic} overlay, which is applied to all nodes.

% begin_ohpc_run
% ohpc_comment_header Import files \ref{sec:file_import}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) wwsh file import /etc/passwd
[sms](*\#*) wwsh file import /etc/group
[sms](*\#*) wwsh file import /etc/shadow
# Add the following to support unprivileged user namespaces for tools like Apptainer
[sms](*\#*) wwctl overlay import generic /etc/subuid
[sms](*\#*) wwctl overlay import generic /etc/subgid

# Identify master host as local NTP server
[sms](*\#*) echo "server \${sms_ip} iburst" | wwctl overlay import generic <(cat) /etc/chrony.conf
\end{lstlisting}
% \end_ohpc_run
12 changes: 10 additions & 2 deletions docs/recipes/install/common/import_ww4_files_slurm.tex
@@ -1,9 +1,17 @@
\noindent Similarly, to import the cryptographic
\noindent Similarly, we can configure Slurm and import the cryptographic
key that the {\em munge} authentication library requires
on every host in the resource management pool:

% begin_ohpc_run
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) wwsh file import /etc/munge/munge.key
# Configure Slurm server in the overlay (using "configless" option)
[sms](*\#*) wwctl overlay mkdir generic /etc/sysconfig/
[sms](*\#*) wwctl overlay import generic <(echo SLURMD_OPTIONS="--conf-server \${sms_ip}") /etc/sysconfig/slurmd

# Configure munge
[sms](*\#*) wwctl overlay mkdir generic --mode 0700 /etc/munge
[sms](*\#*) wwctl overlay import generic /etc/munge/munge.key
[sms](*\#*) wwctl overlay chown generic /etc/munge/munge.key $(id -u munge) $(id -g munge)
[sms](*\#*) wwctl overlay chown generic /etc/munge $(id -u munge) $(id -g munge)
\end{lstlisting}
% \end_ohpc_run
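The `slurmd` sysconfig entry in the listing is generated with a process substitution, so it can help to preview what text actually reaches the overlay. This is a minimal sketch with a hypothetical `sms_ip` value; the helper name `render_slurmd_sysconfig` is introduced here for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: show the text that the <(echo ...) process substitution produces.
# sms_ip is a hypothetical placeholder value.
sms_ip="10.0.0.1"

render_slurmd_sysconfig() {
  # The shell removes the double quotes during word expansion,
  # so the generated line carries the value unquoted.
  echo SLURMD_OPTIONS="--conf-server ${sms_ip}"
}

render_slurmd_sysconfig
# prints: SLURMD_OPTIONS=--conf-server 10.0.0.1
```

Whether the consumer of `/etc/sysconfig/slurmd` needs the quotes preserved is not settled by this sketch; it only makes the generated content visible before it is imported into the overlay.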
1 change: 1 addition & 0 deletions docs/recipes/install/common/inputs.tex
@@ -20,6 +20,7 @@ \subsection{Inputs} \label{sec:inputs}
\iftoggleverb{isWarewulf}
& \texttt{\$\{eth\_provision\}} & {\small \# Provisioning interface for computes} \\
\fi
& \texttt{\$\{internal\_network\}} & {\small \# Subnet network address for internal network} \\
& \texttt{\$\{internal\_netmask\}} & {\small \# Subnet netmask for internal network} \\
& \texttt{\$\{ntp\_server\}} & {\small \# Local ntp server for time synchronization} \\
& \texttt{\$\{bmc\_username\}} & {\small \# BMC username for use by IPMI} \\
@@ -3,18 +3,17 @@
logical components together in order to help aid in this process. For
reference, a complete list of available group aliases and RPM packages available
via \OHPC{} are provided in Appendix~\ref{appendix:manifest}. To add
support for provisioning services, the following commands illustrate addition
of a common base package followed by the Warewulf provisioning system.
support for provisioning services, the following command adds a common base
package along with the Warewulf provisioning system. The main
Warewulf configuration file is then edited to reflect the environment.

%\nottoggle{isCentOS}{\clearpage}

% begin_ohpc_run
% ohpc_comment_header Add baseline OpenHPC and provisioning services \ref{sec:add_provisioning}
\begin{lstlisting}[language=bash,keywords={}]
# Install base meta-packages
[sms](*\#*) (*\install*) ohpc-base
[sms](*\#*) (*\install*) ohpc-warewulf
[sms](*\#*) (*\install*) hwloc-ohpc
# Install base packages
[sms](*\#*) (*\install*) ohpc-base warewulf-ohpc hwloc-ohpc
\end{lstlisting}
% end_ohpc_run

8 changes: 8 additions & 0 deletions docs/recipes/install/common/ohpc-doc.sty
@@ -57,6 +57,8 @@
\newtoggle{isaarch}
\newtoggle{ispbs}
\newtoggle{isWarewulf}
\newtoggle{isWarewulf3}
\newtoggle{isWarewulf4}
\newtoggle{isSLURM}
\newtoggle{isxCAT}
\newtoggle{isxCATstateful}
@@ -76,6 +78,12 @@
{\csname etb@tgl@#1\endcsname\iftrue\iffalse}
{\etb@noglobal\etb@err@notoggle{#1}\iffalse}%
}
% inverse of above
\newcommand{\ifnottoggleverb}[1]{%
\ifcsdef{etb@tgl@#1}
{\csname etb@tgl@#1\endcsname\iffalse\iftrue}
{\etb@noglobal\etb@err@notoggle{#1}\iftrue}%
}

\pagestyle{fancy}
\setlength\headheight{59pt}
2 changes: 1 addition & 1 deletion docs/recipes/install/common/reset_computes.tex
@@ -32,7 +32,7 @@
c4 05:03am up 0:02, 0 users, load average: 0.15, 0.12, 0.05
\end{lstlisting}

\iftoggleverb{isWarewulf}
\iftoggleverb{isWarewulf3}
\begin{center}
\begin{tcolorbox}[]
\small While the \texttt{pxelinux.0} and \texttt{lpxelinux.0} files that ship
2 changes: 1 addition & 1 deletion docs/recipes/install/common/rocky_repos.tex
@@ -20,6 +20,6 @@
disabled in a standard install, but can be enabled from EPEL as follows:

\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) dnf install dnf-plugins-core
[sms](*\#*) dnf -y install dnf-plugins-core
[sms](*\#*) dnf config-manager --set-enabled crb
\end{lstlisting}
@@ -1,33 +1,25 @@
The \texttt{wwmkchroot} process used in the previous step is designed to
The process used in the previous step is designed to
provide a minimal \baseOS{} configuration. Next, we add additional components
to include resource management client services, NTP support, and
other additional packages to support the default \OHPC{} environment. This
process augments the chroot-based install performed by \texttt{wwmkchroot} to
modify the base provisioning image and will access the BOS and \OHPC{}
process modifies the base provisioning image and will access the BOS and \OHPC{}
repositories to resolve package install requests. We begin by installing a few
common base packages:

% begin_ohpc_run
% ohpc_comment_header Add OpenHPC base components to compute image \ref{sec:add_components}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true,literate={BOSVER}{\baseos{}}1]
# Install compute node base meta-package
[sms](*\#*) (*\chrootinstall*) ohpc-base-compute
\end{lstlisting}
% end_ohpc_run

To access the remote
repositories by hostname (and not IP addresses), the chroot environment needs
to be updated to enable DNS resolution. Assuming that the {\em master} host has
a working DNS configuration in place, the chroot environment can be updated
with a copy of the configuration as follows:

% begin_ohpc_run
% ohpc_comment_header Add OpenHPC components to compute image \ref{sec:add_components}
\begin{lstlisting}[language=bash,literate={-}{-}1,keywords={},upquote=true]
[sms](*\#*) cp -p /etc/resolv.conf $CHROOT/etc/resolv.conf
[sms](*\#*) (*\containerinstall*)
(*\install*) ohpc-base-compute
/bin/false
EOF
\end{lstlisting}
% end_ohpc_run

\noindent Now, we can include additional required components to the compute
instance including resource manager client, NTP, and development environment modules support.

Packages can be added by entering the image with \texttt{wwctl container shell}
or by using a chroot.
