Description

This module creates a compute partition that can be used as input to the schedmd-slurm-gcp-v6-controller.

The partition module is designed to work alongside the schedmd-slurm-gcp-v6-nodeset module. A partition can be made up of one or more nodesets, provided either through use (preferred) or defined manually in the nodeset variable.
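
As a rough sketch of the manual alternative mentioned above, a nodeset can be written directly into the partition's nodeset setting. The field names come from the nodeset input type listed under Inputs; the `network` module id and its `subnetwork_self_link` output are assumptions for illustration, and a real deployment may need more fields than the two required ones shown here.

```yaml
# Hypothetical fragment: one nodeset defined by hand instead of via use.
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  settings:
    partition_name: compute
    nodeset:
    - nodeset_name: c30                 # required by the nodeset type
      machine_type: c2-standard-30
      node_count_dynamic_max: 200
      # Required by the nodeset type; assumes a module with id "network"
      # that exposes a subnetwork_self_link output.
      subnetwork_self_link: $(network.subnetwork_self_link)
```

Passing nodesets through use remains the preferred route, since the nodeset module fills in the remaining fields itself.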

Example

The following code snippet creates a partition module with:

  • 2 nodesets added via use.
    • The first nodeset is made up of machines of type c2-standard-30.
    • The second nodeset is made up of machines of type c2-standard-60.
    • Both nodesets have a maximum count of 200 dynamically created nodes.
  • partition name of "compute".
  • nodesets connected to the network module via use.
  • nodes mounted to homefs via use.

- id: nodeset_1
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c30
    node_count_dynamic_max: 200
    machine_type: c2-standard-30

- id: nodeset_2
  source: community/modules/compute/schedmd-slurm-gcp-v6-nodeset
  use:
  - network
  settings:
    name: c60
    node_count_dynamic_max: 200
    machine_type: c2-standard-60

- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - homefs
  - nodeset_1
  - nodeset_2
  settings:
    partition_name: compute
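
Since the partition is meant to be used as input to the schedmd-slurm-gcp-v6-controller, the example above would typically be followed by a controller module that lists the partition under use. The fragment below is a minimal sketch: the `slurm_controller` id is hypothetical, and the controller source path and its other inputs are assumptions that are not taken from this README.

```yaml
# Hypothetical fragment: hand the partition to the controller via use.
- id: slurm_controller
  # Assumed path of the controller module; verify against your Toolkit version.
  source: community/modules/scheduler/schedmd-slurm-gcp-v6-controller
  use:
  - network
  - homefs
  - compute_partition   # id of the partition module defined above
```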

Support

The Cluster Toolkit team maintains the wrapper around the slurm-on-gcp Terraform modules. For support with the underlying modules, see the instructions in the slurm-gcp README.

Requirements

Name Version
terraform >= 1.3

Providers

No providers.

Modules

No modules.

Resources

No resources.

Inputs

Name Description Type Default Required
exclusive Exclusive job access to nodes. bool true no
is_default Sets this partition as the default partition by updating the partition_conf.
If "Default" is already set in partition_conf, this variable will have no effect.
bool false no
network_storage DEPRECATED
list(object({
server_ip = string,
remote_mount = string,
local_mount = string,
fs_type = string,
mount_options = string,
client_install_runner = map(string)
mount_runner = map(string)
}))
[] no
nodeset Define nodesets, as a list.
list(object({
node_count_static = optional(number, 0)
node_count_dynamic_max = optional(number, 1)
node_conf = optional(map(string), {})
nodeset_name = string
additional_disks = optional(list(object({
disk_name = optional(string)
device_name = optional(string)
disk_size_gb = optional(number)
disk_type = optional(string)
disk_labels = optional(map(string), {})
auto_delete = optional(bool, true)
boot = optional(bool, false)
})), [])
bandwidth_tier = optional(string, "platform_default")
can_ip_forward = optional(bool, false)
disable_smt = optional(bool, false)
disk_auto_delete = optional(bool, true)
disk_labels = optional(map(string), {})
disk_size_gb = optional(number)
disk_type = optional(string)
enable_confidential_vm = optional(bool, false)
enable_placement = optional(bool, false)
enable_oslogin = optional(bool, true)
enable_shielded_vm = optional(bool, false)
gpu = optional(object({
count = number
type = string
}))
instance_template = optional(string)
labels = optional(map(string), {})
machine_type = optional(string)
maintenance_interval = optional(string)
metadata = optional(map(string), {})
min_cpu_platform = optional(string)
network_storage = optional(list(object({
server_ip = string
remote_mount = string
local_mount = string
fs_type = string
mount_options = string
})), [])
network_tier = optional(string, "STANDARD")
on_host_maintenance = optional(string)
preemptible = optional(bool, false)
region = optional(string)
service_account = optional(object({
email = optional(string)
scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])
}))
shielded_instance_config = optional(object({
enable_integrity_monitoring = optional(bool, true)
enable_secure_boot = optional(bool, true)
enable_vtpm = optional(bool, true)
}))
source_image_family = optional(string)
source_image_project = optional(string)
source_image = optional(string)
additional_networks = optional(list(object({
network = string
subnetwork = string
subnetwork_project = string
network_ip = string
nic_type = string
stack_type = string
queue_count = number
access_config = list(object({
nat_ip = string
network_tier = string
}))
ipv6_access_config = list(object({
network_tier = string
}))
alias_ip_range = list(object({
ip_cidr_range = string
subnetwork_range_name = string
}))
})))
access_config = optional(list(object({
nat_ip = string
network_tier = string
})))
subnetwork_self_link = string
spot = optional(bool, false)
tags = optional(list(string), [])
termination_action = optional(string)
zones = optional(list(string), [])
zone_target_shape = optional(string, "ANY_SINGLE_ZONE")
reservation_name = optional(string)
startup_script = optional(list(object({
filename = string
content = string
})), [])
}))
[] no
nodeset_dyn Defines dynamic nodesets, as a list.
list(object({
nodeset_name = string
nodeset_feature = string
}))
[] no
nodeset_tpu Define TPU nodesets, as a list.
list(object({
node_count_static = optional(number, 0)
node_count_dynamic_max = optional(number, 5)
nodeset_name = string
enable_public_ip = optional(bool, false)
node_type = string
accelerator_config = optional(object({
topology = string
version = string
}), {
topology = ""
version = ""
})
tf_version = string
preemptible = optional(bool, false)
preserve_tpu = optional(bool, false)
zone = string
data_disks = optional(list(string), [])
docker_image = optional(string, "")
network_storage = optional(list(object({
server_ip = string
remote_mount = string
local_mount = string
fs_type = string
mount_options = string
})), [])
subnetwork = string
service_account = optional(object({
email = optional(string)
scopes = optional(list(string), ["https://www.googleapis.com/auth/cloud-platform"])
}))
project_id = string
reserved = optional(string, false)
}))
[] no
partition_conf Slurm partition configuration as a map.
See https://slurm.schedmd.com/slurm.conf.html#SECTION_PARTITION-CONFIGURATION
map(string) {} no
partition_name The name of the slurm partition. string n/a yes
resume_timeout Maximum time permitted (in seconds) between when a node resume request is issued and when the node is actually available for use.
If null is given, then a smart default will be chosen depending on nodesets in partition.
This sets 'ResumeTimeout' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_ResumeTimeout_1 for details.
number 300 no
suspend_time Nodes which remain idle or down for this number of seconds will be placed into power save mode by SuspendProgram.
This sets 'SuspendTime' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTime_1 for details.
NOTE: use value -1 to exclude partition from suspend.
number 300 no
suspend_timeout Maximum time permitted (in seconds) between when a node suspend request is issued and when the node is shut down.
If null is given, then a smart default will be chosen depending on nodesets in partition.
This sets 'SuspendTimeout' in partition_conf.
See https://slurm.schedmd.com/slurm.conf.html#OPT_SuspendTimeout_1 for details.
number null no
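
To illustrate how the scalar inputs above fit together, the following sketch sets a few of them on a partition. The values are arbitrary examples rather than recommendations, `MaxTime` is simply one Slurm partition parameter chosen for illustration, and the `nodeset_1` id is assumed to refer to a nodeset module defined elsewhere in the blueprint.

```yaml
# Hypothetical fragment: a partition with a few optional inputs overridden.
- id: compute_partition
  source: community/modules/compute/schedmd-slurm-gcp-v6-partition
  use:
  - nodeset_1
  settings:
    partition_name: compute   # the only required input
    is_default: true          # no effect if "Default" is already set in partition_conf
    exclusive: false          # overrides the default of true
    resume_timeout: 600       # seconds; sets ResumeTimeout in partition_conf
    partition_conf:
      MaxTime: "8:00:00"      # map(string), so the value is quoted
```

Because partition_conf is a map(string), quoting values keeps YAML from reinterpreting entries such as times or numbers.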

Outputs

Name Description
nodeset Details of the nodesets in this partition
nodeset_dyn Details of the dynamic nodesets in this partition
nodeset_tpu Details of the TPU nodesets in this partition
partitions Details of the Slurm partition