This module creates one or more compute VM instances.

    - id: compute
      source: modules/compute/vm-instance
      use: [network1]
      settings:
        instance_count: 8
        name_prefix: compute
        machine_type: c2-standard-60

This creates a cluster of 8 compute VMs that are:
- named `compute-[0-7]`
- on the network defined by the `network1` module
- of type `c2-standard-60`
NOTE: Simultaneous Multithreading (SMT) is deactivated by default on machine types that support it (equivalent to `threads_per_core=1`), which means only the physical cores are visible on the VM. With SMT disabled, a machine of type `c2-standard-60` will have only its 30 physical cores visible. To enable SMT, set `threads_per_core=2` under `settings`.
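As a minimal sketch of the note above, the blueprint fragment below re-enables SMT by adding `threads_per_core: 2` under `settings`. The module `id` and the other settings are simply carried over from the earlier example for illustration.

```yaml
  - id: compute
    source: modules/compute/vm-instance
    use: [network1]
    settings:
      instance_count: 8
      name_prefix: compute
      machine_type: c2-standard-60
      # Enable SMT so that all 60 virtual cores of c2-standard-60 are visible.
      threads_per_core: 2
```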
There are two methods for adding network connectivity to the `vm-instance` module. The first is shown in the example above, where a `vpc` module or `pre-existing-vpc` module is used by the `vm-instance` module. When this happens, the `network_self_link` and `subnetwork_self_link` outputs from the network module are provided as inputs to the `vm-instance` module, and a network interface is defined based on them. The same result can be achieved by setting the `network_self_link` and `subnetwork_self_link` settings directly.
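A minimal sketch of the direct form, assuming a network module with the hypothetical ID `network1` is defined earlier in the blueprint and that its outputs are referenced with the blueprint's `$(module_id.output)` syntax. A literal self link string could be supplied instead.

```yaml
  - id: compute
    source: modules/compute/vm-instance
    settings:
      instance_count: 8
      machine_type: c2-standard-60
      # Assumption: network1 is a vpc or pre-existing-vpc module that exposes
      # these two outputs; no `use` entry is needed in this form.
      network_self_link: $(network1.network_self_link)
      subnetwork_self_link: $(network1.subnetwork_self_link)
```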
The alternative option can be used when more than one network needs to be added to the `vm-instance`, or when further customization is needed beyond what the other variables provide. For this option, the `network_interfaces` variable can be used to set up one or more network interfaces on the VM instance. The format is consistent with the Terraform `google_compute_instance` `network_interface` block, and more information can be found in the Terraform docs.
NOTE: When supplying the `network_interfaces` variable, networks associated with the `vm-instance` via `use` will be ignored in favor of the networks added in `network_interfaces`. In addition, `bandwidth_tier` and `disable_public_ips` will not apply to networks defined in `network_interfaces`.
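The fragment below sketches a VM with two network interfaces defined through `network_interfaces`. The module IDs `network1` and `network2` are hypothetical, and only a few of the subfields listed in the variable description are shown. Depending on the module version, the remaining subfields may also need to be set explicitly (for example to `null` or an empty list), so treat this as a starting point rather than a complete definition.

```yaml
  - id: compute
    source: modules/compute/vm-instance
    settings:
      instance_count: 2
      machine_type: c2-standard-60
      network_interfaces:
        # First interface, attached to the subnetwork of network1 (hypothetical ID).
        - subnetwork: $(network1.subnetwork_self_link)
          nic_type: GVNIC
        # Second interface, attached to the subnetwork of network2 (hypothetical ID).
        - subnetwork: $(network2.subnetwork_self_link)
          nic_type: GVNIC
```

Because `network_interfaces` is set, no `use` entry for the networks is included: per the note above, it would be ignored for the purpose of defining interfaces.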
This module ignores all changes to the `ssh-keys` metadata field, which is typically set by external Google Cloud tools that automate SSH access when OS Login is not in use. For example, clicking the Google Cloud Console SSH button next to a VM in the VM Instances list temporarily modifies the VM metadata to include a dynamically generated SSH public key.
The `placement_policy` variable can be used to control where your VM instances are physically located relative to each other within a zone. See the official placement guide and API documentation.
Use the following settings for compact placement:

    ...
    settings:
      instance_count: 4
      machine_type: c2-standard-60
      placement_policy:
        collocation: "COLLOCATED"

By default, the above placement policy will always result in the most compact set of VMs available. If you would rather have provisioning fail when a given level of compactness cannot be obtained, you can enforce this with the `max_distance` setting:

    ...
    settings:
      instance_count: 4
      machine_type: c2-standard-60
      placement_policy:
        collocation: "COLLOCATED"
        max_distance: 1

Use the following settings for spread placement:

    ...
    settings:
      instance_count: 4
      machine_type: n2-standard-4
      placement_policy:
        availability_domain_count: 2

When `vm_count` is not set, as shown in the examples above, the VMs will be added to the placement policy incrementally. This is the recommended way to use placement policies.
If `vm_count` is specified, the VMs will stay in the pending state until the specified number of VMs has been created. See the warning below if using this field.
WARNING: When creating a compact placement using `vm_count` with more than 10 VMs, you must add the `-parallelism=<n>` argument on apply. For example, if you have 15 VMs in a placement group: `terraform apply -parallelism=15`. This is because Terraform limits itself to 10 parallel requests by default, but the create instance requests will not succeed until all VMs in the placement group have been requested, which results in a deadlock.
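To make the warning concrete, here is a sketch of a compact placement that sets `vm_count` for 15 VMs. The values are illustrative, and matching `vm_count` to `instance_count` is an assumption about the intended usage.

```yaml
  - id: compute
    source: modules/compute/vm-instance
    use: [network1]
    settings:
      instance_count: 15
      machine_type: c2-standard-60
      placement_policy:
        collocation: "COLLOCATED"
        # VMs stay pending until all 15 have been requested, so Terraform
        # must be allowed to issue at least 15 requests in parallel.
        vm_count: 15
```

With a configuration like this, the deployment group would need to be applied with `terraform apply -parallelism=15`, as described in the warning.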
More information on GPU support in `vm-instance` and other Cluster Toolkit modules can be found at docs/gpu-support.md.
The `vm-instance` module will be replaced when the `instance_image` variable is changed and `terraform apply` is run on the deployment group folder or `ghpc deploy` is run. However, it will not be automatically replaced if a new image is created in a family.
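The fragment below sketches two ways of setting `instance_image`. It assumes the map accepts `family`, `name`, and `project` keys; the image and project names are hypothetical placeholders. Check the module's variable definition for the keys supported by your version.

```yaml
  - id: compute
    source: modules/compute/vm-instance
    use: [network1]
    settings:
      # Option A (assumed keys): follow an image family. A newer image
      # published in the family does not by itself replace existing VMs.
      instance_image:
        family: my-image-family   # hypothetical family name
        project: my-image-project # hypothetical project ID

      # Option B (assumed keys): pin one specific image instead. Changing
      # the name and re-applying replaces the VMs.
      # instance_image:
      #   name: my-image-v2         # hypothetical image name
      #   project: my-image-project
```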
To selectively replace the vm-instance(s), consider running `terraform apply -replace`, such as:

    terraform state list
    # search for the module ID and resource
    terraform apply -replace="address"

See https://developer.hashicorp.com/terraform/cli/commands/plan#replace-address for the precise syntax of `terraform apply -replace=ADDRESS`.
Copyright 2023 Google LLC
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
Name | Version |
---|---|
terraform | >= 1.3.0 |
google | >= 4.73.0 |
google-beta | >= 4.73.0 |
null | >= 3.0 |
Name | Version |
---|---|
google | >= 4.73.0 |
google-beta | >= 4.73.0 |
null | >= 3.0 |
Name | Source | Version |
---|---|---|
netstorage_startup_script | github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script | v1.36.0 |
Name | Type |
---|---|
google-beta_google_compute_instance.compute_vm | resource |
google-beta_google_compute_resource_policy.placement_policy | resource |
google_compute_address.compute_ip | resource |
google_compute_disk.boot_disk | resource |
null_resource.image | resource |
null_resource.replace_vm_trigger_from_placement | resource |
google_compute_image.compute_image | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
add_deployment_name_before_prefix | If true, the names of VMs and disks will always be prefixed with deployment_name to enable uniqueness across deployments. See name_prefix for further details on resource naming behavior. | bool | false | no |
allocate_ip | If not null, allocate IPs with the given configuration. See details at https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_address | object({ | null | no |
allow_automatic_updates | If false, disables automatic system package updates on the created instances. This feature is only available on supported images (or images derived from them). For more details, see https://cloud.google.com/compute/docs/instances/create-hpc-vm#disable_automatic_updates | bool | true | no |
auto_delete_boot_disk | Controls if boot disk should be auto-deleted when instance is deleted. | bool | true | no |
automatic_restart | Specifies if the instance should be restarted if it was terminated by Compute Engine (not a user). | bool | null | no |
bandwidth_tier | Tier 1 bandwidth increases the maximum egress bandwidth for VMs. Using the tier_1_enabled setting will enable both gVNIC and TIER_1 higher bandwidth networking. Using the gvnic_enabled setting will only enable gVNIC and will not enable TIER_1. Note that TIER_1 only works with specific machine families & shapes and must be using an image that supports gVNIC. See official docs for more details. | string | "not_enabled" | no |
deployment_name | Name of the deployment, will optionally be used to name resources according to name_prefix | string | n/a | yes |
disable_public_ips | If set to true, instances will not have public IPs | bool | false | no |
disk_size_gb | Size of disk for instances. | number | 200 | no |
disk_type | Disk type for instances. | string | "pd-standard" | no |
enable_oslogin | Enable or Disable OS Login with "ENABLE" or "DISABLE". Set to "INHERIT" to inherit project OS Login setting. | string | "ENABLE" | no |
guest_accelerator | List of the type and count of accelerator cards attached to the instance. | list(object({ | [] | no |
instance_count | Number of instances | number | 1 | no |
instance_image | Instance Image | map(string) | { | no |
labels | Labels to add to the instances. Key-value pairs. | map(string) | n/a | yes |
local_ssd_count | The number of local SSDs to attach to each VM. See https://cloud.google.com/compute/docs/disks/local-ssd. | number | 0 | no |
local_ssd_interface | Interface to be used with local SSDs. Can be either 'NVME' or 'SCSI'. No effect unless local_ssd_count is also set. | string | "NVME" | no |
machine_type | Machine type to use for the instance creation | string | "c2-standard-60" | no |
metadata | Metadata, provided as a map | map(string) | {} | no |
min_cpu_platform | The name of the minimum CPU platform that you want the instance to use. | string | null | no |
name_prefix | An optional name for all VM and disk resources. If not supplied, deployment_name will be used. When name_prefix is supplied, and add_deployment_name_before_prefix is set, then resources are named by "<deployment_name>-<name_prefix>-<#>". | string | null | no |
network_interfaces | A list of network interfaces. The options match that of the terraform network_interface block of google_compute_instance. For descriptions of the subfields or more information see the documentation: https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_instance#nested_network_interface NOTE: If network_interfaces are set, network_self_link and subnetwork_self_link will be ignored, even if they are provided through the use field. bandwidth_tier and disable_public_ips also do not apply to network interfaces defined in this variable. Subfields: network (string, required if subnetwork is not supplied); subnetwork (string, required if network is not supplied); subnetwork_project (string, optional); network_ip (string, optional); nic_type (string, optional, choose from ["GVNIC", "VIRTIO_NET"]); stack_type (string, optional, choose from ["IPV4_ONLY", "IPV4_IPV6"]); queue_count (number, optional); access_config (object, optional); ipv6_access_config (object, optional); alias_ip_range (list(object), optional) | list(object({ | [] | no |
network_self_link | The self link of the network to attach the VM. Can use "default" for the default network. | string | null | no |
network_storage | An array of network attached storage mounts to be configured. | list(object({ | [] | no |
on_host_maintenance | Describes maintenance behavior for the instance. If left blank this will default to MIGRATE except for when placement_policy, spot provisioning, or GPUs require it to be TERMINATE | string | null | no |
placement_policy | Control where your VM instances are physically located relative to each other within a zone. See https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/compute_resource_policy#nested_group_placement_policy | any | null | no |
project_id | Project in which the HPC deployment will be created | string | n/a | yes |
region | The region to deploy to | string | n/a | yes |
service_account | DEPRECATED - Use service_account_email and service_account_scopes instead. | object({ | null | no |
service_account_email | Service account e-mail address to use with the node pool | string | null | no |
service_account_scopes | Scopes to use with the node pool. | set(string) | [ | no |
spot | Provision VMs using discounted Spot pricing, allowing for preemption | bool | false | no |
startup_script | Startup script used on the instance | string | null | no |
subnetwork_self_link | The self link of the subnetwork to attach the VM. | string | null | no |
tags | Network tags, provided as a list | list(string) | [] | no |
threads_per_core | Sets the number of threads per physical core. By setting threads_per_core to 2, Simultaneous Multithreading (SMT) is enabled extending the total number of virtual cores. For example, a machine of type c2-standard-60 will have 60 virtual cores with threads_per_core equal to 2. With threads_per_core equal to 1 (SMT turned off), only the 30 physical cores will be available on the VM. The default value of "0" will turn off SMT for supported machine types, and will fall back to GCE defaults for unsupported machine types (t2d, shared-core instances, or instances with less than 2 vCPU). Disabling SMT can be more performant in many HPC workloads, therefore it is disabled by default where compatible. null = SMT configuration will use the GCE defaults for the machine type; 0 = SMT will be disabled where compatible (default); 1 = SMT will always be disabled (will fail on incompatible machine types); 2 = SMT will always be enabled (will fail on incompatible machine types) | number | 0 | no |
zone | Compute Platform zone | string | n/a | yes |
Name | Description |
---|---|
external_ip | External IP of the instances (if enabled) |
instructions | Instructions on how to SSH into the created VM. Commands may fail depending on VM configuration and IAM permissions. |
internal_ip | Internal IP of the instances |
name | Names of instances created |
self_link | The tuple URIs of the created instances |