Description

This module performs the following tasks:

create an instance template from which execute points will be created
create a managed instance group (MIG) for execute points
create a Toolkit runner to configure the autoscaler to scale the MIG

It is expected to be used with the htcondor-install and htcondor-setup modules.

Known limitations

This module may be used multiple times in a blueprint to create sets of execute points in an HTCondor pool. If it is used more than once, the setting name_prefix must be set to a value that is unique across all uses of the htcondor-execute-point module. If this constraint is not followed, terraform apply will likely fail with an error similar to the one shown below.

Error: Invalid value for variable

  on modules/embedded/community/modules/scheduler/htcondor-access-point/main.tf line 136, in module "startup_script":
 136:   runners = local.all_runners
    ├────────────────
    │ var.runners is list of map of string with 5 elements

All startup-script runners must have a unique destination.
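
A minimal sketch of satisfying this constraint is shown here, assuming two uses of the module in one blueprint. The module ids and the prefix values grp1 and grp2 are hypothetical, and the use entries and other settings are omitted for brevity.

```yaml
- id: htcondor_execute_point_a
  source: community/modules/compute/htcondor-execute-point
  settings:
    # hypothetical value; must differ from every other use of this module
    name_prefix: grp1

- id: htcondor_execute_point_b
  source: community/modules/compute/htcondor-execute-point
  settings:
    # hypothetical value; must differ from every other use of this module
    name_prefix: grp2
```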

How to configure jobs to select execute points

HTCondor access points provisioned by the Toolkit are specially configured to honor an attribute named RequireId in each Job ClassAd. This value must be set to the ID of a MIG created by an instance of this module. The htcondor-access-point module includes a setting var.default_mig_id that will set this value automatically to the MIG ID corresponding to the module's execute points. If this setting is left unset, each job must specify +RequireId explicitly. In all cases, the default value can be overridden in the job submit file as shown below:

universe       = vanilla
executable     = /bin/echo
arguments      = "Hello, World!"
output         = out.$(ClusterId).$(ProcId)
error          = err.$(ClusterId).$(ProcId)
log            = log.$(ClusterId).$(ProcId)
request_cpus   = 1
request_memory = 100MB
+RequireId     = "htcondor-pool-ep-mig"
queue

Example

A full example can be found in the examples README.

The following code snippet creates a pool with 2 sets of HTCondor execute points, one using On-demand pricing and the other using Spot pricing. They use a startup script and network created in previous steps.

- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  settings:
    instance_image:
      project: $(vars.project_id)
      family: $(vars.new_image_family)
    min_idle: 2
    # illustrative value; must be unique across all uses of this module
    name_prefix: ondemand

- id: htcondor_execute_point_spot
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  settings:
    instance_image:
      project: $(vars.project_id)
      family: $(vars.new_image_family)
    spot: true
    # illustrative value; must be unique across all uses of this module
    name_prefix: spot

- id: htcondor_access
  source: community/modules/scheduler/htcondor-access-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  - htcondor_execute_point
  - htcondor_execute_point_spot
  settings:
    default_mig_id: $(htcondor_execute_point.mig_id)
    enable_public_ips: true
    instance_image:
      project: $(vars.project_id)
      family: $(vars.new_image_family)
  outputs:
  - access_point_ips
  - access_point_name

Support

HTCondor is maintained by the Center for High Throughput Computing at the University of Wisconsin-Madison. Support for HTCondor is available via:

Discussion lists
HTCondor on GitHub
HTCondor manual

Behavior of Managed Instance Group (MIG)

Regional MIGs are used to provision Execute Points. By default, VMs will be provisioned in any of the zones available in the region; however, provisioning can be constrained to fewer zones (or a single zone) using var.zones.
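
As a rough sketch of constraining zones, the blueprint fragment below limits execute points to two zones. It assumes the deployment region is us-central1; the zone names and the name_prefix value are illustrative only, and the listed zones must belong to the region in which the execute points are created.

```yaml
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  settings:
    name_prefix: ep   # illustrative value
    # Assumed zones within the deployment region (here us-central1).
    # Leaving zones unset allows every zone in the region.
    zones:
    - us-central1-a
    - us-central1-c
```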

When the configuration of an Execute Point is changed, the MIG can be configured to replace the VM using a "proactive" or "opportunistic" policy. By default, the policy is set to opportunistic. In practice, this means that Execute Points will NOT be automatically replaced by Terraform when changes to the instance template / HTCondor configuration are made. We recommend leaving this at the default value as it will allow the HTCondor autoscaler to replace VMs when they become idle without disrupting running jobs.

However, if desired, var.update_policy can be set to "PROACTIVE" to enable automatic replacement. This will disrupt running jobs and send them back to the queue. Alternatively, one can leave the setting at the default value of "OPPORTUNISTIC", in which case VMs are updated when either of the following occurs:

an update is issued intentionally via Cloud Console or using gcloud (below)
VMs become unhealthy or are otherwise automatically replaced (e.g. regular Google Cloud maintenance)

For example, to manually update all instances in a MIG:

gcloud compute instance-groups managed update-instances \
   <<NAME-OF-MIG>> --all-instances --region <<REGION>> \
   --project <<PROJECT_ID>> --minimal-action replace
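
If automatic replacement is preferred over manual updates, var.update_policy could be set in the blueprint roughly as follows. This is a sketch only: the name_prefix value is illustrative, and the use entries mirror those in the Example section.

```yaml
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  settings:
    name_prefix: ep   # illustrative value
    # Replace execute points as soon as their configuration changes.
    # Jobs running on replaced VMs are interrupted and return to the queue.
    update_policy: PROACTIVE
```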

Known Issues

When using OS Login with "external users" (outside of the Google Cloud organization), Docker universe jobs will fail and cause the Docker daemon to crash. This stems from the use of POSIX user ids (uid) outside the range supported by Docker. Please consider disabling OS Login if this atypical situation applies.

vars:
  # add setting below to existing deployment variables
  enable_oslogin: DISABLE

License

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

 http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Requirements

Name	Version
terraform	>= 1.1
google	>= 4.0

Providers

Name	Version
google	>= 4.0

Modules

Name	Source	Version
execute_point_instance_template	terraform-google-modules/vm/google//modules/instance_template	10.1.1
mig	terraform-google-modules/vm/google//modules/mig	10.1.1
startup_script	github.com/GoogleCloudPlatform/hpc-toolkit//modules/scripts/startup-script	v1.35.0&depth=1

Resources

Name	Type
google_storage_bucket_object.execute_config	resource
google_compute_image.htcondor	data source
google_compute_zones.available	data source

Inputs

Name	Description	Type	Default	Required
central_manager_ips	List of IP addresses of HTCondor Central Managers	`list(string)`	n/a	yes
deployment_name	Cluster Toolkit deployment name. HTCondor cloud resource names will include this value.	`string`	n/a	yes
disk_size_gb	Boot disk size in GB	`number`	`100`	no
disk_type	Disk type for template	`string`	`"pd-balanced"`	no
distribution_policy_target_shape	Target shape across zones for instance group managing execute points	`string`	`"ANY"`	no
enable_oslogin	Enable or Disable OS Login with "ENABLE" or "DISABLE". Set to "INHERIT" to inherit project OS Login setting.	`string`	`"ENABLE"`	no
enable_shielded_vm	Enable the Shielded VM configuration (var.shielded_instance_config).	`bool`	`false`	no
execute_point_runner	A list of Toolkit runners for configuring an HTCondor execute point	`list(map(string))`	`[]`	no
execute_point_service_account_email	Service account for HTCondor execute point (e-mail format)	`string`	n/a	yes
guest_accelerator	List of the type and count of accelerator cards attached to the instance.	list(object({ type = string, count = number }))	`[]`	no
htcondor_bucket_name	Name of HTCondor configuration bucket	`string`	n/a	yes
instance_image	HTCondor execute point VM image. Expected fields: name (the name of the image; mutually exclusive with family), family (the image family to use; mutually exclusive with name), project (the project where the image is hosted).	`map(string)`	{ "family": "hpc-rocky-linux-8", "project": "cloud-hpc-image-public" }	no
labels	Labels to add to HTCondor execute points	`map(string)`	n/a	yes
machine_type	Machine type to use for HTCondor execute points	`string`	`"n2-standard-4"`	no
max_size	Maximum size of the HTCondor execute point pool.	`number`	`5`	no
metadata	Metadata to add to HTCondor execute points	`map(string)`	`{}`	no
min_idle	Minimum number of idle VMs in the HTCondor pool (if pool reaches var.max_size, this minimum is not guaranteed); set to ensure that newly submitted jobs begin running more quickly.	`number`	`0`	no
name_prefix	Name prefix given to hostnames in this group of execute points; must be unique across all instances of this module	`string`	n/a	yes
network_self_link	The self link of the network HTCondor execute points will join	`string`	`"default"`	no
network_storage	An array of network attached storage mounts to be configured	list(object({ server_ip = string, remote_mount = string, local_mount = string, fs_type = string, mount_options = string, client_install_runner = map(string), mount_runner = map(string) }))	`[]`	no
project_id	Project in which the HTCondor execute points will be created	`string`	n/a	yes
region	The region in which HTCondor execute points will be created	`string`	n/a	yes
service_account_scopes	Scopes by which to limit service account attached to execute points.	`set(string)`	[ "https://www.googleapis.com/auth/cloud-platform" ]	no
shielded_instance_config	Shielded VM configuration for the instance (must set var.enable_shielded_vm)	object({ enable_secure_boot = bool, enable_vtpm = bool, enable_integrity_monitoring = bool })	{ "enable_integrity_monitoring": true, "enable_secure_boot": true, "enable_vtpm": true }	no
spot	Provision VMs using discounted Spot pricing, allowing for preemption	`bool`	`false`	no
subnetwork_self_link	The self link of the subnetwork HTCondor execute points will join	`string`	`null`	no
target_size	Initial size of the HTCondor execute point pool; set to null (default) to avoid Terraform management of size.	`number`	`null`	no
update_policy	Replacement policy for Execute Point Managed Instance Group ("PROACTIVE" to replace immediately or "OPPORTUNISTIC" to replace upon instance power cycle)	`string`	`"OPPORTUNISTIC"`	no
windows_startup_ps1	Startup script to run at boot-time for Windows-based HTCondor execute points	`list(string)`	`[]`	no
zones	Zone(s) in which execute points may be created. If not supplied, will default to all zones in var.region.	`list(string)`	`[]`	no
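
To illustrate how several of these inputs combine, the fragment below sketches one possible configuration that caps the pool size, keeps a few idle VMs ready, and enables Shielded VM. The numeric values and the name_prefix are arbitrary examples rather than recommendations; the use entries mirror those in the Example section.

```yaml
- id: htcondor_execute_point
  source: community/modules/compute/htcondor-execute-point
  use:
  - network1
  - htcondor_secrets
  - htcondor_setup
  - htcondor_cm
  settings:
    name_prefix: ep   # illustrative value
    machine_type: n2-standard-4
    # Pool may grow to at most 20 execute points (example value).
    max_size: 20
    # Keep 2 idle VMs available; not guaranteed once max_size is reached.
    min_idle: 2
    # shielded_instance_config only takes effect when this is true.
    enable_shielded_vm: true
    shielded_instance_config:
      enable_secure_boot: true
      enable_vtpm: true
      enable_integrity_monitoring: true
```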

Outputs

Name	Description
autoscaler_runner	Toolkit runner to configure the HTCondor autoscaler
mig_id	ID of the managed instance group containing the execute points
