
Remote executors

Peter Rowlands (변기호) edited this page Jan 13, 2022 · 12 revisions

Machine management

Note: This is documentation for an experimental feature that is under active development. It should not be used in production environments.

dvc machine provides a set of DVC commands for provisioning and managing remote machines which will eventually be used for executing DVC experiments.

Currently, the dvc machine implementation uses https://github.com/iterative/terraform-provider-iterative and requires the terraform client to be installed and available in your PATH.

Installation/Configuration

  • (Optional) Download & install terraform client for your platform

  • (Optional) Install latest tpi from master (pip install -e)

  • Install DVC deps (preferably using pip install -e from master):

    pip install dvc[terraform]
    
    • This will install tpi from pypi if you did not already install it from source

Note: If you do not install a terraform client yourself, it will be downloaded and installed for you (via tpi)

  • Enable the dvc machine feature (either per-repo or globally):
    dvc config [--global] feature.machine true
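
Since the terraform client is optional, it can help to check whether one is already on your PATH before installing. The following is a minimal sketch; the function name check_terraform is hypothetical and not part of DVC or tpi.

```shell
#!/usr/bin/env bash
# Report whether a terraform client is already available on PATH.
# check_terraform is a hypothetical helper name used only for this sketch.
check_terraform() {
    if command -v terraform >/dev/null 2>&1; then
        echo "terraform found at: $(command -v terraform)"
    else
        echo "terraform not found on PATH; tpi is expected to download one"
    fi
}

check_terraform
```

Either outcome is fine for dvc machine; the check only tells you which client will be used.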

Machine configuration

Machines are configured similarly to DVC remotes, and configuration usage generally mirrors dvc remote add/modify/remove.

  • dvc machine add - Add a machine to your repo configuration. No machine instance is actually created until dvc machine create is run.
  • dvc machine modify - Modify the configuration of an existing machine. For a full list of available options, refer to the documentation at https://github.com/iterative/terraform-provider-iterative#machine
  • dvc machine list - List the configuration of one or all machines.
  • dvc machine remove - Remove a machine from your repo configuration. Any running machine instances should be destroyed with dvc machine destroy before the machine is removed from your repo configuration.
  • dvc machine rename - Rename a machine. The rename also affects the instances related to this machine.
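
A typical configuration session might look like the sketch below. The positional argument order (name, then cloud for add; name, option, value for modify) reflects the CLI as I understand it at the time of writing, so confirm with dvc machine add --help. The region value us-west and the DVC override variable are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch of a configuration session. The "region" value is illustrative,
# and DVC is a hypothetical override hook (e.g. DVC=echo prints the
# commands instead of running them).
DVC="${DVC:-dvc}"

configure_machine() {
    name="$1"
    cloud="$2"
    "$DVC" machine add "$name" "$cloud"            # register only; nothing is provisioned yet
    "$DVC" machine modify "$name" region us-west   # set a single option
    "$DVC" machine list                            # print the stored configuration
}

# Usage: configure_machine aws-test aws
```

Running it with DVC=echo shows the commands without touching your repo configuration.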

Instance management

  • dvc machine create - Create and start an instance of a configured machine.
  • dvc machine status - List the running status of the instances of one specified machine, or of all machines.
  • dvc machine destroy - Stop and destroy a previously created machine instance.
  • dvc machine ssh - Connect to a machine via SSH.
    • Your default ssh client will be used if it is available in your PATH.
    • Otherwise, a limited-functionality client session will be provided via asyncssh. Note that interactive programs (particularly editors like vi) may not work as expected when run in this shell session.

Remote experiment execution

  • Very basic exp execution can be done over SSH via dvc exp run --machine <machine_name> (see also: https://github.com/iterative/dvc/pull/7173).
  • The runtime execution environment for the remote machine can be configured via the setup_script machine configuration option. setup_script should be a shell script. It is sourced from the root of the user's Git repository prior to running an experiment (i.e. it is sourced before executing dvc exp run). Note that this is separate from the startup_script terraform configuration, which is executed at boot time and is meant for installing system packages.
  • Detached/unattended execution is not currently supported. Killing or interrupting the dvc exp run --machine command will also terminate the exp execution on the remote machine.

Example .dvc/config:

['machine "aws-test"']
    cloud = aws
    setup_script = ../setup.sh

Example setup.sh:

#!/bin/bash
python3.9 -m venv .venv
source .venv/bin/activate
pip install -U pip
pip install -r src/requirements.txt

To run on the remote machine:

$ dvc machine create aws-test
$ dvc exp run --machine aws-test
$ dvc machine destroy aws-test
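
Because a failed or interrupted dvc exp run leaves the instance running until it is destroyed, the three steps can be wrapped so that the destroy step always runs. This is a minimal sketch and not something DVC provides: run_on_machine and the DVC override variable are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: create a machine, run one experiment on it, and always destroy it.
# DVC is a hypothetical override hook (e.g. DVC=echo prints the commands).
DVC="${DVC:-dvc}"

# The function body is a subshell, so the traps below stay local to it.
run_on_machine() (
    machine="$1"
    "$DVC" machine create "$machine" || exit 1
    # From here on, destroy the instance whenever this subshell exits.
    trap '"$DVC" machine destroy "$machine"' EXIT
    # Turn Ctrl-C or termination into a normal exit so the EXIT trap fires.
    trap 'exit 130' INT TERM
    "$DVC" exp run --machine "$machine"
)

# Usage: run_on_machine aws-test
```

The function returns the exit status of dvc exp run, so a calling script can still detect a failed experiment after the instance has been cleaned up.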

