Merge pull request #4 from ajschmidt8/generic-matrix
Make `matrix` values generic
ajschmidt8 authored Aug 24, 2022
2 parents 0fb9981 + 2b7325d commit 0576eb0
Showing 6 changed files with 169 additions and 98 deletions.
130 changes: 87 additions & 43 deletions README.md
@@ -1,6 +1,8 @@
# rapids-dependency-file-generator

`rapids-dependency-file-generator` is a Python CLI tool that generates conda `environment.yaml` files and `requirements.txt` files from a single YAML file, typically named `dependencies.yaml`. When installed, it makes the `rapids-dependency-file-generator` CLI command available which is responsible for parsing a `dependencies.yaml` configuration file and generating the appropriate conda `environment.yaml` and `requirements.txt` dependency files.
`rapids-dependency-file-generator` is a Python CLI tool that generates conda `environment.yaml` files and `requirements.txt` files from a single YAML file, typically named `dependencies.yaml`.

When installed, it makes the `rapids-dependency-file-generator` CLI command available which is responsible for parsing a `dependencies.yaml` configuration file and generating the appropriate conda `environment.yaml` and `requirements.txt` dependency files.

## Table of Contents

@@ -24,9 +26,12 @@ pip install rapids-dependency-file-generator
## Usage

When `rapids-dependency-file-generator` is invoked, it will read a `dependencies.yaml` file from the current directory and generate children dependency files.
`dependencies.yaml` is intended to be committed to the root directory of repositories.
It has specific keys (described below) that enable the bifurcation of dependencies for different CUDA versions, architectures, and dependency file types (i.e. conda `environment.yaml` files vs. `requirements.txt`).
The bifurcated dependency lists are merged according to the description in the [_How Dependency Lists Are Merged_](#how-dependency-lists-are-merged) section below.

The `dependencies.yaml` file has the following characteristics:

- it is intended to be committed to the root directory of repositories
- it can define matrices that enable the output dependency files to vary according to any arbitrary specification (or combination of specifications), including CUDA version, machine architecture, Python version, etc.
- it contains bifurcated lists of dependencies based on the dependency's purpose (i.e. build, runtime, test, etc.). The bifurcated dependency lists are merged according to the description in the [_How Dependency Lists Are Merged_](#how-dependency-lists-are-merged) section below.

## `dependencies.yaml` Format

@@ -38,7 +43,7 @@ The top-level `files` key is responsible for determining the following:

- which types of dependency files should be generated (i.e. conda `environment.yaml` files and/or `requirements.txt` files)
- where the generated files should be written to
- which architecture and CUDA version variant files should be generated
- which variant files should be generated (based on the provided matrix)
- which of the dependency lists from the top-level `dependencies` key should be included in the generated files
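
The matrix-driven variant generation listed above can be sketched in a few lines of Python. This is only an illustration: `expand_matrix` and `variant_suffix` are hypothetical names, not functions from the package, and the naming rule (key and value joined by `-`, pairs joined by `_`, dots dropped from values) is inferred from the example output file names in this README rather than from the generator's source.

```python
from itertools import product


def expand_matrix(matrix):
    """Yield one dict per combination of matrix values, preserving key order."""
    keys = list(matrix)
    for values in product(*(matrix[key] for key in keys)):
        yield dict(zip(keys, values))


def variant_suffix(combo):
    """Build a suffix such as `cuda-115_arch-x86_64` (assumed rule: dots are dropped)."""
    return "_".join(
        f"{key}-{str(value).replace('.', '')}" for key, value in combo.items()
    )


# Matrix taken from the `files` example in this README.
matrix = {"cuda": ["11.5", "11.6"], "arch": ["x86_64"]}
for combo in expand_matrix(matrix):
    print(f"conda/environments/all_{variant_suffix(combo)}.yaml")
# conda/environments/all_cuda-115_arch-x86_64.yaml
# conda/environments/all_cuda-116_arch-x86_64.yaml
```

Each extra matrix key multiplies the number of generated files, so a matrix with two CUDA versions and two architectures would yield four variants per file type.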

Here is an example of what the `files` key might look like:
@@ -49,9 +54,9 @@ files:
    generate: both # which dependency file types to generate. required, can be "both", "env", "requirements", or "none"
    conda_dir: conda/environments # where to put conda environment.yaml files. optional, defaults to "conda/environments"
    requirements_dir: python/cudf # where to put requirements.txt files. optional, but recommended. defaults to "python"
    matrix:
      cuda_version: ["11.5", "11.6"] # which CUDA version variant files to generate. The CUDA version is included in the output file name
      arch: [x86_64] # which architecture version variant files to generate. The architecture is included in the output file name. This value should be the result of running the `arch` command on a given machine.
    matrix: # contains an arbitrary set of key/value pairs to determine which dependency files that should be generated. These values are included in the output filename.
      cuda: ["11.5", "11.6"] # which CUDA version variant files to generate.
      arch: [x86_64] # which architecture version variant files to generate. This value should be the result of running the `arch` command on a given machine.
    includes: # a list of keys from the `dependencies` section which should be included in the generated files
      - build
      - test
@@ -61,19 +66,20 @@ files:
    conda_dir: conda/environments
    requirements_dir: python/cudf
    matrix:
      cuda_version: ["11.5"]
      cuda: ["11.5"]
      arch: [x86_64]
      py: ["3.8"]
    includes:
      - build
```
The result of the above configuration is that the following dependency files would be generated:
- `conda/environments/all_cuda-11.5_arch-x86_64.yaml`
- `conda/environments/all_cuda-11.6_arch-x86_64.yaml`
- `python/cudf/requirements_all_cuda-11.5_arch-x86_64.txt`
- `python/cudf/requirements_all_cuda-11.6_arch-x86_64.txt`
- `python/cudf/requirements_build_cuda-11.5_arch-x86_64.txt`
- `conda/environments/all_cuda-115_arch-x86_64.yaml`
- `conda/environments/all_cuda-116_arch-x86_64.yaml`
- `python/cudf/requirements_all_cuda-115_arch-x86_64.txt`
- `python/cudf/requirements_all_cuda-116_arch-x86_64.txt`
- `python/cudf/requirements_build_cuda-115_arch-x86_64_py-38.txt`

The `all*.yaml` and `requirements_all*.txt` files would include the contents of the `build`, `test`, and `runtime` dependency lists from the top-level `dependency` key. The `requirements_build*.txt` file would only include the contents of the `build` dependency list from the top-level `dependency` key.

@@ -101,55 +107,69 @@ channels:
- conda-forge
```

In the absence of a `channels` key, some sensible defaults for RAPIDS will be used (see [constants.py](./src/rapids_dependency_file_generator//constants.py)).
In the absence of a `channels` key, some sensible defaults for RAPIDS will be used (see [constants.py](./src/rapids_dependency_file_generator/constants.py)).

### `dependencies` Key

The top-level `dependencies` key is where the bifurcated dependency lists should be specified. Directly beneath the `dependencies` key are 3 unique keys:

- `conda_and_requirements` - contains dependency lists that are the sames for both conda `environment.yaml` files and `requirements.txt` files
- `conda_and_requirements` - contains dependency lists that are the same for both conda `environment.yaml` files and `requirements.txt` files
- `conda` - contains dependency lists that are specific to conda `environment.yaml` files
- `requirements` - contains dependency lists that are specific to `requirements.txt` files

Each of the above keys has the following children keys:

- `common` - contains dependency lists that are the same across CUDA versions and architectures
- `<arch>-<cuda_version>` (i.e. `x86_64-11.5`) - contains dependency lists that are specific to the respective architecture and CUDA versions
- `common` - contains dependency lists that are the same across all matrix variations
- `specific` - contains dependency lists that are specific to a particular matrix combination

Below these keys are any number of arbitrarily named dependency lists (i.e. `build`, `test`, `libcuml_build`, `cuml_build`, etc.).
The structure of these two keys varies slightly.

The `common` key has children which are simple key-value pairs, where the key is an arbitrary name for the dependency list (i.e. `build`, `test`, `libcuml_build`, `cuml_build`, etc.) and the value is a list of dependencies.

The `specific` key's value is an array of objects. Each object contains a `matrix` key and some arbitrarily named dependency lists (similar to the dependency lists under `common`). The `matrix` key is used to define which matrix combinations from `files.[*].matrix` these dependency lists should apply to. This is elaborated on in [How Dependency Lists Are Merged](#how-dependency-lists-are-merged)

An example of the above structure is exemplified below:

```yaml
dependencies:
  conda_and_requirements: # common dependencies between conda environment.yaml & requirements.txt files
    common: # common between archs/cudas
    common: # common between all matrix variations
      build: # arbitrarily named dependency list
        - common_build_dep
      test: # arbitrarily named dependency list
        - pytest
    x86_64-11.5: # common dependencies specific to x86_64-11.5
      build:
        - a_random_x86_115_specific_dep
    specific:
      # dependencies specific to x86_64 and 11.5
      - matrix:
          cuda: "11.5"
          arch: x86_64
        build:
          - a_random_x86_115_specific_dep
  conda: # dependencies specific to conda environment.yaml files
    common:
      build:
        - cupy
        - pip: # supports `pip` key for conda environment.yaml files
            - some_random_dep
    x86_64-11.5:
      build:
        - cudatoolkit=11.5
    x86_64-11.6:
      build:
        - cudatoolkit=11.6
    specific:
      - matrix:
          cuda: "11.5"
        build:
          - cudatoolkit=11.5
      - matrix:
          cuda: "11.6"
        build:
          - cudatoolkit=11.6
  requirements: # dependencies specific to requirements.txt files
    x86_64-11.5:
      build:
        - another_random_dep=11.5.0
    x86_64-11.6:
      build:
        - another_random_dep=11.6.0
    specific:
      - matrix:
          cuda: "11.5"
        build:
          - another_random_dep=11.5.0
      - matrix:
          cuda: "11.6"
        build:
          - another_random_dep=11.6.0
```
## How Dependency Lists Are Merged
@@ -165,7 +185,7 @@ files:
    conda_dir: conda/environments
    requirements_dir: python/cudf
    matrix:
      cuda_version: ["11.5", "11.6"]
      cuda: ["11.5", "11.6"]
      arch: [x86_64, arm]
    includes:
      - build
@@ -176,12 +196,37 @@ For the `11.5` and `x86_64` matrix combination, the following dependency lists w

- `conda_and_requirements.common.build`
- `conda_and_requirements.common.test`
- `conda_and_requirements.x86_64-11.5.build`
- `conda_and_requirements.x86_64-11.5.test`
- `conda.common.build`
- `conda.common.test`
- `conda.x86_64-11.5.build`
- `conda.x86_64-11.5.test`

Additionally, any `build` and `test` lists from array entries under the `conda.specific` or `conda_and_requirements.specific` keys whose matrix value matches any of the definitions below would also be merged:

```yaml
specific:
  - matrix:
      cuda: "11.5"
      arch: "x86_64"
    build:
      - some_dep1
    test:
      - some_dep2
# or
specific:
  - matrix:
      cuda: "11.5"
    build:
      - some_dep1
    test:
      - some_dep2
# or
specific:
  - matrix:
      arch: "x86_64"
    build:
      - some_dep1
    test:
      - some_dep2
```

Merged dependency lists are also deduped.
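
The matching rule described above (a `specific` entry applies when every key in its `matrix` agrees with the combination being generated) can be sketched as follows. This is a minimal illustration under stated assumptions, not the generator's actual code: `matches` and `merge_lists` are hypothetical names, only a single section (such as `conda`) is handled, and the real tool may order its output differently.

```python
def matches(entry_matrix, combo):
    """True when every key/value in the entry's matrix agrees with the combination."""
    return all(combo.get(key) == value for key, value in entry_matrix.items())


def merge_lists(section, combo, includes):
    """Merge `common` and matching `specific` lists for the included names, deduped."""
    merged = []
    for name in includes:
        merged.extend(section.get("common", {}).get(name, []))
        for entry in section.get("specific", []):
            if matches(entry["matrix"], combo):
                merged.extend(entry.get(name, []))
    # Drop duplicates while keeping first-seen order; a plain loop also copes
    # with unhashable entries such as a nested `pip` mapping.
    deduped = []
    for dep in merged:
        if dep not in deduped:
            deduped.append(dep)
    return deduped


conda = {
    "common": {"build": ["cupy"], "test": ["pytest"]},
    "specific": [
        {"matrix": {"cuda": "11.5"}, "build": ["cudatoolkit=11.5"]},
        {"matrix": {"cuda": "11.6"}, "build": ["cudatoolkit=11.6"]},
        {"matrix": {"arch": "x86_64"}, "test": ["pytest"]},
    ],
}
combo = {"cuda": "11.5", "arch": "x86_64"}
print(merge_lists(conda, combo, ["build", "test"]))
# ['cupy', 'cudatoolkit=11.5', 'pytest']
```

An entry whose `matrix` names only `cuda` matches every architecture, which is why partial matrices are enough to express CUDA-specific pins.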

@@ -209,8 +254,7 @@ ENV_NAME="cudf_test"
rapids-dependency-file-generator \
--file_key "test" \
--generate "conda" \
--cuda_version "11.5" \
--arch $(arch) > env.yaml
--matrix "cuda=11.5;arch=$(arch)" > env.yaml
mamba env create --file env.yaml
mamba activate "$ENV_NAME"
@@ -219,6 +263,6 @@ mamba activate "$ENV_NAME"

The `--file_key` argument is passed the `test` key name from the `files` configuration. Additional flags are used to generate a single dependency file. When the CLI is used in this fashion, it will print to `stdout` instead of writing the resulting contents to the filesystem.

The `--file_key`, `--generate`, `--cuda_version`, and `--arch` flags must be used together.
The `--file_key`, `--generate`, and `--matrix` flags must be used together.
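
As a rough illustration of how a `--matrix` string maps onto the matrix structure, the sketch below follows the same splitting approach as `generate_matrix` in this commit's `cli.py` (split on `;`, then on `=`, wrapping each value in a list). The name `parse_matrix_arg` and the `ValueError` guard for malformed pairs are additions made here for illustration and are not part of the commit.

```python
def parse_matrix_arg(matrix_arg):
    """Turn "cuda=11.5;arch=x86_64" into {"cuda": ["11.5"], "arch": ["x86_64"]}."""
    matrix = {}
    for column in matrix_arg.split(";"):
        key, sep, value = column.partition("=")
        # Guard added for this sketch: reject pairs without a key, "=", or value.
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {column!r}")
        # Each value is wrapped in a list so the result has the same shape
        # as a `matrix` entry in dependencies.yaml.
        matrix[key] = [value]
    return matrix


print(parse_matrix_arg("cuda=11.5;arch=x86_64"))
# {'cuda': ['11.5'], 'arch': ['x86_64']}
```

Because every value becomes a single-element list, the resulting matrix describes exactly one combination, which matches the single file printed to `stdout`.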

Running `rapids-dependency-file-generator -h` will show the most up-to-date CLI arguments.
36 changes: 19 additions & 17 deletions src/rapids_dependency_file_generator/cli.py
@@ -5,19 +5,18 @@
import argparse


def generate_file_obj(config_file, file_key, file_type, cuda_version, arch):
    if not (config_file and file_key and file_type and cuda_version and arch):
def generate_file_obj(config_file, file_key, file_type, matrix):
    if not (config_file and file_key and file_type and matrix):
        return {}
    with open(config_file, "r") as f:
        parsed_config = yaml.load(f, Loader=yaml.FullLoader)
    matrix = {"cuda_version": [cuda_version], "arch": [arch]}
    parsed_config["files"][file_key]["matrix"] = matrix
    parsed_config["files"][file_key]["generate"] = file_type
    return {file_key: parsed_config["files"][file_key]}


def validate_args(args):
    dependent_arg_keys = ["file_key", "generate", "cuda_version", "arch"]
    dependent_arg_keys = ["file_key", "generate", "matrix"]
    dependent_arg_values = []
    for i in range(len(dependent_arg_keys)):
        dependent_arg_values.append(getattr(args, dependent_arg_keys[i]))
@@ -28,6 +27,14 @@ def validate_args(args):
        )


def generate_matrix(matrix_arg):
    matrix = {}
    for matrix_column in matrix_arg.split(";"):
        kv_pair = matrix_column.split("=")
        matrix[kv_pair[0]] = [kv_pair[1]]
    return matrix


def main():
    parser = argparse.ArgumentParser(
        description=f"Generates dependency files for RAPIDS libraries (version: {version})"
@@ -38,28 +45,23 @@ def main():
        help="path to YAML config file",
    )

    inclusive_group = parser.add_argument_group("optional, but mutually inclusive")
    inclusive_group.add_argument(
    codependent_args = parser.add_argument_group("optional, but codependent")
    codependent_args.add_argument(
        "--file_key",
        help="The file key from `dependencies.yaml` to generate",
    )
    inclusive_group.add_argument(
    codependent_args.add_argument(
        "--generate",
        help="The file type to generate",
        choices=[str(x) for x in [GeneratorTypes.CONDA, GeneratorTypes.REQUIREMENTS]],
    )
    inclusive_group.add_argument(
        "--cuda_version",
        help="The CUDA version used for generating the output",
    )
    inclusive_group.add_argument(
        "--arch",
        help="The architecture used for generating the output",
    codependent_args.add_argument(
        "--matrix",
        help='string representing which matrix combination should be generated, such as `--matrix "cuda=11.5;arch=x86_64"`',
    )

    args = parser.parse_args()
    validate_args(args)
    file = generate_file_obj(
        args.config, args.file_key, args.generate, args.cuda_version, args.arch
    )
    matrix = generate_matrix(args.matrix)
    file = generate_file_obj(args.config, args.file_key, args.generate, matrix)
    dfg(args.config, file)
4 changes: 0 additions & 4 deletions src/rapids_dependency_file_generator/constants.py
@@ -26,7 +26,3 @@ def __str__(self):
default_conda_dir = "conda/environments"
default_requirements_dir = "python"
default_dependency_file_path = "dependencies.yaml"


def arch_cuda_key_fmt(arch, cuda_version):
    return f"{arch}-{cuda_version}"