From 5262b5bfc094885446134e9326a7844b93e6a4bd Mon Sep 17 00:00:00 2001
From: shalberd <21118431+shalberd@users.noreply.github.com>
Date: Tue, 27 Aug 2024 15:37:04 +0200
Subject: [PATCH] document system-level environment variables for file-based pipeline nodes at level Jupyterlab, KFP or Airflow runtime, or both

Signed-off-by: shalberd <21118431+shalberd@users.noreply.github.com>
---
 docs/source/index.rst | 1 +
 .../env-variables-file-based-nodes.md | 65 +++++++++++++++++++
 2 files changed, 66 insertions(+)
 create mode 100644 docs/source/user_guide/env-variables-file-based-nodes.md

diff --git a/docs/source/index.rst b/docs/source/index.rst
index 2feff88ae..4c29ae37a 100644
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -51,6 +51,7 @@ Elyra is a set of AI-centric extensions to JupyterLab Notebooks.
    user_guide/pipeline-components.md
    user_guide/best-practices-custom-pipeline-components
    user_guide/best-practices-file-based-nodes.md
+   user_guide/env-variables-file-based-nodes.md
    user_guide/enhanced-script-support.md
    user_guide/code-snippets.md
diff --git a/docs/source/user_guide/env-variables-file-based-nodes.md b/docs/source/user_guide/env-variables-file-based-nodes.md
new file mode 100644
index 000000000..5ee36d7e8
--- /dev/null
+++ b/docs/source/user_guide/env-variables-file-based-nodes.md
@@ -0,0 +1,65 @@
+
+## System-level environment variables used in file-based pipeline nodes
+
+[Generic pipelines and typed pipelines](pipelines.md) natively support file-based nodes for Jupyter notebooks, Python scripts, and R scripts. To support heterogeneous execution, that is, to make these nodes runnable according to your requirements in any runtime environment (JupyterLab, Kubeflow Pipelines, and Apache Airflow), follow the documentation on the environment variables listed below.
+
+There are system-level environment variables for two types of scopes:
+- JupyterLab pipeline generation and validation (PipelineProcessor)
+- Runtime image task (Airflow) or component (KFP) execution of file-based node Jupyter notebooks, Python scripts, and R scripts (bootstrapper pipeline run)
+
+This page lists the environment variables along with their scope, defaults, and background.
+
+### `ELYRA_ENABLE_PIPELINE_INFO`
+
+Scope: JupyterLab PipelineProcessor and runtime image task execution in the runtime environment
+Impact: Produces a formatted log INFO message used entirely for support purposes.
+Having single-line entries in the log (no embedded newlines) with pipeline name, operation_name, action, and duration makes it easy to cross-evaluate logs across log files.
+
+Background: Used while pipelines are processed in JupyterLab, i.e. before execution, to log pipeline info when submitting the pipeline, processing pipeline operation dependencies,
+submitting the pipeline to Git, and exporting the pipeline as KFP Python or YAML or as Airflow DAG Python code (not needed with local / LocalPipelineProcessor).
+
+Also used in the runtime-specific container environment, in the bootstrapper.py Python code, to
+log KFP component / Airflow task execution info when execution of the script starts, when dependencies are processed, and when the script execution operation ends.
+
+Default: `true`. We recommend leaving this at its default, i.e. no explicit setting of this environment variable is necessary.
+If you want to set `ELYRA_ENABLE_PIPELINE_INFO` to `false`, you can do so in any of the following places:
+- JupyterLab at runtime
+- Statically baked into the JupyterLab container definition, for use in the JupyterLab container build
+- Pipeline Editor at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables
+- Statically baked into the runtime image container definition, for use in the KFP or Airflow runtime image container build
+
+### `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`
+
+Scope: Runtime image task (Airflow) or component (KFP) execution of file-based node Jupyter notebooks, Python scripts, and R scripts (bootstrapper pipeline run). Relevant for pipeline runs in KFP components or Airflow DAGs.
+Background:
+- For Python and R scripts, script execution output / STDOUT is written to a .log file.
+- For Jupyter notebooks, script execution output / STDOUT is written to a notebookname-output.ipynb and a notebookname-Output.html file.
+
+Impact: Controls whether these files are then uploaded to the Elyra S3 bucket. They are uploaded when this environment variable is not set at pipeline, node, or runtime container level.
+
+Default: `true` if not specified, i.e. no explicit setting of this environment variable is necessary.
+
+Background:
+If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
+for example because you capture and store logs with central KFP, Airflow, K8S / OpenShift mechanisms,
+set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`**.
+
+If you want to set `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` to `false`, you can do so in either of the following places:
+- Pipeline Editor at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables
+- Statically baked into the runtime image container definition, for use in the KFP or Airflow runtime image container build
\ No newline at end of file
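
Review note: the page describes both variables as boolean switches that default to `true` when unset. A minimal sketch of that behavior follows. It assumes the value is compared case-insensitively against the string `"true"`; `env_flag` is a hypothetical helper written for illustration and is not part of Elyra's API.

```python
import os


def env_flag(name, default="true", environ=None):
    """Return True when the variable (or its default) equals "true", ignoring case.

    Hypothetical helper for illustration only; the parsing rule is an assumption.
    """
    env = os.environ if environ is None else environ
    return env.get(name, default).lower() == "true"


# An unset variable falls back to the documented default of "true".
pipeline_info_enabled = env_flag("ELYRA_ENABLE_PIPELINE_INFO", environ={})

# An explicit "false" switches off the upload of script output to S3.
script_output_to_s3 = env_flag(
    "ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3",
    environ={"ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3": "false"},
)

print(pipeline_info_enabled, script_output_to_s3)  # prints: True False
```

Passing `environ` explicitly keeps the sketch independent of the calling process; omitting it reads from `os.environ`, which is where a value set in the Pipeline Editor or baked into a container image would be visible at run time.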