document system-level environment variables for file-based pipeline nodes #3243

Merged
merged 1 commit into from
Sep 2, 2024
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -51,6 +51,7 @@ Elyra is a set of AI-centric extensions to JupyterLab Notebooks.
user_guide/pipeline-components.md
user_guide/best-practices-custom-pipeline-components
user_guide/best-practices-file-based-nodes.md
user_guide/env-variables-file-based-nodes.md
user_guide/enhanced-script-support.md
user_guide/code-snippets.md

65 changes: 65 additions & 0 deletions docs/source/user_guide/env-variables-file-based-nodes.md
@@ -0,0 +1,65 @@
<!--
{% comment %}
Copyright 2018-2023 Elyra Authors

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->
## System-level environment variables used in file-based pipeline nodes

[Generic pipelines and typed pipelines](pipelines.md) natively support file-based nodes for Jupyter notebooks, Python scripts, and R scripts. To support heterogeneous execution, that is, to make these nodes run according to your requirements in any runtime environment (JupyterLab, Kubeflow Pipelines, and Apache Airflow), refer to the environment variables documented below.

There are system-level environment variables for two types of scopes:
- JupyterLab pipeline generation and validation (PipelineProcessor)
- Runtime image task (Airflow) or component (KFP) execution of file-based node Jupyter notebooks, Python scripts, and R scripts (bootstrapper pipeline run)

This page lists the environment variables along with their scope, impact, default, and background.

### `ELYRA_ENABLE_PIPELINE_INFO`

Scope: JupyterLab PipelineProcessor and runtime image task execution in the runtime environment

Impact: Produces a formatted log INFO message used entirely for support purposes.
Single-line log entries (no embedded newlines) containing the pipeline name, operation name, action, and duration make it easy to cross-evaluate logs across log files.

Background: The variable is evaluated in JupyterLab while pipelines are processed, i.e. before execution: when the pipeline is submitted, when pipeline operation dependencies are processed,
when the pipeline is submitted to Git, and when the pipeline is exported as KFP Python or YAML or as Airflow DAG Python code (not needed with local / LocalPipelineProcessor).

It is also evaluated in the runtime-specific container environment by the `bootstrapper.py` Python code, which logs execution info for KFP pipeline components and Airflow pipeline / DAG tasks
when execution of the script starts, when dependencies are processed, and when the script execution operation ends.
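
The gating described above can be illustrated with a minimal Python sketch. It is not Elyra's actual implementation: the helper names `pipeline_info_enabled`, `format_pipeline_info`, and `log_pipeline_info`, the logger name, and the exact message layout are assumptions made for illustration. Only the variable name, its default of `true`, and the single-line requirement come from this page.

```python
import logging
import os
from typing import Optional

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("pipeline-info-sketch")  # hypothetical logger name


def pipeline_info_enabled() -> bool:
    """Treat the variable as enabled unless it is set to something other than "true"."""
    return os.getenv("ELYRA_ENABLE_PIPELINE_INFO", "true").lower() == "true"


def format_pipeline_info(
    pipeline_name: str,
    operation_name: str,
    action: str,
    duration_secs: Optional[float] = None,
) -> str:
    """Build one log entry; the layout is an assumption, not Elyra's exact format."""
    duration = f" ({duration_secs:.3f} secs)" if duration_secs is not None else ""
    line = f"'{pipeline_name}':'{operation_name}' - {action}{duration}"
    # Collapse any embedded newlines or repeated whitespace into single spaces.
    return " ".join(line.split())


def log_pipeline_info(
    pipeline_name: str,
    operation_name: str,
    action: str,
    duration_secs: Optional[float] = None,
) -> None:
    """Emit the INFO entry only when the environment variable allows it."""
    if pipeline_info_enabled():
        logger.info(format_pipeline_info(pipeline_name, operation_name, action, duration_secs))


if __name__ == "__main__":
    log_pipeline_info("demo-pipeline", "load_data", "starting operation")
    log_pipeline_info("demo-pipeline", "load_data", "operation completed", 1.5)
```

Reading the variable on every call, rather than once at import time, keeps the sketch easy to test; a real implementation may well cache the value instead.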

Default: `true`. We recommend leaving the variable at its default, in which case you do not need to set it explicitly.
If you want to set `ELYRA_ENABLE_PIPELINE_INFO` to `false`, you can do so in any of the following places:
- JupyterLab at runtime
- Statically baked into the JupyterLab container definition for use in the JupyterLab container build
- Pipeline Editor at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables
- Statically baked into the container definition used for the KFP or Airflow runtime image build

### `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`

Scope: Runtime image task (Airflow) or component (KFP) execution of file-based node Jupyter notebooks, Python scripts, and R scripts (bootstrapper pipeline run). Relevant for pipeline runs in KFP components or Airflow DAGs.
Background:
- Puts script execution output / STDOUT into a `.log` file for Python and R scripts.
- Puts script execution output / STDOUT into notebookname-output.ipynb and notebookname-Output.html files for Jupyter notebooks.

Impact: Controls whether these files are then uploaded to the Elyra S3 bucket. They are uploaded if this environment variable is not set at pipeline, node, or runtime container level.

Default: `true` if not specified, i.e. you do not need to set this environment variable explicitly.

Background:
If you prefer to use S3-compatible storage only for transferring files between pipeline steps and **not for logging information / run output of R, Python and Jupyter Notebook files**,
for example because you capture and store logs with central KFP, Airflow, Kubernetes / OpenShift mechanisms,
set the environment variable **`ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3`** to **`false`**.
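
As a rough illustration of this behavior, the following Python sketch selects which files a node would upload. It is an assumption-laden sketch, not the bootstrapper's real code: the functions `script_output_to_s3_enabled` and `files_to_upload` and the example file names are hypothetical, and the actual upload call is left out entirely.

```python
import os
from typing import List


def script_output_to_s3_enabled() -> bool:
    """Default to "true" when the variable is not set, as described on this page."""
    value = os.getenv("ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3", "true")
    return value.lower() == "true"


def files_to_upload(transfer_files: List[str], run_output_files: List[str]) -> List[str]:
    """Return the files a node would send to the S3 bucket (hypothetical helper).

    Files declared for transfer between pipeline steps are always selected.
    Run output files (logs, executed notebooks) are selected only when enabled.
    """
    selected = list(transfer_files)
    if script_output_to_s3_enabled():
        selected.extend(run_output_files)
    return selected


if __name__ == "__main__":
    # Hypothetical file names, used only for illustration.
    print(files_to_upload(["model.pkl"], ["train.log"]))
```

With the variable unset, both lists are selected; with it set to `false`, only the files declared for transfer between steps remain.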

If you want to set `ELYRA_GENERIC_NODES_ENABLE_SCRIPT_OUTPUT_TO_S3` to `false`, you can do so in either of the following places:
- Pipeline Editor at Pipeline Properties - Generic Node Defaults - Environment Variables or at Node Properties - Additional Properties - Environment Variables
- Statically baked into the container definition used for the KFP or Airflow runtime image build