Description
In the NVIDIA Device Plugin Helm chart (v0.16.1), when using a ConfigMap strategy, all nodes are incorrectly assigned elevated privileges, regardless of their MIG strategy configuration. This is due to a flaw in the template logic that prevents the actual content of the ConfigMap from being evaluated.
Current Behavior
When deploying the chart with migStrategy: none and no ConfigMap, the correct security context is applied.
However, when using a ConfigMap strategy, even with a default migStrategy: none, all nodes receive elevated privileges, regardless of their actual MIG strategy.
Expected Behavior
Nodes should receive the appropriate security context based on their actual MIG strategy configuration, especially those using the default migStrategy: none.
Impact
This issue unnecessarily elevates privileges on all nodes when using a ConfigMap strategy, potentially compromising security, particularly in mixed GPU environments.
Steps to Reproduce
1. Set up a cluster with both MIG and vGPU nodes.
2. Deploy the NVIDIA Device Plugin using any ConfigMap strategy.
3. Observe that all nodes receive elevated privileges.
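For concreteness, even the simplest inline ConfigMap triggers the behavior. A minimal values.yaml sketch (the entry name and contents are illustrative, not the chart's defaults):

config:
  map:
    # The only configuration in the map explicitly sets migStrategy: none,
    # yet deploying with these values still elevates privileges on all nodes.
    default: |-
      version: v1
      flags:
        migStrategy: none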
Code Analysis
The issue begins in the nvidia-device-plugin.allPossibleMigStrategiesAreNone template. Here's the relevant part:
{{- if .Values.migStrategy -}}
  {{- if ne .Values.migStrategy "none" -}}
    {{- $result = false -}}
  {{- end -}}
{{- else if eq (include "nvidia-device-plugin.hasConfigMap" .) "true" -}}
  {{- $result = false -}}
{{- else -}}
  {{- range $name, $contents := $.Values.config.map -}}
    {{- $config := $contents | fromYaml -}}
    {{- if $config.flags -}}
      {{- if ne $config.flags.migStrategy "none" -}}
        {{- $result = false -}}
      {{- end -}}
    {{- end -}}
  {{- end -}}
{{- end -}}
The hasConfigMap function is defined as:
{{/*
Check if there is a ConfigMap in use or not
*/}}
{{- define "nvidia-device-plugin.hasConfigMap" -}}
{{- $result := false -}}
{{- if ne (include "nvidia-device-plugin.configMapName" .) "" -}}
  {{- $result = true -}}
{{- end -}}
{{- $result -}}
{{- end }}
where the configMapName function is defined as:
{{- define "nvidia-device-plugin.configMapName" -}}
{{- $result := "" -}}
{{- if .Values.config.name -}}
  {{- $result = .Values.config.name -}}
{{- else if not (empty .Values.config.map) -}}
  {{- $result = printf "%s-%s" (include "nvidia-device-plugin.fullname" .) "configs" -}}
{{- end -}}
{{- $result -}}
{{- end -}}
Root Cause
The hasConfigMap function returns true whenever config.map in values.yaml has any content, without examining that content.
This causes the else if branch to set $result = false whenever a ConfigMap is present, regardless of what it contains.
The range loop that actually inspects the ConfigMap's contents is therefore never reached when a ConfigMap is defined, making it effectively dead code.
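A minimal sketch of how the branch could be reordered so that inline config.map contents are actually inspected (this mirrors the existing template's structure and addresses only this first flaw; the loop's handling of non-default configurations, discussed below, is a separate problem):

{{- if .Values.migStrategy -}}
  {{- if ne .Values.migStrategy "none" -}}
    {{- $result = false -}}
  {{- end -}}
{{- else if .Values.config.name -}}
  {{- /* Externally named ConfigMap: its contents are unknown at render time, so assume the worst. */ -}}
  {{- $result = false -}}
{{- else -}}
  {{- /* Inline config.map: inspect each entry instead of short-circuiting on its mere presence. */ -}}
  {{- range $name, $contents := $.Values.config.map -}}
    {{- $config := $contents | fromYaml -}}
    {{- if $config.flags -}}
      {{- if ne $config.flags.migStrategy "none" -}}
        {{- $result = false -}}
      {{- end -}}
    {{- end -}}
  {{- end -}}
{{- end -}}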
Additional Context
This issue affects all deployments using the ConfigMap strategy, regardless of the actual MIG configurations.
The current implementation makes it impossible to use the multi-configmap strategy without applying elevated privileges to all nodes.
There is also a second flaw, this time in the range loop of the nvidia-device-plugin.allPossibleMigStrategiesAreNone template. Here's the relevant part:
{{- else -}}
  {{- range $name, $contents := $.Values.config.map -}}
    {{- $config := $contents | fromYaml -}}
    {{- if $config.flags -}}
      {{- if ne $config.flags.migStrategy "none" -}}
        {{- $result = false -}}
      {{- end -}}
    {{- end -}}
  {{- end -}}
{{- end -}}
This code:
Iterates over all configurations in the ConfigMap.
Parses each configuration from YAML.
Checks if the flags key exists.
If any configuration has migStrategy not equal to "none", it sets $result to false.
The problem is that this logic doesn't distinguish between the default configuration and others. It treats all configurations equally. As a result:
Even if the default configuration has migStrategy: none, the presence of any other configuration with a different migStrategy (like mig-single in the example below) causes $result to be set to false, leading to elevated privileges for all nodes.
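To make this concrete, consider a hypothetical two-entry config.map (the entry names and contents here are illustrative):

config:
  default: "default"
  map:
    # migStrategy matches "none", so the loop leaves $result untouched...
    default: |-
      version: v1
      flags:
        migStrategy: none
    # ...but this entry's migStrategy differs from "none" and flips $result to false
    # for the entire release, including nodes that select the default configuration.
    mig-single: |-
      version: v1
      flags:
        migStrategy: single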
Potential action items
Fundamentally, the issue begins with the logical flaw described above; however, there is also an underlying problem: a single DaemonSet is generated for all configurations, so the most permissive security context is applied even to nodes that only contain vGPUs. One way to mitigate this would be to generate a unique DaemonSet per configuration, each applying the appropriate permissions for its GPU type. However, this would require a non-trivial overhaul of the Helm chart.
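As a very rough illustration of that direction, a template could range over config.map and emit one DaemonSet per entry, scheduled via the nvidia.com/device-plugin.config node label that the plugin's config manager already uses for per-node config selection. This is a sketch under those assumptions, with most fields omitted:

{{- range $name, $contents := .Values.config.map }}
{{- $config := $contents | fromYaml }}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: {{ printf "%s-%s" (include "nvidia-device-plugin.fullname" $) $name }}
spec:
  template:
    spec:
      # Run this DaemonSet only on nodes that selected this configuration.
      nodeSelector:
        nvidia.com/device-plugin.config: {{ $name }}
      containers:
        - name: nvidia-device-plugin-ctr
          # image, args, env, and mounts omitted for brevity
          {{- $mig := "none" }}
          {{- if $config.flags }}
          {{- $mig = default "none" $config.flags.migStrategy }}
          {{- end }}
          securityContext:
            {{- if ne $mig "none" }}
            # Only MIG configurations receive elevated privileges.
            privileged: true
            {{- else }}
            capabilities:
              drop: ["ALL"]
            {{- end }}
{{- end }}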