k3s: GPU Passthrough for Nvidia / AMD / Intel #288037

superherointj · 2024-02-11T13:36:50Z

This issue is for tracking GPU pass through in K3s.

K3s supports GPU pass through, but it did not work in NixOS K3s the last time I tried.

I don't know whether this is a solved issue. I think it is not, but there are notes on how to do GPU passthrough here:
- https://wiki.nixos.org/wiki/K3s
  (Let me know if it works for you. I will test too. And leave my notes here.)

Last time I tried, I had issues with Nix paths and ldcache in the Nvidia driver, and I got lost in the process. I will keep updating this issue until
GPU pass through is:

- supported by NixOS K3s,
- properly documented, and
- integrated into the NixOS K3s module. (Ideally, GPU pass through should be a toggle.)

Other references:

SomeoneSerge · 2024-02-13T13:00:41Z

Related: #278969 #284507

OlfillasOdikno · 2024-05-04T02:50:14Z

I was unable to get the official NVIDIA device plugin to work. Since all the heavy lifting is already done in containerd and in generating the CDI JSON, I created a device plugin that uses the CDI JSON and instructs Kubernetes to inject the device.
Tested it on a NVIDIA 3060.
https://github.com/OlfillasOdikno/generic-cdi-plugin
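
For context, a CDI spec is a JSON (or YAML) file that names a device kind and lists devices, each with the container edits needed to expose it. The sketch below is not taken from the plugin. It only illustrates how fully qualified CDI device names of the form `vendor.com/class=name` can be derived from such a file. The sample spec content and the function name `cdi_device_names` are hypothetical.

```python
import json

# Hypothetical, heavily trimmed CDI spec. A real spec generated for an
# NVIDIA GPU contains many more container edits (mounts, hooks, env).
SAMPLE_SPEC = """
{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {"name": "0", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}},
    {"name": "all", "containerEdits": {"deviceNodes": [{"path": "/dev/nvidia0"}]}}
  ]
}
"""


def cdi_device_names(spec_text: str) -> list[str]:
    """Return fully qualified CDI device names ("<kind>=<name>") from a spec."""
    spec = json.loads(spec_text)
    kind = spec["kind"]
    return [f"{kind}={device['name']}" for device in spec.get("devices", [])]


if __name__ == "__main__":
    for name in cdi_device_names(SAMPLE_SPEC):
        print(name)
```

How a device plugin maps these names to Kubernetes resource names is up to the plugin; check its README rather than relying on this sketch.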

ahirner · 2024-06-23T11:41:28Z

@OlfillasOdikno thanks, I use your plugin as well for now. I had problems where some containers saw neither libnvml.so.1 nor the generated CDIs.

> GPU pass through is:

Question regarding scope: does this issue include shared GPU use? I'm not sure how involved it is.

Goorzhel · 2024-08-07T04:11:51Z

After four months of dead ends and failed hacks, I've arrived at this configuration for my k3s node and its GeForce 3070:

In NixOS

1. Bodge LD_LIBRARY_PATH into the CDI generator's environment.
2. Ensure /run/opengl is available.
3. Enable CDI in k3s' bundled containerd.

In Kubernetes

1. Install @OlfillasOdikno's CDI plugin (thank you!).
2. Add spec.containers[].resources.limits."nvidia.com/gpu-0"=1 to the relevant pod specs.
3. Enjoy massively improved video transcoding, etc.
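
The resource limit mentioned above can be sketched as a pod manifest. This is a minimal illustration, not the actual configuration from this comment: the pod name and image are placeholders, and the helper `gpu_pod_manifest` is hypothetical. The resource name `nvidia.com/gpu-0` is the one quoted above. The limit is set on the container, which is where Kubernetes expects resource limits.

```python
import json


def gpu_pod_manifest(name: str, image: str, resource: str = "nvidia.com/gpu-0") -> dict:
    """Build a single-container Pod manifest that requests one unit of `resource`."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # Extended resources are requested per container via limits.
                    "resources": {"limits": {resource: 1}},
                }
            ]
        },
    }


if __name__ == "__main__":
    # Placeholder name and image; substitute your own workload.
    manifest = gpu_pod_manifest("transcoder", "example.invalid/transcoder:latest")
    print(json.dumps(manifest, indent=2))
```

Since kubectl accepts JSON manifests, the printed output can be piped into `kubectl apply -f -`.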

Stray notes

- containerd/containerd@c8e8a093c will remove the need for NixOS step 3 whenever k3s bundles a version with that commit.
- Like OlfillasOdikno, I hit a dead end with Nvidia's plugin.
- Relevant software versions:
  - k3s 1.30.2+k3s2,
  - NixOS 24.05, and
  - Nvidia 550.78.

Nvidia monoculture aside, I also have a Radeon RX 7800 in my desktop. A brief web-search reveals a plugin for AMD GPUs, but I need more time to look into that.

Goorzhel · 2024-08-13T04:37:03Z

> A brief web-search reveals a plugin for AMD GPUs, but I need more time to look into that.

Little did I know that is the official AMD plugin. I made a one-node k3s cluster of my desktop and installed the plugin's Helm chart—and that's all I needed. One unit of amd.com/gpu became available, without any abstruse hacks like my Nvidia odyssey above.

Some caveats:

- Like the Nvidia device plugin, the AMD one hands out whole-GPU leases by default. Unlike the Nvidia plugin, this is non-configurable.
- No news on CDI support yet.
- On a Jellyfin pod, with VAAPI selected, I got ~20 fps transcoding 2160p HEVC to AVC, in memory. The same video on a magnetic ZFS pool in my main cluster went through NVENC at ~200 fps.

EDIT: Same story with Intel GPUs. All one needs is the device plugin, which I've been using for more than a year.

superherointj added the 6.topic: k3s label May 9, 2024
