NVIDIA GPU Passthrough to LXC on Proxmox

Configuring NVIDIA GPU passthrough to a Linux Container (LXC) on Proxmox can greatly improve the performance of hardware-intensive workloads. With GPU passthrough, the LXC can use the full power of the GPU, providing hardware acceleration for demanding tasks such as:

  1. Running Plex media server with hardware-based decoding and encoding support, resulting in smoother and higher-quality video playback.
  2. Machine learning with CUDA, allowing you to train and run deep learning models more efficiently.
  3. High-performance video editing, where the GPU can be used to perform demanding tasks such as video rendering and encoding.
  4. Scientific simulations, which can take advantage of the GPU's parallel processing capabilities to perform complex calculations and simulations.

It's important to note that GPU passthrough requires hardware that supports this technology, and that the host and the LXC must run the same version of the NVIDIA driver.
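As a hedged sketch, the version-match requirement can be checked with a small helper; the container ID (101) and the live commands in the comment are hypothetical placeholders for your own setup.

```shell
# Compare two driver version strings; returns success only when they match.
check_driver_match() {
  [ "$1" = "$2" ]
}

# On a live system you would feed it real values, e.g. (container ID 101 is
# a placeholder):
#   check_driver_match \
#     "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader)" \
#     "$(pct exec 101 -- nvidia-smi --query-gpu=driver_version --format=csv,noheader)"
check_driver_match "515.76" "515.76" && echo "versions match"
```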

Updates

  • 2024-01-02: Updated lxc configuration to reflect changes to cgroups.

Driver Installation - Host

Download the latest driver from the NVIDIA website (https://www.nvidia.com/en-us/drivers/unix/) and take note of the version number.
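The installer compiles a kernel module against the running kernel, so on a stock Proxmox host the matching headers and build tools will likely need to be present first. A sketch, assuming the standard Proxmox repositories:

```
root@pve:~# apt update
root@pve:~# apt install -y pve-headers-$(uname -r) build-essential
```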

Install the driver

root@pve:~# chmod +x NVIDIA-Linux-x86_64-515.76.run
root@pve:~# ./NVIDIA-Linux-x86_64-515.76.run

Verify the card is visible with nvidia-smi; you should see your GPU listed. To see its full name, run nvidia-smi -L.

root@pve:~# nvidia-smi
Sun Feb 12 10:50:12 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
|  0%   47C    P8    19W / 185W |      3MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@pve:~# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-934aa0ef-64e0-4fb0-7910-46f1b818486b)

LXC Configuration - Host

Retrieve the device major numbers assigned to the NVIDIA device nodes, as they will be needed for the cgroup device rules in the LXC configuration. In my case they were 195 and 511.

root@pve:~# ls -l /dev/nvidia*
crw-rw-rw- 1 root root 195,   0 Jan 10 16:11 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jan 10 16:11 /dev/nvidiactl
crw-rw-rw- 1 root root 511,   0 Jan 10 16:11 /dev/nvidia-uvm
crw-rw-rw- 1 root root 511,   1 Jan 10 16:11 /dev/nvidia-uvm-tools

/dev/nvidia-caps:
total 0
cr-------- 1 root root 236, 1 Jan 10 16:11 nvidia-cap1
cr--r--r-- 1 root root 236, 2 Jan 10 16:11 nvidia-cap2
root@pve:~#

Note down the major numbers from /dev/nvidia0 and /dev/nvidia-uvm (195 and 511 in this example).
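If you prefer to script this, a minimal sketch: stat reports a device node's major number (in hex via %t), which can be converted to the decimal value the config expects.

```shell
# Print each NVIDIA device node with its major number in decimal.
# stat -c '%t' yields the major number in hex, hence the 0x conversion.
for dev in /dev/nvidia0 /dev/nvidiactl /dev/nvidia-uvm; do
  [ -e "$dev" ] || continue          # skip nodes absent on this machine
  printf '%s major=%d\n' "$dev" "0x$(stat -c '%t' "$dev")"
done
```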

Make sure that the LXC container is stopped before making any configuration changes; changes made while the container is running will not take effect until it is restarted.

Open the lxc configuration file in your preferred text editor. The configuration file is located at /etc/pve/lxc/<lxcid>.conf.

Add the following two lines to the bottom of the LXC configuration file. They grant the container permission to access character devices with major numbers 195 and 511, which belong to your GPU.

Update: 2024-01-02: change lxc.cgroup to lxc.cgroup2 to reflect changes to cgroups.

lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm

We now need to mount the devices into the LXC. Add the following five lines to the bottom of the configuration file to complete this step.

lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
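For reference, with the major numbers from this example (195 and 511), the tail of a hypothetical /etc/pve/lxc/101.conf would contain all seven added lines together:

```
lxc.cgroup2.devices.allow: c 195:* rwm
lxc.cgroup2.devices.allow: c 511:* rwm
lxc.mount.entry: /dev/nvidia0 dev/nvidia0 none bind,optional,create=file
lxc.mount.entry: /dev/nvidiactl dev/nvidiactl none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm dev/nvidia-uvm none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-modeset dev/nvidia-modeset none bind,optional,create=file
lxc.mount.entry: /dev/nvidia-uvm-tools dev/nvidia-uvm-tools none bind,optional,create=file
```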

Driver Installation - LXC

Make sure to download the correct version of the NVIDIA driver for the LXC: it must match the version already installed on the host. Mismatched driver versions between the host and the LXC can cause compatibility issues and leave the GPU passthrough non-functional.

Install the driver

root@app-plex:~# chmod +x NVIDIA-Linux-x86_64-515.76.run
root@app-plex:~# ./NVIDIA-Linux-x86_64-515.76.run
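Note that inside a container the kernel module is supplied by the host, so the installer has nothing to build there. If it complains about missing kernel sources, re-run it with the flag that skips the module build:

```
root@app-plex:~# ./NVIDIA-Linux-x86_64-515.76.run --no-kernel-module
```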

Verify the card is visible with nvidia-smi; you should see your GPU listed. To see its full name, run nvidia-smi -L.

root@app-plex:~# nvidia-smi
Sun Feb 12 10:50:12 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.76       Driver Version: 515.76       CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:08:00.0 Off |                  N/A |
|  0%   47C    P8    19W / 185W |      3MiB /  4096MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
root@app-plex:~# nvidia-smi -L
GPU 0: NVIDIA GeForce GTX 980 (UUID: GPU-934aa0ef-64e0-4fb0-7910-46f1b818486b)

With this setup, you should be able to use the GPU as if the LXC were a bare-metal machine, with some limitations to keep in mind: for example, the maximum number of active decoding/encoding sessions on consumer cards is limited to 5.

Troubleshooting

If the GPU is not visible after a host reboot, check whether the device major numbers have changed. If so, update the LXC configuration to match. The reason this occasionally happens is not known.
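A quick way to re-check, sketched below: list the major numbers currently in use and compare them against the allow rules in the container's config (the container ID 101 is a placeholder).

```shell
# Print the unique major numbers of the NVIDIA character devices.
ls -l /dev/nvidia* 2>/dev/null | awk '/^c/ {print $5}' | tr -d ',' | sort -u
# Then compare against the configured rules, e.g.:
#   grep cgroup2 /etc/pve/lxc/101.conf
```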

If the GPU is no longer visible in the LXC after an update on the host, it may be necessary to reinstall the Nvidia driver on the host. It's important to ensure that the version of the driver installed on the host matches the version installed in the LXC.