2 posts tagged “device-plugin”

GPU Sharing in Kubernetes

Building the Applications

Scheduler Extender

git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
docker build -t gouchicao/gpushare-scheduler-extender .

Device Plugin

git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
docker build -t gouchicao/gpushare-device-plugin .

Kubectl Extension

wget https://github.com/AliyunContainerService/gpushare-device-plugin/releases/download/v0.3.0/kubectl-inspect-gpushare
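kubectl only discovers a plugin if it is an executable named kubectl-* in a directory on PATH, so the downloaded file still has to be installed. A minimal sketch, assuming /usr/local/bin is on PATH; the helper name install_kubectl_plugin is hypothetical and not part of the original post:

```shell
# Hypothetical helper: copy a downloaded kubectl plugin into a directory on
# PATH and mark it executable. The default target /usr/local/bin is an
# assumption; any directory on PATH works.
install_kubectl_plugin() {
    src="$1"
    dest_dir="${2:-/usr/local/bin}"
    install -m 0755 "$src" "$dest_dir/$(basename "$src")"
}

# Usage (run from the directory containing the downloaded file):
#   sudo sh -c '. ./helper.sh; install_kubectl_plugin kubectl-inspect-gpushare'
#   kubectl inspect gpushare
```

Once installed this way, the plugin should be callable as kubectl inspect gpushare.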

Installation

Deploy the GPU share scheduler extender in the control plane

cd /etc/kubernetes
sudo wget https://raw.

Install NVIDIA device plugin for Kubernetes

Configure Docker on each NVIDIA GPU node

  1. Add "default-runtime": "nvidia"
$ sudo vim /etc/docker/daemon.json
{
    "registry-mirrors": ["https://75oltije.mirror.aliyuncs.com"],
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
  2. Restart the service
sudo systemctl restart docker
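A malformed daemon.json prevents Docker from starting, so it can help to check the file before restarting and to confirm the runtime afterwards. A minimal sketch, assuming python3 is available on the node; the helper name daemon_default_runtime is hypothetical:

```shell
# Hypothetical helper: print the "default-runtime" value from a Docker
# daemon.json. Exits non-zero if the file is not valid JSON.
daemon_default_runtime() {
    python3 -c 'import json, sys; print(json.load(open(sys.argv[1])).get("default-runtime", ""))' "$1"
}

# Usage on a GPU node:
#   daemon_default_runtime /etc/docker/daemon.json      # before the restart
#   docker info --format '{{.DefaultRuntime}}'          # after the restart
```

Both commands should report nvidia if the configuration above was picked up.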

Set taints on each node

GPU nodes

kubectl taint node gpu1 nvidia.com/gpu:NoSchedule
kubectl taint node gpu2 nvidia.com/gpu:NoSchedule
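With more than a couple of GPU nodes, the same taint can be applied in a loop. A minimal sketch; the helper name taint_gpu_nodes and the KUBECTL override are hypothetical conveniences, while the taint itself is the one used above:

```shell
# Hypothetical helper: apply the nvidia.com/gpu:NoSchedule taint to every
# node name passed as an argument. Set KUBECTL=echo to preview the
# arguments without touching the cluster.
taint_gpu_nodes() {
    for node in "$@"; do
        ${KUBECTL:-kubectl} taint node "$node" nvidia.com/gpu:NoSchedule
    done
}

# Usage:
#   KUBECTL=echo taint_gpu_nodes gpu1 gpu2   # dry preview
#   taint_gpu_nodes gpu1 gpu2                # apply
#   kubectl describe node gpu1 | grep Taints # verify
```

Pods without a matching toleration will no longer be scheduled onto these nodes, so GPU workloads need a toleration for the nvidia.com/gpu key.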

CPU nodes

kubectl taint node ln2 node-type=production:NoSchedule
kubectl ta