2 posts tagged "Nvidia"

GPU Sharing in Kubernetes

  1. Add the policy configuration file to the kube-scheduler command
- --policy-config-file=/etc/kubernetes/scheduler-policy-config.json
  2. Add the volume mount and its matching hostPath volume to the Pod
- mountPath: /etc/kubernetes/scheduler-policy-config.json
  name: scheduler-policy-config
  readOnly: true
- hostPath:
      path: /etc/kubernetes/scheduler-policy-config.json
      type: FileOrCreate
  name: scheduler-policy-config
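
One thing worth checking before the scheduler restarts: with `type: FileOrCreate`, Kubernetes creates an empty file at the host path when nothing exists there, so the mount succeeds even if the policy file was never written. The following is a minimal shell sketch of a pre-flight check, not part of the original post; the function name `check_policy_file` is hypothetical, and the default path is the one used in the steps above.

```shell
# Hypothetical helper: fail early when the scheduler policy file is
# missing or empty (an empty file is what FileOrCreate would leave behind).
check_policy_file() {
  policy="${1:-/etc/kubernetes/scheduler-policy-config.json}"
  if [ ! -s "$policy" ]; then
    echo "policy file missing or empty: $policy" >&2
    return 1
  fi
  echo "policy file present: $policy"
}
```

Calling `check_policy_file` with no argument on the control-plane node checks the default path; pass another path as the first argument to check a different file.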

The final manifest becomes:

apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.

Install NVIDIA device plugin for Kubernetes

  1. Restart the Docker service
sudo systemctl restart docker
  2. Install with Helm
helm install --generate-name nvdp/nvidia-device-plugin
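
The `nvdp/` prefix in the install command refers to a Helm repository that has to be registered first. The sketch below puts the repository setup and the install together; it assumes the repository URL documented in the NVIDIA k8s-device-plugin README. The `run` wrapper, the `DRY_RUN` variable, and the `install_device_plugin` function are hypothetical names added here so the commands can be previewed before they touch the cluster.

```shell
# Hypothetical wrapper: print the command when DRY_RUN is set, run it otherwise.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Register the nvdp chart repository, refresh the index, then install the plugin.
install_device_plugin() {
  run helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
  run helm repo update
  run helm install --generate-name nvdp/nvidia-device-plugin
}
```

Setting `DRY_RUN=1` before calling `install_device_plugin` only prints the three Helm commands; leaving it unset runs them against the current kubectl context.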

Failure (Docker on node gpu2 was not configured correctly):

$ kubectl logs -n kube-system nvidia-device-plugin-1614240442-wfh6c
2021/02/26 07:03:48 Loading NVML
2021/02/26 07:03:48 Failed to initialize NVML: could not load NVML library.
2021/02/26 07:03:48 If this is a GPU node, did you set the docker default runtime to nvidia?
2021/02/26 07:03:48 You can check the prerequisites at: https://github.com/NVIDIA/k8s-device-plugin#prerequisites
2021/02/26 07:03:48 You can learn how to set the runtime at: https://github.
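
The log asks whether Docker's default runtime is set to nvidia. A rough way to check this on a GPU node is sketched below. It assumes the daemon configuration lives at `/etc/docker/daemon.json` and that `nvidia-container-runtime` is installed at `/usr/bin/nvidia-container-runtime`, as in the NVIDIA k8s-device-plugin prerequisites; both function names are hypothetical, and the `grep` check is a simple text match rather than a full JSON parse.

```shell
# Hypothetical helper: print the daemon.json content that makes nvidia the
# default Docker runtime (assumes the runtime binary path shown here).
print_nvidia_runtime_config() {
  cat <<'EOF'
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
EOF
}

# Hypothetical helper: return 0 when the given daemon config (default:
# /etc/docker/daemon.json) sets "default-runtime" to "nvidia".
has_nvidia_default_runtime() {
  config="${1:-/etc/docker/daemon.json}"
  [ -f "$config" ] || return 1
  grep -Eq '"default-runtime"[[:space:]]*:[[:space:]]*"nvidia"' "$config"
}
```

If `has_nvidia_default_runtime` fails on a node such as gpu2, merge the printed settings into that node's daemon.json and run `sudo systemctl restart docker` again before reinstalling the plugin.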