GaiaGPU: Sharing GPUs in a Container Cloud
Category: Kubernetes    Tags: GPU, CUDA, git, GitHub, ResourceQuota, port-forward, Dockerfile, kube-scheduler
Container technology is widely used for its light weight and scalability, while GPUs are used to accelerate applications thanks to their massive parallel computing power. In a cloud environment a container may need one or more GPU cards to satisfy its application's resource demands, yet giving a container exclusive use of a GPU card often leads to low resource utilization. For cloud providers, sharing GPU cards among multiple containers is therefore an attractive problem. This paper proposes GaiaGPU, a method for sharing GPU memory and GPU computing resources among containers. GaiaGPU partitions a physical GPU card into multiple virtual GPUs and allocates the virtual GPUs to containers on demand, and it applies elastic and dynamic resource allocation to improve utilization. Experimental results show that GaiaGPU incurs only 1.015% performance overhead on average and can efficiently allocate and isolate GPU resources for containers.
Building the GaiaGPU Services
Configure git acceleration
$ git config --global url."https://github.com.cnpmjs.org".insteadOf "https://github.com"
$ vim /etc/profile
export GOPROXY=https://goproxy.cn,direct
export GO111MODULE=on
$ source /etc/profile
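An optional sanity check that the proxy settings took effect; go env simply prints the effective values:
$ go env GOPROXY GO111MODULE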
vCUDA Controller
$ git clone https://github.com/tkestack/vcuda-controller.git
$ cd vcuda-controller
$ vim Dockerfile
FROM nvidia/cuda:11.4.0-devel-ubuntu20.04 as build
ENV DEBIAN_FRONTEND=noninteractive
$ IMAGE_FILE=wangjunjian/vcuda:latest ./build-img.sh
GPU Manager
Install the CUDA Toolkit
wget https://developer.download.nvidia.com/compute/cuda/11.5.1/local_installers/cuda_11.5.1_495.29.05_linux.run
sudo sh cuda_11.5.1_495.29.05_linux.run
Configure environment variables
$ sudo vim /etc/profile
export PATH=$PATH:/usr/local/cuda-11.5/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda-11.5/lib64
$ source /etc/profile
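Optionally verify that the toolkit and the driver installed by the runfile are usable:
$ nvcc --version
$ nvidia-smi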
Build the project
$ sudo apt install golang
$ git clone https://github.com/tkestack/gpu-manager.git
$ cd gpu-manager
$ vim build/Dockerfile
ENV GOPROXY=https://goproxy.cn,direct
ENV GO111MODULE=on
$ vim hack/build.sh
readonly local base_img=${BASE_IMG:-"wangjunjian/vcuda:latest"}
$ vim hack/common.sh
readonly IMAGE_FILE=${IMAGE_FILE:-"wangjunjian/gpu-manager"}
$ echo "latest">VERSION
$ make img
GPU Admission
$ git clone https://github.com/tkestack/gpu-admission.git
$ cd gpu-admission
$ vim Dockerfile
ENV GOPROXY=https://goproxy.cn,direct
ENV GO111MODULE=on
$ IMAGE=wangjunjian/gpu-admission:latest make img
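At this point all three images built above should exist locally; a quick, optional check (the grep pattern simply matches the image names used in this walkthrough):
$ docker images | grep -E 'vcuda|gpu-manager|gpu-admission'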
Deploying the GaiaGPU Services
GPU Manager
Create the service account and ClusterRoleBinding
kubectl create sa gpu-manager -n kube-system
kubectl create clusterrolebinding gpu-manager-role --clusterrole=cluster-admin --serviceaccount=kube-system:gpu-manager
Label the GPU node
kubectl label node gpu1 nvidia-device-enable=enable
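Optionally confirm the label is in place; the gpu-manager DaemonSet selects GPU nodes by this label:
kubectl get nodes -l nvidia-device-enable=enable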
Deploy GPU Manager
wget https://raw.githubusercontent.com/tkestack/gpu-manager/master/gpu-manager.yaml
sed -i 's/tkestack\/gpu-manager:1\.0\.3/wangjunjian\/gpu-manager:latest/g' gpu-manager.yaml
kubectl apply -f gpu-manager.yaml
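Optionally verify that a gpu-manager Pod is running on the labelled GPU node:
kubectl -n kube-system get pods -o wide | grep gpu-manager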
Deploy the metrics service (gpu-manager-svc.yaml ships in the cloned gpu-manager repository)
kubectl apply -f gpu-manager-svc.yaml
GPU Admission
Create the gpu-admission.yaml file on the control-plane node
sudo vim /etc/kubernetes/manifests/gpu-admission.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: gpu-admission
    tier: control-plane
  name: gpu-admission
  namespace: kube-system
spec:
  containers:
  - command:
    - /usr/bin/gpu-admission
    - --address=0.0.0.0:3456
    - --logtostderr=true
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    image: wangjunjian/gpu-admission:latest
    imagePullPolicy: IfNotPresent
    name: gpu-admission
    resources:
      requests:
        cpu: 100m
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
status: {}
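The kubelet picks up static Pod manifests from /etc/kubernetes/manifests automatically; optionally confirm that gpu-admission is running:
kubectl -n kube-system get pods | grep gpu-admission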
Write the GPU scheduling policy file
The extender's urlPrefix must point to the address where gpu-admission listens, i.e. port 3456 on the control-plane node.
sudo vim /etc/kubernetes/scheduler-policy-config.json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    {
      "name": "PodFitsHostPorts"
    },
    {
      "name": "PodFitsResources"
    },
    {
      "name": "NoDiskConflict"
    },
    {
      "name": "MatchNodeSelector"
    },
    {
      "name": "HostName"
    }
  ],
  "extenders": [
    {
      "urlPrefix": "http://172.16.33.157:3456/scheduler",
      "apiVersion": "v1beta1",
      "filterVerb": "predicates",
      "enableHttps": false,
      "nodeCacheCapable": false
    }
  ],
  "hardPodAffinitySymmetricWeight": 10,
  "alwaysCheckAllPredicates": false
}
Configure kube-scheduler.yaml on the control plane
sudo vim /etc/kubernetes/manifests/kube-scheduler.yaml
apiVersion: v1
kind: Pod
metadata:
  creationTimestamp: null
  labels:
    component: kube-scheduler
    tier: control-plane
  name: kube-scheduler
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-scheduler
    - --authentication-kubeconfig=/etc/kubernetes/scheduler.conf
    - --authorization-kubeconfig=/etc/kubernetes/scheduler.conf
    - --bind-address=127.0.0.1
    - --kubeconfig=/etc/kubernetes/scheduler.conf
    - --leader-elect=true
    - --policy-config-file=/etc/kubernetes/scheduler-policy-config.json
    - --use-legacy-policy-config=true
    - --port=0
    image: registry.aliyuncs.com/google_containers/kube-scheduler:v1.21.5
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-scheduler
    resources:
      requests:
        cpu: 100m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 127.0.0.1
        path: /healthz
        port: 10259
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/kubernetes/scheduler.conf
      name: kubeconfig
      readOnly: true
    - mountPath: /etc/kubernetes/scheduler-policy-config.json
      name: gpu-scheduler-policy
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/kubernetes/scheduler.conf
      type: FileOrCreate
    name: kubeconfig
  - hostPath:
      path: /etc/kubernetes/scheduler-policy-config.json
      type: FileOrCreate
    name: gpu-scheduler-policy
status: {}
The two flags added for the GPU scheduling policy are:
- --policy-config-file=/etc/kubernetes/scheduler-policy-config.json
- --use-legacy-policy-config=true
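The kubelet restarts the kube-scheduler static Pod when its manifest changes; optionally confirm it came back up and that its log shows no complaints about the policy file:
kubectl -n kube-system get pods -l component=kube-scheduler
kubectl -n kube-system logs -l component=kube-scheduler --tail=20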
Edit the ClusterRole system:kube-scheduler
kubectl edit clusterroles system:kube-scheduler
- apiGroups:
- ""
resources:
- pods
verbs:
- delete
- get
- patch
- list
- watch
- Add the patch verb: gpu-admission uses the scheduler's kubeconfig (scheduler.conf), so the scheduler identity must be allowed to patch Pods (a quick check follows below).
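An optional check that the scheduler identity now has the permission; kubectl impersonates the system:kube-scheduler user, to which the edited ClusterRole is bound by default:
kubectl auth can-i patch pods --as=system:kube-scheduler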
View node information
kubectl describe node gpu1
Name: gpu1
Labels: beta.kubernetes.io/arch=amd64
nvidia-device-enable=enable
Capacity:
cpu: 64
ephemeral-storage: 575261800Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263781448Ki
pods: 110
tencent.com/vcuda-core: 400
tencent.com/vcuda-memory: 236
Allocatable:
cpu: 64
ephemeral-storage: 530161274003
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 263679048Ki
pods: 110
tencent.com/vcuda-core: 400
tencent.com/vcuda-memory: 236
- tencent.com/vcuda-core: one GPU card counts as 100, so 400 means 4 GPU cards.
- tencent.com/vcuda-memory: one unit is 256 MiB. For a 16 GB card: 16 GB (16×1000×1000×1000 bytes) / 256 MiB (256×1024×1024 bytes) ≈ 59 units, and 4 cards give 4×59 = 236 (see the quick calculation below).
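A quick way to reproduce the vcuda-memory arithmetic with shell integer division, assuming 16 GB cards as above:
echo $(( 16 * 1000 * 1000 * 1000 / (256 * 1024 * 1024) ))   # 59 units per card
echo $(( 4 * 59 ))                                          # 236 units for 4 cards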
Resource Quota
Create a ResourceQuota
vim resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: resource-quota
spec:
  hard:
    requests.cpu: 100m
    requests.memory: 100Mi
    requests.tencent.com/vcuda-core: 100
    requests.tencent.com/vcuda-memory: 59
    limits.cpu: 10
    limits.memory: 10Gi
    limits.tencent.com/vcuda-core: 100
    limits.tencent.com/vcuda-memory: 59
kubectl apply -f resource-quota.yaml
View the ResourceQuota description
kubectl describe resourcequotas resource-quota
Name: resource-quota
Namespace: default
Resource Used Hard
-------- ---- ----
limits.cpu 0 10
limits.memory 0 10Gi
limits.tencent.com/vcuda-core 0 100
limits.tencent.com/vcuda-memory 0 59
requests.cpu 0 100m
requests.memory 0 100Mi
requests.tencent.com/vcuda-core 0 100
requests.tencent.com/vcuda-memory 0 59
Deploy a GPU Application
Write the GPU application's YAML file
vim gpu-quota.yaml
apiVersion: v1
kind: Pod
metadata:
  name: vcuda
  annotations:
    tencent.com/vcuda-core-limit: "10"
spec:
  restartPolicy: Never
  containers:
  - image: tensorflow/tensorflow:2.7.0-gpu
    name: tensorflow
    command: ["/bin/sh", "-c", "sleep 86400"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
        tencent.com/vcuda-core: 10
        tencent.com/vcuda-memory: 4
      limits:
        cpu: 1
        memory: 1Gi
        tencent.com/vcuda-core: 10
        tencent.com/vcuda-memory: 4
- tencent.com/vcuda-core in requests must equal tencent.com/vcuda-core in limits.
- tencent.com/vcuda-memory in requests must equal tencent.com/vcuda-memory in limits.
kubectl apply -f gpu-quota.yaml
View the ResourceQuota usage
kubectl describe resourcequotas resource-quota
Name: resource-quota
Namespace: default
Resource Used Hard
-------- ---- ----
limits.cpu 1 10
limits.memory 1Gi 10Gi
limits.tencent.com/vcuda-core 0 100
limits.tencent.com/vcuda-memory 0 59
requests.cpu 100m 100m
requests.memory 100Mi 100Mi
requests.tencent.com/vcuda-core 10 100
requests.tencent.com/vcuda-memory 4 59
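A minimal in-Pod check, assuming python3 is available in the tensorflow/tensorflow:2.7.0-gpu image used above: listing the physical GPUs from TensorFlow confirms the container sees a GPU, and nvidia-smi, if the driver utilities are mounted into the container, should reflect the vcuda limits (tencent.com/vcuda-memory: 4 corresponds to roughly 4×256 MiB):
kubectl exec -it vcuda -- python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
kubectl exec -it vcuda -- nvidia-smi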
GPU Applications
GPU sharing
apiVersion: v1
kind: Pod
metadata:
  name: vcuda20
  annotations:
    tencent.com/vcuda-core-limit: "20"
spec:
  restartPolicy: Never
  containers:
  - image: tensorflow/tensorflow:2.7.0-gpu
    name: tensorflow
    command: ["/bin/sh", "-c", "sleep 86400"]
    resources:
      requests:
        tencent.com/vcuda-core: 20
        tencent.com/vcuda-memory: 12
      limits:
        tencent.com/vcuda-core: 20
        tencent.com/vcuda-memory: 12
A single GPU card
apiVersion: v1
kind: Pod
metadata:
  name: vcuda100
  annotations:
    tencent.com/vcuda-core-limit: "100"
spec:
  restartPolicy: Never
  containers:
  - image: tensorflow/tensorflow:2.7.0-gpu
    name: tensorflow
    command: ["/bin/sh", "-c", "sleep 86400"]
    resources:
      requests:
        tencent.com/vcuda-core: 100
        tencent.com/vcuda-memory: 59
      limits:
        tencent.com/vcuda-core: 100
        tencent.com/vcuda-memory: 59
Multiple GPU cards
apiVersion: v1
kind: Pod
metadata:
  name: vcuda200
  annotations:
    tencent.com/vcuda-core-limit: "200"
spec:
  restartPolicy: Never
  containers:
  - image: tensorflow/tensorflow:2.7.0-gpu
    name: tensorflow
    command: ["/bin/sh", "-c", "sleep 86400"]
    resources:
      requests:
        tencent.com/vcuda-core: 200
        tencent.com/vcuda-memory: 128
      limits:
        tencent.com/vcuda-core: 200
        tencent.com/vcuda-memory: 128
Metrics Service
Deploy the metrics service
cd gpu-manager
kubectl apply -f gpu-manager-svc.yaml
View the IP and port of the metrics service
$ kubectl -n kube-system get endpoints gpu-manager-metric
NAME ENDPOINTS AGE
gpu-manager-metric 10.36.0.0:5678 15h
Debugging (fetching metrics data)
Forward a local port to the Pod
kubectl port-forward svc/gpu-manager-metric -n kube-system 5678:5678
Open a new terminal and fetch the GPU metrics statistics.
curl http://127.0.0.1:5678/metric
# HELP container_gpu_memory_total gpu memory usage in MiB
# TYPE container_gpu_memory_total gauge
container_gpu_memory_total{container_name="tensorflow",gpu_memory="gpu0",namespace="default",node="gpu1",pod_name="vcuda"} 0
container_gpu_memory_total{container_name="tensorflow",gpu_memory="total",namespace="default",node="gpu1",pod_name="vcuda"} 0
# HELP container_gpu_utilization gpu utilization
# TYPE container_gpu_utilization gauge
container_gpu_utilization{container_name="tensorflow",gpu="gpu0",namespace="default",node="gpu1",pod_name="vcuda"} 0
container_gpu_utilization{container_name="tensorflow",gpu="total",namespace="default",node="gpu1",pod_name="vcuda"} 0
# HELP container_request_gpu_memory request of gpu memory in MiB
# TYPE container_request_gpu_memory gauge
container_request_gpu_memory{container_name="tensorflow",namespace="default",node="gpu1",pod_name="vcuda",req_of_gpu_memory="total"} 1024
# HELP container_request_gpu_utilization request of gpu utilization
# TYPE container_request_gpu_utilization gauge
container_request_gpu_utilization{container_name="tensorflow",namespace="default",node="gpu1",pod_name="vcuda",req_of_gpu="total"} 0.10000000149011612
Access directly from inside a container
kubectl exec -it vcuda -- curl http://10.36.0.0:5678/metric
Resource Quota
Set the resource quota
vim resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: resource-quota
spec:
  hard:
    requests.cpu: 100m
    requests.memory: 100Mi
    requests.tencent.com/vcuda-core: 100
    requests.tencent.com/vcuda-memory: 59
    limits.cpu: 10
    limits.memory: 10Gi
    limits.tencent.com/vcuda-core: 100
    limits.tencent.com/vcuda-memory: 59
Apply the GPU resource quota in the current namespace
kubectl apply -f resource-quota.yaml
Applications deployed afterwards are constrained by the resource quota set above.
View resource quota usage details
kubectl describe resourcequotas resource-quota
Name: resource-quota
Namespace: default
Resource Used Hard
-------- ---- ----
limits.cpu 1 10
limits.memory 1Gi 10Gi
limits.tencent.com/vcuda-core 0 100
limits.tencent.com/vcuda-memory 0 59
requests.cpu 100m 100m
requests.memory 100Mi 100Mi
requests.tencent.com/vcuda-core 10 100
requests.tencent.com/vcuda-memory 4 59
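As a final sanity check, a Pod whose requests exceed the remaining quota should be rejected at admission time. A minimal sketch (the Pod name vcuda-over-quota is made up for illustration; both the vcuda requests and the cpu request exceed what is left under resource-quota, so kubectl should return a Forbidden error naming the exceeded resources):
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: vcuda-over-quota
spec:
  restartPolicy: Never
  containers:
  - image: tensorflow/tensorflow:2.7.0-gpu
    name: tensorflow
    command: ["/bin/sh", "-c", "sleep 86400"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
        tencent.com/vcuda-core: 200
        tencent.com/vcuda-memory: 118
      limits:
        cpu: 1
        memory: 1Gi
        tencent.com/vcuda-core: 200
        tencent.com/vcuda-memory: 118
EOF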