Building the ONNXRuntime-GPU Image

Write requirements.txt

$ vim requirements.txt
flask
connexion[swagger-ui]
gunicorn
numpy
opencv-python
scikit-image
psutil
pynvml
onnxruntime-gpu
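
Once these packages are installed (inside the image built below), a quick sanity check (a minimal sketch) confirms that the GPU build of ONNX Runtime is the one that got picked up:

import onnxruntime as ort

# onnxruntime-gpu exposes CUDAExecutionProvider when CUDA/cuDNN are found
print(ort.__version__)
print(ort.get_available_providers())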

Write the Dockerfile

The base image must be a CUDA image that ships the cuDNN library, because onnxruntime-gpu depends on both CUDA and cuDNN at runtime.

$ vim Dockerfile
FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04
LABEL maintainer="wang-junjian@qq.com"

# Remove NVIDIA's apt source lists so apt-get update does not fail on their
# expired GPG keys, switch the Ubuntu mirror to Aliyun, then install the
# OpenCV runtime dependencies (libGL, glib) and Python.
RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list && \
    sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y libgl1-mesa-glx libglib2.0-dev && \
    apt-get install -y python3 python3-pip

COPY requirements.txt /tmp/requirements.txt

# Use the Aliyun PyPI mirror and install the Python dependencies.
RUN pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/ && \
    pip3 install --upgrade --default-timeout=100000 -r /tmp/requirements.txt
# Make `python` point at python3 so scripts can invoke `python` directly.
RUN ln -s /usr/bin/python3 /usr/bin/python

Build the Image

docker build -t gouchicao/onnxruntime:1.10-cuda11.4-ubuntu20.04 .

Test the Image

Run a Container

docker run --runtime=nvidia --rm -it -v /home/wjj/inference-serving:/inference-serving gouchicao/onnxruntime:1.10-cuda11.4-ubuntu20.04 bash
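
Before loading a model, you can verify that the container actually sees the GPU; pynvml from the requirements list gives a quick check (a minimal sketch):

import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    print(i, pynvml.nvmlDeviceGetName(handle))
pynvml.nvmlShutdown()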

Run a Python Script

$ python
import onnxruntime as ort

model_path = '/inference-serving/models/running/onnx_files/yolov5_r4_person.onnx'
providers = [
    ('CUDAExecutionProvider', {
        'device_id': 0,
        'arena_extend_strategy': 'kNextPowerOfTwo',
        'gpu_mem_limit': 2 * 1024 * 1024 * 1024,  # cap GPU memory at 2 GB
        'cudnn_conv_algo_search': 'EXHAUSTIVE',   # benchmark all conv algorithms
        'do_copy_in_default_stream': True,
    }),
    'CPUExecutionProvider',  # fallback if the CUDA provider is unavailable
]
session = ort.InferenceSession(model_path, providers=providers)
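
With the session created, a dummy inference confirms the model actually runs. The 1x3x640x640 float32 shape below is an assumption based on the usual YOLOv5 export; check session.get_inputs() for the real one:

import numpy as np

inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)  # inspect the model's declared input

# Assumed YOLOv5 input shape; adjust to whatever get_inputs() reports
dummy = np.random.rand(1, 3, 640, 640).astype(np.float32)
outputs = session.run(None, {inp.name: dummy})
print([o.shape for o in outputs])

# Which providers the session actually bound to (should list CUDAExecutionProvider)
print(session.get_providers())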

Publish the Image

Log in to Docker Hub first (docker login) if you have not already, then push:

docker push gouchicao/onnxruntime:1.10-cuda11.4-ubuntu20.04

Building an Inference Service on the ONNXRuntime-GPU Image

Write the Dockerfile

$ vim Dockerfile
FROM gouchicao/onnxruntime:1.10-cuda11.4-ubuntu20.04
LABEL maintainer="wang-junjian@qq.com"

WORKDIR /inference-serving

# Copy the whole project into the image.
ADD . ./

# Which GPU the service uses and which model config it loads;
# both can be overridden at runtime with `docker run -e ...`.
ENV CUDA_VISIBLE_DEVICES=0
ENV MODEL_YAML_FILENAME=yolov5_person.yaml

# The service port can be changed at build time with --build-arg PORT=...
ARG PORT=6666
ENV INFERENCE_SERVING_PORT=$PORT
EXPOSE $PORT

CMD ["python", "app/main.py"]

Build the Image

docker build -t inference-serving .

Test the Inference Service

docker run --runtime=nvidia --rm -t -p 12345:6666 -e MODEL_YAML_FILENAME=yolov5_sign.yaml inference-serving
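
Here -p 12345:6666 publishes the service on host port 12345, and -e MODEL_YAML_FILENAME swaps in a different model config. A quick client-side probe (hypothetical; the real route depends on the service's API spec, which this post does not show):

import requests

# Hypothetical health check; replace '/' with an actual route from the API spec
resp = requests.get('http://localhost:12345/')
print(resp.status_code, resp.text[:200])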