10 篇文章带有标签 “onnx”

SenseVoice

SenseVoice 是具有音频理解能力的音频基础模型,包括语音识别(ASR)、语种识别(LID)、语音情感识别(SER)和声学事件分类(AEC)或声学事件检测(AED)。

SenseVoice

核心功能 🎯

SenseVoice 专注于高精度多语言语音识别、情感辨识和音频事件检测

  • 多语言识别: 采用超过 40 万小时数据训练,支持超过 50 种语言,识别效果上优于 Whisper 模型。
  • 富文本识别:
    • 具备优秀的情感识别,能够在测试数据上达到和超过目前最佳情感识别模型的效果。
    • 支持声音事件检测能力,支持音乐、掌声、笑声、哭声、咳嗽、喷嚏等多种常见人机交互事件进行检测。
  • 高效推理: SenseVoice-Small 模型采用非自回归端到端框架,推理延迟极低,10s 音频推理仅耗时 70ms,15 倍优于 Whisper-Large。
  • 微调定制: 具备便捷的微调脚本与策略,方便用户根据业务场景修复长尾样本问题。
  • 服务部署: 具有完整的服务部署链路,支持多并发请求,支持客户端语言有,python、c++、html、java 与 c# 等。

架构图

  • 语音识别(ASR)
  • 语言识别(LID)
  • 语音情感识别(SER)
  • 音频事件检测(AED,比如笑声、掌声、背景音乐、咳嗽等)
  • 逆文本归一化(ITN)

安装

克隆代码库

Ultralytics YOLOv8 推理速度对比

CPU

服务器信息

lscpu

Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          40
On-line CPU(s) list:             0-39
Thread(s) per core:              2
Core(s) per socket:              10
Socket(s):                       2
NUMA node(s):                    2
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           79
Model name:                      Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz
Stepping:                        1
CPU MHz:                         1201.687
CPU max MHz:                     3400.0000
CPU min MHz:                     1200.0000
BogoMIPS:                        4788.86
Virtualization:                  VT-x
L1d cache:                       640 KiB
L1i cache:                       640 KiB
L2 cache:                        5 MiB
L3 cache:                        50 MiB
NUMA node0 CPU(s):               0-9,20-29
NUMA node1 CPU(s):               10-19,30-39
Vulnerability Itlb multihit:     KVM: Mitigation: Split huge pages
Vulnerability L1tf:              Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Vulnerability Mds:               Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown:          Mitigation; PTI
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full generic retpoline, IBPB conditional, IBRS_FW, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Mitigation; Clear CPU buffers; SMT vulnerable
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nop
                                 l xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c 
                                 rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 sme
                                 p bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts md_clear flush_l1d

Ultralytics YOLOv8

Ultralytics

构建环境

Ultralytics 镜像

  • GPU
docker pull ultralytics/ultralytics:latest
  • CPU
docker pull ultralytics/ultralytics:latest-cpu
  • Apple Silicon
docker pull ultralytics/ultralytics:latest-arm64

本地安装

pip install ultralytics

基于 COCO128 数据集的目标检测范例

运行容器

git clone https://github.com/ultralytics/ultralytics.git
docker run --runtime=nvidia -it --name ultralytics -v `pwd`/ultralytics:/usr/src/ultralytics ultralytics/ultralytics:latest

yolo 命令的使用参数

yolo TASK MODE ARGS

训练模型

yolo train data=coco128.yaml model=yolov8n.pt

训练可视化(Comet) pip install comet_ml export

TVM

模型在使用 TVM 优化编译器框架进行转换时的步骤

from tvm.driver import tvmc
model = tvmc.load('resnet50-v2-7.onnx')
package = tvmc.compile(model, target="llvm")


from tvm.driver import tvmc
model = tvmc.load('sign.onnx') #Step 1: Load
package = tvmc.compile(model, target="llvm") #Step 2: Compile
result = tvmc.run(package, device="cpu") #Step 3: Run
print("Time :", timeit.default_timer() - starttime)


import onnx
import tvm
from tvm import relay
// ...

参考资料 The Deep Learning Compiler: A Comprehensive Survey 深度学习编译器整理 一篇关于深度学习编译器架构的综述论文 Getting Starting using TVMC P

ONNX Simplifier

onnxsim 本身只提供 constant folding/propagation(即消除结果恒为常量的算子)的能力,而图变换(即合并 conv 和 bn 等等)的能力是由 onnxsim 调用 onnx optimizer 的各种 pass 实现的。constant folding 和图变换同时使用时,很多隐藏的优化机会会被挖掘出来,这也是 onnxsim 优化效果出色的原因之一。例如 add(add(x, 1), 2) 在变换为 add(x, add(1, 2)) 之后就可以通过 constant folding 变为 add(x, 3),而 pad(conv(x, w, padding=0), add(1, 1)) 在经过 constant folding 变为 pad(conv(x, w, padding=0), 2) 后,就可以进一步融合成 conv(x, w, padding=2)。

安装

pip install -U pip
pip install onnx-simplifier

使用

命令

onnxsim input_onnx_model output_onnx_model

脚本集成 import onnx from onnxsim import simplify # load your predefined ONNX model model = onnx.

构建基于 ONNXRuntime 的推理服务

构建 ONNXRuntime-GPU 镜像

编写 requirements.txt

$ vim requirements.txt
flask
connexion[swagger-ui]
connexion
gunicorn
numpy
opencv-python
scikit-image
psutil
pynvml
onnxruntime-gpu

编写 Dockerfile 需要带 cudnn 库的 CUDA 作为基镜像 $ vim Dockerfile FROM nvidia/cuda:11.4.0-cudnn8-runtime-ubuntu20.04 LABEL maintainer="wang-junjian@qq.com" RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list && \ sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.

AI 模型打包发布

工程目录

model-package-release/
├── Dockerfile
├── Dockerfile.ubuntu
├── main.sh
└── .ssh
    ├── id_rsa
    ├── id_rsa.pub
    └── known_hosts

main.sh #!/usr/bin/sh # 模型和配置文件 FILE_CONFIG=config.yaml FILE_MODEL=model.onnx # 用于构建模型压缩包的目录结构 DIR_MODELS=models if [ -z $1 ] then echo "Usage: " echo "Environment variable:" echo " MODEL_SERVER_USERNAME Default: username" echo " MODEL_SERVER_IP Default: ip" echo "" echo "docker run --rm -v /home/ai/models/sign.yaml:/app/config.yaml \" echo " -v /home/ai/models/sign.onnx:/app/model.

Building ONNX Runtime

NVIDIA CUDA

单步构建

  • 下载onnxruntime源代码
git clone --recursive https://github.com/microsoft/onnxruntime.git
  • 拉取容器(编译环境)
docker pull nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04
  • 运行容器
docker run -it --name build-onnxruntime-gpu --runtime nvidia \
    -v $(pwd)/onnxruntime:/onnxruntime -w /onnxruntime \
    nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04
  • 更新apt镜像源
sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list
apt-get update
  • 安装依赖包
apt-get install language-pack-en git cmake python3 python3-pip -y
  • 修改语言环境
locale-gen en_US.UTF-8
update-locale LANG=en_US.UTF-8
  • 更新pip镜像源
pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple/

Dockerfile ONNXRuntime GPU

FROM nvidia/cuda:11.1-cudnn8-devel-ubuntu20.04 AS builder
LABEL maintainer="wang-junjian@qq.com"

#E: Failed to fetch https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu2004/x86_64/by-hash/SHA256/f10fc2a7a0d072ddcf141af2ef28f1e97ab4b3a5c3b9bbe34ed845d174fb4979  404  Not Found [IP: 61.155.167.2 443]
#E: Some index files failed to download. They have been ignored, or old ones used instead.
RUN rm /etc/apt/sources.list.d/cuda.list /etc/apt/sources.list.d/nvidia-ml.list

RUN sed -i 's/archive.ubuntu.com/mirrors.aliyun.com/g' /etc/apt/sources.list && \
    apt-get update && \
    apt-get install language-pack-en git python3 python3-pip -y && \
    DEBIAN_FRONTEND=noninteractive apt-get install cmake -y && \
    locale-gen en_US.UTF-8 && \
    update-locale LANG=en_US.UTF-8

RUN pip3 install numpy -i https://mirrors.aliyun.com/pypi/simple/
// ...