Packaging a Python Project for PyPI: Building the LLM Stress-Testing Tool evalscope-perf

2024-10-16 2 minute read

Create the Python project evalscope-perf

Project directory structure

evalscope-perf/
├── evalscope_perf/
│   ├── __init__.py
│   └── main.py
├── README.md
├── LICENSE
├── pyproject.toml
└── setup.py

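evalscope_perf/__init__.py

This file can be empty, but it must exist so that find_packages() in setup.py recognizes evalscope_perf as a package.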

evalscope_perf/main.py

import subprocess
import re
import typer
import matplotlib.pyplot as plt

from typing import List
from typing_extensions import Annotated


app = typer.Typer()

def run_evalscope(url, model, dataset_path, max_prompt_length, stop, read_timeout, parallel, n):
    cmd = [
        'evalscope', 'perf',
        '--api', 'openai',
        '--url', url,
        '--model', model,
        '--dataset', 'openqa',
        '--dataset-path', dataset_path,
        '--max-prompt-length', str(max_prompt_length),
        '--stop', stop,
        '--read-timeout', str(read_timeout),
        '--parallel', str(parallel),
        '-n', str(n)
    ]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, text=True)

    # Save the output to a file
    dataset_name = dataset_path.split('/')[-1].split('.')[0]
    output_filename = f'{model}_{dataset_name}_p{parallel}.txt'
    with open(output_filename, 'w') as f:
        f.write(result.stdout)

    return result.stdout

def parse_output(output):
    print(output)
    metrics = {}
    patterns = {
        'Average QPS': r'Average QPS:\s+([\d.]+)',
        'Average latency': r'Average latency:\s+([\d.]+)',
        'Throughput': r'Throughput\(average output tokens per second\):\s+([\d.]+)'
    }
    for key, pattern in patterns.items():
        match = re.search(pattern, output)
        if match:
            metrics[key] = float(match.group(1))
    print('📌 Metrics:', metrics)
    return metrics

@app.command()
def main(
    url: str = typer.Argument(..., help="OpenAI URL"),
    model: str = typer.Argument(..., help="Model name"),
    dataset_path: str = typer.Argument(..., help="Dataset path"),
    max_prompt_length: int = typer.Option(256, help="Maximum prompt length"),
    stop: str = typer.Option("<|im_end|>", help="Stop token"),
    read_timeout: int = typer.Option(30, help="Read timeout"),
    parallels: Annotated[List[int], "Parallel count"] = typer.Option([1], help="Parallel count"),
    n: int = typer.Option(1, help="Number of requests")
):
    data = {'Parallel': [], 'Average QPS': [], 'Average latency': [], 'Throughput': []}

    for parallel in parallels:
        print(f'Running with parallel={parallel}')
        output = run_evalscope(url, model, dataset_path, max_prompt_length, stop, read_timeout, parallel, n)
        metrics = parse_output(output)
        data['Parallel'].append(parallel)
        data['Average QPS'].append(metrics.get('Average QPS', 0))
        data['Average latency'].append(metrics.get('Average latency', 0))
        data['Throughput'].append(metrics.get('Throughput', 0))

    # Draw the subplots
    fig, axs = plt.subplots(2, 2, figsize=(18, 9))

    axs[0, 0].plot(data['Parallel'], data['Average QPS'], marker='o')
    axs[0, 0].set_title('Average QPS vs Parallel Number')
    axs[0, 0].set_xlabel('Parallel Number')
    axs[0, 0].set_ylabel('Average QPS')

    axs[0, 1].plot(data['Parallel'], data['Average latency'], marker='o', color='orange')
    axs[0, 1].set_title('Average Latency vs Parallel Number')
    axs[0, 1].set_xlabel('Parallel Number')
    axs[0, 1].set_ylabel('Average Latency (s)')

    axs[1, 0].plot(data['Parallel'], data['Throughput'], marker='o', color='green')
    axs[1, 0].set_title('Throughput vs Parallel Number')
    axs[1, 0].set_xlabel('Parallel Number')
    axs[1, 0].set_ylabel('Throughput (token/s)')

    fig.delaxes(axs[1, 1])  # Remove the empty subplot

    plt.tight_layout()
    plt.savefig('performance_metrics.png')

if __name__ == "__main__":
    app()
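
To see what parse_output extracts, here is a minimal sketch that applies the same three regular expressions to a hand-written sample. The sample text is an assumption, made up only to match the patterns; real evalscope perf output contains more lines and may be formatted differently across versions.

```python
import re

# Same patterns as parse_output in main.py above.
PATTERNS = {
    'Average QPS': r'Average QPS:\s+([\d.]+)',
    'Average latency': r'Average latency:\s+([\d.]+)',
    'Throughput': r'Throughput\(average output tokens per second\):\s+([\d.]+)',
}

# Hypothetical sample, written only to exercise the patterns.
SAMPLE_OUTPUT = """\
 Average QPS: 12.5
 Average latency: 1.28
 Throughput(average output tokens per second): 640.0
"""


def extract_metrics(output: str) -> dict:
    """Return every metric whose pattern matches; missing ones are skipped."""
    metrics = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, output)
        if match:
            metrics[key] = float(match.group(1))
    return metrics


print(extract_metrics(SAMPLE_OUTPUT))
```

A metric whose pattern does not match is simply absent from the result, which is why main falls back to 0 through metrics.get(key, 0).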

README.md

evalscope-perf: Model inference performance stress test

pyproject.toml

[build-system]
requires = ["setuptools>=42", "wheel"]
build-backend = "setuptools.build_meta"

setup.py

from setuptools import setup, find_packages

import os

# Get the directory that contains this file
this_directory = os.path.abspath(os.path.dirname(__file__))

# Read the contents of README.md
with open(os.path.join(this_directory, "README.md"), encoding="utf-8") as fh:
    long_description = fh.read()

setup(
    name='evalscope-perf',
    version='0.1.2',
    author = 'Junjian Wang',
    author_email = 'vwarship@163.com',
    description = 'LLM performance stress testing with visualization',
    long_description = long_description,
    long_description_content_type = 'text/markdown',
    url = 'http://www.wangjunjian.com',
    packages=find_packages(),
    install_requires=[
        # Add your dependencies here, for example:
        # 'requests',
        'typer',
        'pandas',
        'matplotlib',
        # 'evalscope',  # too large; not recommended as a direct dependency
    ],
    entry_points={
        'console_scripts': [
            # Add your command-line tools here, for example:
            'evalscope-perf=evalscope_perf.main:app',
        ],
    },
    classifiers=[
        "Programming Language :: Python :: 3",
        "License :: OSI Approved :: MIT License",
        "Operating System :: OS Independent",
    ],
    python_requires='>=3.6',
)

📌 Note that the console script entry evalscope-perf=evalscope_perf.main:app points at app (the Typer instance), not at the main function.
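
The module:attr form of that entry can be inspected with the standard library. This is a small sketch, assuming Python 3.9 or later (where EntryPoint exposes module and attr); it only parses the string and does not import the package.

```python
from importlib.metadata import EntryPoint

# The console_scripts line from setup.py, split into name and value.
entry = EntryPoint(
    name='evalscope-perf',
    value='evalscope_perf.main:app',
    group='console_scripts',
)

# "module:attr" means: import the module, then call the named attribute.
print(entry.module)  # evalscope_perf.main
print(entry.attr)    # app
```

The generated evalscope-perf script therefore imports app from evalscope_perf.main and calls it, which is the same thing the if __name__ == "__main__" block does.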

To ship non-Python files with the package, add the package_data field to setup.py, for example:

    include_package_data=True,
    package_data={
        'evalscope_perf': ['assets/*.*'],
    },
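
Files shipped through package_data can then be read at runtime with importlib.resources. The following is a minimal sketch, assuming Python 3.9 or later; read_asset and the file name template.txt are hypothetical, since the original project does not show any asset files.

```python
from importlib.resources import files


def read_asset(package: str, filename: str) -> str:
    """Read a text file from the package's assets/ directory."""
    resource = files(package) / 'assets' / filename
    return resource.read_text(encoding='utf-8')


# Usage, once evalscope_perf is installed with an assets/ directory
# ('template.txt' is a hypothetical file name):
# text = read_asset('evalscope_perf', 'template.txt')
```

Going through importlib.resources instead of building paths from __file__ keeps the lookup tied to the installed package rather than to the current working directory.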

Development

Build a source distribution

python setup.py sdist

Install the project in editable mode

pip install -e .

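-e (--editable) installs the project in editable mode: the installed package points at the source tree in the current directory, so code changes take effect immediately without reinstalling.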

Package the Python project

Build both a source distribution and a binary wheel

python setup.py sdist bdist_wheel

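sdist
- Purpose: builds a source distribution.
- Output: typically a .tar.gz (or .zip) file containing the project's source code.
- Use: users can install from source, which works on all platforms.
bdist_wheel
- Purpose: builds a binary wheel package.
- Output: a .whl file, a pre-built package that includes compiled binaries, if any.
- Use: faster installation; targets the specific platforms and Python versions it supports.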

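Publishing both a source distribution and a wheel gives users more flexible installation options: pip picks the wheel where one fits and falls back to the source distribution otherwise. Note that setuptools has deprecated invoking setup.py directly; python -m build (after pip install build) produces the same two artifacts.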
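
The platform and Python-version targeting of a wheel is encoded in its file name. The sketch below splits a wheel name into its tags; parse_wheel_name is a hypothetical helper, and the file name is what a pure-Python build of this project would be expected to produce, stated here as an assumption rather than taken from actual build output.

```python
def parse_wheel_name(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith('.whl'):
        raise ValueError(f'not a wheel: {filename}')
    parts = filename[:-len('.whl')].split('-')
    if len(parts) not in (5, 6):
        raise ValueError(f'unexpected wheel name: {filename}')
    # An optional build tag may sit between the version and the Python tag,
    # so the three compatibility tags are always taken from the end.
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        'distribution': parts[0],
        'version': parts[1],
        'python': python_tag,
        'abi': abi_tag,
        'platform': platform_tag,
    }


info = parse_wheel_name('evalscope_perf-0.1.2-py3-none-any.whl')
print(info)
```

The tags py3-none-any mean any Python 3 interpreter, no ABI requirement, and any platform. A package with compiled extensions would carry specific tags instead, which is why an sdist is still useful as a fallback.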

Publish the Python project to PyPI

Install twine

pip install twine

Register a PyPI account

https://pypi.org/account/register/
After registering, log in at https://pypi.org/account/login/ and create an API token.

Edit the PyPI configuration file ~/.pypirc

[pypi]
  username = __token__
  password = pypi-api-token_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
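
Before uploading, it can help to confirm that the file really is set up for token authentication. This is a minimal sketch using the standard configparser module; uses_api_token is a hypothetical helper, and the sample mirrors the placeholder above rather than a real token.

```python
import configparser


def uses_api_token(pypirc_text: str, repository: str = 'pypi') -> bool:
    """Return True if the repository section is set up for token auth."""
    # Interpolation is disabled so a '%' inside a token is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(pypirc_text)
    if not parser.has_section(repository):
        return False
    username = parser.get(repository, 'username', fallback='')
    password = parser.get(repository, 'password', fallback='')
    # PyPI API tokens are used with the literal username "__token__"
    # and start with the "pypi-" prefix.
    return username == '__token__' and password.startswith('pypi-')


SAMPLE = """\
[pypi]
username = __token__
password = pypi-api-token_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
"""
print(uses_api_token(SAMPLE))
```

To check the real file, pass in the text of ~/.pypirc, for example by reading it with pathlib.Path('~/.pypirc').expanduser().read_text().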

Upload the package to PyPI

twine upload dist/*

dist/* uploads every file in the dist directory.

Install the published package

pip install evalscope-perf -i https://pypi.org/simple
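
After installing, the version that pip actually picked up can be checked from Python. This is a minimal sketch using importlib.metadata (Python 3.8 or later); installed_version is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None if the package is missing.
print(installed_version('evalscope-perf'))
```

Comparing the result with the version field in setup.py is a quick way to confirm that the upload, and not an older cached release, is what got installed.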

evalscope-perf usage example

evalscope-perf http://127.0.0.1:8000/v1/chat/completions lnsoft-chat \
    ./datasets/open_qa.jsonl \
    --read-timeout=120 \
    --parallels 16 \
    --parallels 32 \
    --parallels 64 \
    --parallels 100 \
    --parallels 128 \
    --parallels 150 \
    --parallels 200 \
    --parallels 300 \
    --parallels 400 \
    --parallels 500 \
    --n 1000
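
Because parallels is declared as List[int], the option is repeated once per value. When the sweep is driven from a script, the argument list can be generated instead of typed by hand. This is a minimal sketch; build_command is a hypothetical helper and not part of the package.

```python
from typing import List


def build_command(url: str, model: str, dataset_path: str,
                  parallels: List[int], n: int,
                  read_timeout: int = 30) -> List[str]:
    """Build the evalscope-perf argument list for a sweep of parallel values."""
    cmd = ['evalscope-perf', url, model, dataset_path,
           f'--read-timeout={read_timeout}']
    # One --parallels flag per value, matching the List[int] option in main().
    for parallel in parallels:
        cmd += ['--parallels', str(parallel)]
    cmd += ['--n', str(n)]
    return cmd


cmd = build_command(
    'http://127.0.0.1:8000/v1/chat/completions',
    'lnsoft-chat',
    './datasets/open_qa.jsonl',
    parallels=[16, 32, 64],
    n=1000,
    read_timeout=120,
)
print(' '.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True). Each parallel value writes its raw output to {model}_{dataset_name}_p{parallel}.txt, and the combined chart is saved as performance_metrics.png in the current directory.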