4 posts tagged "Inference"

Tuesday, October 1, 2024

Speculative Decoding

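Draft generation: a small, fast model (called Mq) generates a run of draft tokens. Because this model is cheap to run, the draft arrives quickly.
Parallel evaluation: a larger target model (called Mp) then evaluates all the tokens drafted by Mq at once. Mp computes the probability of each token and keeps the ones it considers likely.
Output correction: for low-probability tokens that Mq generated but Mp rejected, Mp supplies replacement tokens. This step preserves output quality while still speeding up generation.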
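
The three steps above can be sketched with toy models. This is a minimal illustration, assuming greedy decoding (Mp accepts a drafted token only if it matches Mp's own top choice); `toy_target` and `toy_draft` are hypothetical stand-ins for Mp and Mq, not real models, and the verification loop stands in for what a real system would do in one batched forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]  # greedy "model": context -> next token


def speculative_decode_greedy(
    target: NextToken,
    draft: NextToken,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> List[Token]:
    """Draft up to k tokens with Mq, keep the prefix that Mp agrees with."""
    out = list(prompt)
    limit = len(prompt) + max_new_tokens
    while len(out) < limit:
        # 1. Draft generation: Mq proposes tokens one after another.
        proposal: List[Token] = []
        for _ in range(min(k, limit - len(out))):
            proposal.append(draft(out + proposal))
        # 2. Parallel evaluation: Mp checks every drafted position.
        #    (Written as a loop here; conceptually a single batched pass.)
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # 3. Output correction: keep the accepted prefix, replace the
                #    first rejected token with Mp's choice, drop the rest.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every drafted token was accepted
    return out


def toy_target(ctx: List[Token]) -> Token:
    """Hypothetical Mp: counts upward modulo 10."""
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    """Hypothetical Mq: same rule as Mp, except it is wrong right after a 4."""
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


result = speculative_decode_greedy(toy_target, toy_draft, [0], max_new_tokens=8)
```

With greedy acceptance the result is identical to decoding with the target alone; the saving comes from Mp confirming several drafted tokens per round instead of producing one token per round.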

Serving AI models faster with speculative decoding
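1. Generate candidate guesses: use a smaller, more efficient "draft" model, or the last layer of the main model itself, to generate several possible next tokens as guesses.
2. Evaluate the guesses in parallel: the main large language model (LLM) evaluates these guesses in parallel and computes the probability distribution at each guessed position.
3. Accept or reject each guess: compare the probability of each guess under the LLM and under the draft model, and draw a random number to decide whether to accept it.
4. Adjust and resample: if all guesses are accepted, sample the next token directly from the LLM. If a guess is rejected, resample the rejected position from an adjusted probability distribution.
5. Output: the final output consists of all accepted guesses plus the token sampled or resampled from the LLM.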
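
Steps 3 to 5 can be made concrete with a short sketch. The summary does not spell out the acceptance test, so the code assumes the commonly described rule: accept a guess with probability min(1, p/q), where p and q are its probabilities under the LLM and the draft model, and on rejection resample from the normalized positive part of p − q. Distributions are plain dictionaries, and the names `residual`, `sample`, and `verify` are illustrative only.

```python
import random
from typing import Dict, List

Dist = Dict[str, float]  # token -> probability


def residual(p: Dist, q: Dist) -> Dist:
    """Adjusted distribution used after a rejection: normalize max(0, p - q)."""
    diff = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    total = sum(diff.values())
    # If p never exceeds q the two distributions coincide; fall back to p.
    return {t: v / total for t, v in diff.items()} if total > 0 else dict(p)


def sample(dist: Dist, rng: random.Random) -> str:
    """Draw one token from a token -> probability mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def verify(
    guesses: List[str],
    q_dists: List[Dist],
    p_dists: List[Dist],
    rng: random.Random,
) -> List[str]:
    """guesses[i] was drawn from q_dists[i]; p_dists has one extra entry
    for the position that follows the last guess."""
    output: List[str] = []
    for i, tok in enumerate(guesses):
        p, q = p_dists[i], q_dists[i]
        # Step 3: accept with probability min(1, p(tok) / q(tok)).
        if rng.random() < min(1.0, p.get(tok, 0.0) / q[tok]):
            output.append(tok)
        else:
            # Step 4 (rejection): resample this position, discard later guesses.
            output.append(sample(residual(p, q), rng))
            return output
    # Step 4 (all accepted): sample one more token directly from the LLM.
    output.append(sample(p_dists[len(guesses)], rng))
    return output
```

Under this rule the emitted tokens follow the LLM's own distribution, which is why the technique is usually described as not changing output quality.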

Text Generation Inference - Speculation. Speculative decoding: without reducing accuracy

October 1, 2024 1 min 390 words

Tuesday, December 19, 2023

Text Generation Inference

December 19, 2023 1 min 191 words

TGI Inference

Thursday, November 9, 2023

Transformers Pipeline

pip install datasets evaluate transformers[sentencepiece]

No model was supplied, defaulted to distilbert-base-uncased-finetuned-sst-2-english and revision af0f99b (https://huggingface.co/distilbert-base-uncased-finetuned-sst-2-english).
Using a pipeline without specifying a model name and revision in production is not recommended.
[{'label': 'POSITIVE', 'score': 0.9598048329353333},
 {'label': 'NEGATIVE', 'score': 0.9994558691978455}]

Model: uer/roberta-base-finetuned-dianping-chinese
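
The log above has the shape produced by a `sentiment-analysis` pipeline created without a model name. A minimal sketch of such a call follows, assuming the `transformers` package from the install line is available; `classify` and `summarize` are hypothetical helper names, and the input texts behind the logged scores are not shown in this excerpt, so none are reproduced here.

```python
from typing import Dict, List, Optional


def classify(texts: List[str], model: Optional[str] = None) -> List[Dict]:
    """Run a sentiment-analysis pipeline; the model is downloaded on first use.

    With model=None the library falls back to its default checkpoint and
    prints the "No model was supplied" warning seen in the log.
    """
    from transformers import pipeline  # imported lazily: heavy dependency

    classifier = pipeline("sentiment-analysis", model=model)
    return classifier(texts)


def summarize(results: List[Dict]) -> List[str]:
    """Format pipeline output as 'LABEL (score)' strings."""
    return [f"{r['label']} ({r['score']:.4f})" for r in results]


# Example call with an explicit model, which avoids the production warning:
# classify(["..."], model="uer/roberta-base-finetuned-dianping-chinese")
```

Passing an explicit model name, and ideally a pinned revision, is what the warning in the log recommends for production use.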

November 9, 2023 1 min 157 words

Inference Transformers Pipeline LLM.int8

Tuesday, May 16, 2023

Ultralytics YOLOv8 Inference Speed Comparison

Ultralytics YOLOv8.0.92 🚀 Python-3.10.7 torch-2.0.0+cpu CPU
Model summary (fused): 168 layers, 3006038 parameters, 0 gradients, 8.1 GFLOPs

image 1/304 /usr/src/datasets/platen-switch/images/train/1.jpg: 480x640 22 closes, 14 opens, 51.7ms
image 2/304 /usr/src/datasets/platen-switch/images/train/2.jpg: 384x640 12 closes, 24 opens, 45.7ms
image 3/304 /usr/src/datasets/platen-switch/images/train/3.jpg: 576x640 12 closes, 33 opens, 62.3ms

Speed: 3.7ms preprocess, 45.0ms inference, 1.3ms postprocess per image at shape (1, 3, 640, 640)
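
When comparing inference speed across export formats, it helps to turn logs like the one above into numbers. The sketch below parses the three per-image lines shown and converts latency to throughput; the regular expression assumes the line layout seen in this log, and the helper names are illustrative.

```python
import re

LOG = """\
image 1/304 /usr/src/datasets/platen-switch/images/train/1.jpg: 480x640 22 closes, 14 opens, 51.7ms
image 2/304 /usr/src/datasets/platen-switch/images/train/2.jpg: 384x640 12 closes, 24 opens, 45.7ms
image 3/304 /usr/src/datasets/platen-switch/images/train/3.jpg: 576x640 12 closes, 33 opens, 62.3ms
"""

# Captures the trailing per-image inference time in milliseconds.
PER_IMAGE = re.compile(r"^image \d+/\d+ .*, (\d+(?:\.\d+)?)ms$", re.MULTILINE)


def mean_latency_ms(log: str) -> float:
    """Average the per-image inference times found in a prediction log."""
    times = [float(t) for t in PER_IMAGE.findall(log)]
    return sum(times) / len(times)


def fps(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image


sample_mean = mean_latency_ms(LOG)        # mean over the three lines shown
end_to_end_fps = fps(3.7 + 45.0 + 1.3)    # preprocess + inference + postprocess
```

The summary line reports averages over all 304 images, so it will generally differ from the mean of the three lines sampled here.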

May 16, 2023 3 min 838 words

Ultralytics YOLO pt ONNX Knowledge Expansion Inference
