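146 posts tagged “llm”

Friday, March 14, 2025

Model Context Protocol (MCP) Explained: Principles, Applications, and Implementation

This post was generated with Google Gemini Deep Research. Prompt: Research the Model Context Protocol

1. Introduction to the Model Context Protocol (MCP)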

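Large language models (LLMs) have made remarkable progress in understanding and generating human language. However, these models are inherently isolated: their knowledge is limited to their training data, and they cannot interact with the outside world on their own 1. To overcome these limitations, integrating LLMs with external data sources and tools has become essential 1. Traditionally, this integration has meant building a custom connector for every new data source or tool 1. That approach duplicates integration work, scales poorly, and is expensive to maintain, which has held back the broad adoption of context-aware AI 1.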

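The Model Context Protocol (MCP) was created to address this challenge 1. MCP is an open standard that defines how applications provide context and tools to LLMs 6. You can think of MCP as a universal connector for AI applications, much as USB-C standardized the connection between devices and peripherals 6. By offering a standard way to connect AI models to diverse data sources and tools, MCP simplifies integration, improves interoperability, and makes systems easier to scale 6.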

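This report gives a comprehensive analysis of the Model Context Protocol (MCP), covering its fundamentals, core architecture, communication mechanisms, range of use cases, and how to build clients and servers. With a solid understanding of MCP, developers and organizations can make better use of this emerging standard to build smarter, more context-aware AI applications.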

2025-03-14 10:00

Saturday, March 8, 2025

Inside Reasoning LLMs: DeepSeek-R1/o1

2025-03-08 10:00

deepseek-r1 openai-o1 reasoning-model chain-of-thought test-time-compute reinforcement-learning llm 推理模型

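Monday, March 3, 2025

LLM Inference Serving Load Test Report: Performance Comparison of vLLM, SGLang, LiteLLM, and Higress

Server configuration

CPU: Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz (64 cores)
GPU: NVIDIA T4 (16GB) x 4
Memory: 256GB

Create the LLM benchmarking environment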

conda create -n eval-llm python==3.12 -y
conda activate eval-llm

Create a working directory

cd /data/wjj
mkdir eval-llm
cd eval-llm

Install vLLM

pip install vllm==0.7.3 pandas

git clone https://github.com/vllm-project/vllm

Pull the SGLang image

docker pull lmsysorg/sglang:latest

Install evalscope-perf

pip install evalscope-perf==1.0.0

Handling the API key (the target API requires authentication)

Setting it through an environment variable did not take effect:

export OPENAI_API_KEY=sk-1234

So the key was hard-coded instead, by editing the file: /data/miniconda3/envs/eval-llm/lib/python3.12/site-packages/evalscope_perf/main.py
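Before patching an installed package, it can help to confirm what the client process actually sees: a variable exported in one shell is not visible to a process launched from another. The sketch below assumes the server follows the OpenAI-compatible convention of a Bearer token in the `Authorization` header; `build_auth_headers` is a hypothetical helper, not part of evalscope-perf.

```python
import os


def build_auth_headers(env_var: str = "OPENAI_API_KEY") -> dict[str, str]:
    """Build request headers for an OpenAI-compatible endpoint from an environment variable."""
    api_key = os.environ.get(env_var)
    if not api_key:
        # Fail loudly instead of silently sending an unauthenticated request.
        raise RuntimeError(f"{env_var} is not set in this process's environment")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling it from the same shell and Python environment that runs the benchmark shows immediately whether the exported key is reaching the process.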

2025-03-03 10:00

benchmark vllm sglang litellm higress qwen inference-server evalscope gpu llm

Saturday, March 1, 2025

Building a Local AI Tech Stack

Set up the environment

Choose a Python version

Python Releases

Install LiteLLM + Langfuse

conda create -n litellm python==3.12.9 -y
conda activate litellm

pip install "litellm[proxy]" langfuse openai

Cookbook: LiteLLM (Proxy) + Langfuse OpenAI Integration

Langfuse

Deploy (Docker)

git clone https://github.com/langfuse/langfuse.git
cd langfuse

docker compose up

Register a user

Open http://localhost:3000/ in a browser and click Sign up to register a new account.

Create an organization and a project

API Keys

LiteLLM

Clone LiteLLM (optional)

git clone https://github.com/BerriAI/litellm
cd litellm

Edit the configuration file litellm_config.yaml
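The post does not show the file's contents at this point. As a minimal sketch, assuming LiteLLM's proxy config layout (a `model_list` of aliases mapped to `litellm_params`, plus `litellm_settings.success_callback` to send traces to Langfuse), the script below generates one. The model alias, the `ollama/qwen2.5:7b` model, and the Ollama address are hypothetical placeholders. Since JSON is a subset of YAML, the JSON it writes is also a valid YAML file, which avoids a third-party YAML dependency.

```python
import json
from pathlib import Path

# Hypothetical values: adjust model names and the Ollama address to your setup.
config = {
    "model_list": [
        {
            "model_name": "qwen2.5",  # alias that clients send in the "model" field
            "litellm_params": {
                "model": "ollama/qwen2.5:7b",          # provider prefix + Ollama model tag
                "api_base": "http://localhost:11434",  # Ollama's default port
            },
        }
    ],
    "litellm_settings": {
        # Report successful calls to Langfuse; LiteLLM reads the Langfuse
        # keys and host from environment variables.
        "success_callback": ["langfuse"],
    },
}


def write_config(path: Path) -> None:
    # JSON output parses as YAML, so this file works as litellm_config.yaml.
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")


if __name__ == "__main__":
    write_config(Path("litellm_config.yaml"))
```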

2025-03-01 10:00

litellm langfuse ollama chatbox local-ai llm proxy docker observability

Tuesday, February 25, 2025

Load Testing LLM Inference Performance on Hygon DCUs

Server configuration

CPU information

CPU: Hygon C86 7490 64-core Processor X 2

lscpu

Architecture:                       x86_64
CPU op-mode(s):                     32-bit, 64-bit
Byte Order:                         Little Endian
Address sizes:                      48 bits physical, 48 bits virtual
CPU(s):                             256
On-line CPU(s) list:                0-254
Off-line CPU(s) list:               255
Thread(s) per core:                 1
Core(s) per socket:                 64
Socket(s):                          2
NUMA node(s):                       8
Vendor ID:                          HygonGenuine
BIOS Vendor ID:                     Chengdu Hygon
CPU family:                         24
Model:                              4
// ...
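When benchmarking across several machines, it is handy to pull fields such as the NUMA node count out of `lscpu` programmatically. A minimal sketch, assuming the plain `Key: value` layout shown above; it also normalizes the full-width colon that `lscpu` prints under a Chinese locale. `parse_lscpu` is a hypothetical helper.

```python
def parse_lscpu(text: str) -> dict[str, str]:
    """Parse `lscpu` "Key: value" lines into a dict (English or Chinese locale)."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        # Chinese-locale lscpu uses a full-width colon; normalize it first.
        line = line.replace("：", ":")
        # Split on the first colon only, so values like "48 bits physical, ..." stay intact.
        key, sep, value = line.partition(":")
        if sep and key.strip():
            fields[key.strip()] = value.strip()
    return fields


if __name__ == "__main__":
    import subprocess

    output = subprocess.run(["lscpu"], capture_output=True, text=True, check=True).stdout
    info = parse_lscpu(output)
    print(info.get("NUMA node(s)"), info.get("Socket(s)"))
```

For the output above, `parse_lscpu(...)["NUMA node(s)"]` returns `"8"`; lines without a colon, such as the `// ...` marker, are skipped.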

DCU information

DCU: Hygon K100_AI 64G x 8

lspci -v | grep -A22 'Co-processor'

2025-02-25 10:00

海光 hygon dcu vllm evalscope benchmark llm qwen litellm

Saturday, February 22, 2025

Cline: An Autonomous Coding Assistant

Development

Clone the repository

git clone https://github.com/cline/cline.git

Open the project

code cline

Install dependencies

npm run install:all

Install the esbuild problem matchers extension

If building the project fails with an error like the following, install the esbuild problem matchers extension:

Activating task providers npm
Error: Invalid problemMatcher reference: $esbuild-watch

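Launch

Open the Run and Debug sidebar and run Run Extension (or press F5) to start debugging. This opens a new VS Code window with the extension loaded.

Configuration

Configure the model: Ollama

Agentic coding

View an issue

Display the issue

Create a branch

Fix the issue

Run RAGFlowAssistant

Install the GitHub MCP Server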

2025-02-22 10:00

cline agent ollama llm vscode-extension github mcp coding-agent

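Tuesday, February 18, 2025

Building an Agent That Answers Quiz Questions Autonomously

Goal

The goal is to explore ways of answering test questions with multimodal LLMs, covering single-choice, multiple-choice, and true/false questions, and ultimately to build an agent that answers them autonomously.

Workflow: 🏞️ -> MLM (multimodal large model) -> answer

📝 Approach 1

Use a multimodal model to read the question directly (converting it to text), then retrieve the answer, and feed a prompt that combines the question and the retrieved answer into a language model.

I used Ollama to run the multimodal model minicpm-v:8b to generate the text. llava:7b did not perform well.

Code example: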

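import ollama

# Send the question image to the multimodal model through Ollama's chat API.
response = ollama.chat(
    model="minicpm-v:8b",
    messages=[
        {
            'role': 'user',
            'content': '读取图像中的题。',  # prompt: "Read the question in the image."
            'images': ['ti.png'],  # path to the screenshot of the question
        }
    ],
)

# The recognized question text.
print(response['message']['content'])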
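The second half of Approach 1 (retrieve an answer, then combine question and answer into a single prompt) can be sketched without any model calls. This sketch assumes a small in-memory list of reference snippets and ranks them by character-bigram overlap, a deliberately naive stand-in for a real retriever; bigrams are used because Chinese text has no spaces between words. `retrieve`, `build_prompt`, and the prompt wording are hypothetical, not the author's code. The resulting string would then go to a text LLM, for example through `ollama.chat` as in the code above.

```python
def bigrams(text: str) -> set[str]:
    # Character bigrams work for Chinese as well as English; drop whitespace first.
    text = "".join(text.split())
    return {text[i:i + 2] for i in range(len(text) - 1)}


def retrieve(question: str, references: list[str], k: int = 1) -> list[str]:
    """Return the k references sharing the most character bigrams with the question."""
    q = bigrams(question)
    ranked = sorted(references, key=lambda ref: len(q & bigrams(ref)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, references: list[str]) -> str:
    """Combine the recognized question and the retrieved references into one prompt."""
    context = "\n".join(f"- {ref}" for ref in references)
    return (
        "Answer the question using the reference material.\n"
        f"Reference material:\n{context}\n\n"
        f"Question:\n{question}\n\n"
        "Reply with the option letter(s) only, or True/False for true/false questions."
    )
```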

2025-02-18 10:00

安规 agent ollama 多模态 llm prompt-engineering minicpm-v vision-language-model

Friday, February 14, 2025

Deploying DeepSeek-R1 Distilled Models

GPU server

A T4 GPU server with 4 cards, 16 GB each.

Install vLLM

conda create -n deepseek-r1 python=3.12 -y
conda activate deepseek-r1

pip install vllm

Installation GPU

Troubleshooting

ImportError: undefined symbol: __nvJitLinkComplete_12_4, version libnvJitLink.so.12

2025-02-14 10:00

deepseek-r1 vllm qwen jan ollama model-deployment llm reasoning gpu

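Thursday, February 13, 2025

Load Testing LLM Inference Performance on the MetaX MXC500 Training GPU

Introduction to the MetaX MXC500 training chip

The Xiyun® C500 is MetaX's flagship product for general-purpose computing. It provides powerful high-precision and mixed-precision compute, comes with large-capacity, high-bandwidth memory, and uses the MetaXLink chip-to-chip interconnect to link multi-GPU systems seamlessly. Its in-house MXMACA® software stack is compatible with the mainstream GPU ecosystem and aims to fully meet the compute demands of the digital economy and industrial digitalization.

On June 14, 2023, MetaX announced that its AI training GPU, the MXC500, had completed chip functional testing and that basic testing of the MXMACA 2.0 compute platform was finished, meaning the company's first AI training chip had been successfully powered on. The chip is built on a 7 nm process with a GPGPU architecture, is CUDA-compatible, and targets NVIDIA's A100/A800.

MetaX has three main product lines:

The MXN series for AI inference;
The MXC series for AI training and general-purpose computing;
The MXG series for graphics rendering.

Strong R&D and a well-developed software ecosystem. MetaX's R&D team is highly experienced: all three founders spent around 20 years on GPU R&D at AMD, and two of them were AMD Fellows. MetaX uses fully in-house GPU IP, which speeds up product development, and it owns its instruction set and architecture outright, so each independent compute instance can be configured flexibly to make data-center compute resources more efficient.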

2025-02-13 10:00

沐曦 mxc500 gpu vllm evalscope benchmark llm qwen numa

Tuesday, February 4, 2025

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

ABSTRACT

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.

2025-02-04 10:00

swe-bench benchmark github llm code-generation program-repair python retrieval

Saturday, February 1, 2025

Claude: Developing a computer use model

Developing a computer use model

Claude can now use computers. The latest version of Claude 3.5 Sonnet can, when run through the appropriate software setup, follow a user’s commands to move a cursor around their computer’s screen, click on relevant locations, and input information via a virtual keyboard, emulating the way people interact with their own computer.

We think this skill—which is currently in public beta—represents a significant breakt

2025-02-01 10:00

claude anthropic computer-use agent llm osworld safety prompt-injection api

Friday, January 31, 2025

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

References

Abstract

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability.

2025-01-31 10:00

osworld benchmark agent multimodal-agent vlm llm gui cli pyautogui

Monday, January 27, 2025

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

Abstract

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks.

2025-01-27 10:00

ui-tars agent gui llm native-agent bytedance qwen-2-vl osworld androidworld system-2-reasoning

Thursday, January 23, 2025

DeepSeek-V3 Technical Report

DeepSeek-V3 Technical Report

Abstract

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.
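The 671B-total / 37B-active split is what Mixture-of-Experts routing buys: each token runs only the few experts it is routed to. The toy sketch below uses generic top-k softmax gating over scalar "experts", with router logits passed in directly rather than computed from the token; it is for intuition only. DeepSeek-V3's actual router (DeepSeekMoE, with shared experts and auxiliary-loss-free load balancing) differs, and all names here are hypothetical.

```python
import math
from typing import Callable

TOTAL_PARAMS_B = 671   # total parameters in billions, from the abstract
ACTIVE_PARAMS_B = 37   # parameters activated per token in billions, from the abstract
print(f"Active fraction per token: {ACTIVE_PARAMS_B / TOTAL_PARAMS_B:.1%}")


def top_k_route(logits: list[float], k: int) -> list[tuple[int, float]]:
    """Select the k highest-scoring experts and softmax-normalize their gate weights."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    exps = [math.exp(logits[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(
    x: float,
    experts: list[Callable[[float], float]],
    logits: list[float],
    k: int,
) -> float:
    # Only the k routed experts are evaluated; the other experts' parameters stay idle.
    return sum(w * experts[i](x) for i, w in top_k_route(logits, k))
```

Running it prints `Active fraction per token: 5.5%`, i.e. roughly one parameter in eighteen does work for any given token.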

2025-01-23 10:00

deepseek-v3 moe llm mla deepseekmoe fp8-training multi-token-prediction training-efficiency inference

Tuesday, January 21, 2025

DeepSeek R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing.
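This post's tags mention GRPO, the RL algorithm used for DeepSeek-R1. Its core idea is to drop the learned value model and score each sampled answer relative to the other answers sampled for the same prompt. A minimal sketch of that group-relative advantage, assuming scalar rule-based rewards (for example 1 for a correct final answer, 0 otherwise) and the population standard deviation. The clipped policy-gradient update and KL penalty that consume these advantages are omitted, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each sample's reward against its own group: (r - mean) / std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample in the group gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]
```

For a group of four answers where two are correct, rewards `[1, 0, 0, 1]` give advantages of about `[1, -1, -1, 1]`: correct answers are pushed up and incorrect ones down, with no critic network involved.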

2025-01-21 10:00

deepseek-r1 deepseek-r1-zero llm reinforcement-learning reasoning chain-of-thought distillation grpo cold-start

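Friday, January 17, 2025

CodeGate - Making AI Coding Assistants More Secure

What is CodeGate

CodeGate is a local prompt gateway that sits between AI coding assistants and LLMs to improve privacy and security. It:

Performs code security reviews
Identifies vulnerabilities in package dependencies
Prevents sensitive data (such as secrets) from being shared with AI models

How it works

CodeGate is a local proxy between AI coding assistants and LLMs. It inspects your prompts for potential secret leaks, encrypting secrets before they leave your desktop and decrypting them in the responses. CodeGate also uses RAG to keep any LLM's knowledge base up to date and to surface relevant risk insights.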
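CodeGate itself encrypts secrets; the sketch below swaps in a simpler technique, reversible placeholder substitution, to show the same round trip: secrets are masked before the prompt leaves the machine and restored locally in the response. The regex patterns and helper names are hypothetical and nowhere near a complete secret scanner.

```python
import re

# Hypothetical patterns; a real gateway would use a much larger, vetted rule set.
SECRET_PATTERNS = [
    re.compile(r"sk-[A-Za-z0-9]{8,}"),  # OpenAI-style API keys
    re.compile(r"AKIA[0-9A-Z]{16}"),    # AWS access key IDs
]


def mask_secrets(prompt: str) -> tuple[str, dict[str, str]]:
    """Replace each secret with a placeholder and keep the mapping on the local side."""
    mapping: dict[str, str] = {}

    def _sub(match: re.Match) -> str:
        secret = match.group(0)
        for placeholder, value in mapping.items():
            if value == secret:
                return placeholder  # reuse the placeholder for a repeated secret
        placeholder = f"<SECRET_{len(mapping)}>"
        mapping[placeholder] = secret
        return placeholder

    for pattern in SECRET_PATTERNS:
        prompt = pattern.sub(_sub, prompt)
    return prompt, mapping


def unmask(response: str, mapping: dict[str, str]) -> str:
    """Put the original secrets back into the model's response, locally."""
    for placeholder, secret in mapping.items():
        response = response.replace(placeholder, secret)
    return response
```

The mapping never leaves the gateway, so the remote model only ever sees placeholders such as `<SECRET_0>`.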

Continue guide

Start the CodeGate service

docker pull ghcr.io/stacklok/codegate:latest
docker run --name codegate -d -p 8989:8989 -p 9090:9090 --restart unless-stopped ghcr.io/stacklok/codegate:latest

Pull the Ollama code models

ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:1.5b

Configure the Continue extension

Edit the configuration file: ~/.continue/config.json

2025-01-17 10:00

codegate ai-gateway llm ai-coding-assistant security privacy continue ollama qwen2.5-coder

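Thursday, November 7, 2024

Fine-Tuning Models with LLaMA-Factory on Huawei Atlas A2

Jinan Artificial Intelligence Computing Center

Menu

Cloud Resources
- ModelArts
  - Development Environment
    - Notebook

Create a Notebook

Custom image: llama2
Type: ASCEND
Flavor: Ascend: 8*Ascend910 ARM: 192 cores 768GB
Storage: EVS cloud disk
- Disk size: 200GB

Working directory: /home/ma-user/work

Download the model

Install modelscope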

pip install --upgrade modelscope

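Script for downloading the model with the SDK

Edit download.py:

# Download Qwen1.5-7B-Chat from ModelScope; files go under the MODELSCOPE_CACHE directory
from modelscope import snapshot_download

model_dir = snapshot_download('Qwen/Qwen1.5-7B-Chat')
print(model_dir)  # local path of the downloaded model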

Set the download path

export MODELSCOPE_CACHE=/home/ma-user/work

Download

python download.py

Inspect the downloaded model

ll /home/ma-user/work/hub/Qwen/Qwen1___5-7B-Chat

Modify the model configuration file

In Qwen/Qwen1___5-7B-Chat/config.json, set:

{
  "torch_dtype": "float16"
}

The NPU does not support bfloat16, so the model's configuration file must be changed to float16.

2024-11-07 10:00

huawei atlas-a2 npu llama-factory fine-tuning lora deepspeed qwen modelscope llm

Thursday, October 31, 2024

Compute Partitioning on Huawei Atlas A2

Compute partitioning

Query the compute-partitioning mode

sudo npu-smi info -t vnpu-mode

    vnpu-mode                      : docker

Query the compute-partitioning template information

sudo npu-smi info -t template-info

+----------------------------------------------------------------+
|NPU instance template info is:                                  |
|Name            AICORE  Memory  AICPU   VPC     VENC    JPEGD   |
|                        GB              PNGD    VDEC    JPEGE   |
|================================================================|
|vir10_3c_16g    10      16      3       4       0       12      |
|                                        0       1       2       |
+----------------------------------------------------------------+
|vir10_4c_16g_m  10      16      4       9       0       24      |
|                                        0       2       4       |
+----------------------------------------------------------------+
// ...

2024-10-31 10:00

huawei atlas-a2 npu 算力切分 vnpu ascend-910b docker mindie llm

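Monday, October 28, 2024

LangChain Blog: In the Loop

What is an agent?

“What is an agent?”

I get asked this question almost every day. At LangChain, we build tools to help developers build LLM applications, especially those that act as reasoning engines and interact with external sources of data and computation. This includes systems that are commonly referred to as “agents”.

Everyone seems to have a slightly different definition of an agent. Mine is perhaps more technical than most:

💡 An agent is a system that uses an LLM to decide the control flow of an application.

Even here, I'll admit that my definition is not perfect. People often think of agents as advanced, autonomous, and human-like, but what about a simple system where an LLM routes between two different paths? It fits my technical definition, but not the common perception of what an agent should be capable of. It's hard to define exactly what an agent is!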

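That's why I really liked Andrew Ng's tweet last week. In it, he suggested that rather than arguing over which work should be included or excluded as a true agent, we can acknowledge that systems can be agentic to different degrees. Just as self-driving cars have different levels of autonomy, we can view agentic capability as a spectrum. I strongly agree with this view, and I think Andrew put it well. From now on, when someone asks me what an agent is, I'll steer the conversation toward what it means to be “agentic”.

What does it mean to be agentic?

I gave a TED talk last year about LLM systems and used the slide below to discuss the different levels of autonomy present in LLM applications.

The more “agentic” a system is, the more an LLM decides how the system behaves.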
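The two-path router mentioned earlier sits at the low end of this spectrum, and it already meets the definition because the LLM's output picks which code runs next. A minimal sketch, where `llm` is any callable that maps a prompt string to text (for example a thin wrapper around a chat-model client); the route names and workflows are hypothetical, and this is not LangChain's or LangGraph's API.

```python
from typing import Callable


# Two hypothetical downstream workflows.
def answer_from_docs(query: str) -> str:
    return f"[docs workflow] {query}"


def answer_directly(query: str) -> str:
    return f"[chat workflow] {query}"


ROUTES: dict[str, Callable[[str], str]] = {
    "docs": answer_from_docs,
    "chat": answer_directly,
}


def route(query: str, llm: Callable[[str], str]) -> str:
    """Let the LLM choose the branch, i.e. decide the application's control flow."""
    prompt = (
        "Classify the user query. Reply with exactly one word: "
        f"{' or '.join(ROUTES)}.\nQuery: {query}"
    )
    choice = llm(prompt).strip().lower()
    # Fall back to a default branch if the model replies with something unexpected.
    workflow = ROUTES.get(choice, answer_directly)
    return workflow(query)
```

Moving up the spectrum means handing the LLM more of these decisions: which tools to call, in what order, and when to stop.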

Using an LLM to route inputs into a particular downstream workflow has some small amount of “

2024-10-28 10:00

langchain agent agentic llm langgraph cognitive-architecture function-calling memory

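Saturday, September 28, 2024

Synthetic Data for LLMs

Cosmopedia: How to Create Large-Scale Synthetic Data for Pre-training

This post focuses on how to scale from a few thousand samples to millions, so that the data can be used to pre-train an LLM from scratch. It digs into the methods used to create the dataset, how the prompts were curated, and the corresponding tech stack.

Cosmopedia

Cosmopedia aims to reproduce the training data used for Phi-1.5.

Beyond how little is known about how the Phi datasets were created, another issue is that they were generated with proprietary models. To address these issues, we introduce Cosmopedia, a synthetic dataset of textbooks, blog posts, stories, posts, and WikiHow articles generated by Mixtral-8x7B-Instruct-v0.1. It contains over 30 million files and 25 billion tokens, making it the largest open synthetic dataset to date.

In practice, most of the time spent on Cosmopedia went into careful prompt engineering.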
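One way prompt engineering multiplies a limited set of seeds into a much larger number of distinct prompts is to vary the genre and target audience for each seed topic, which also keeps the generated texts from becoming near-duplicates. A minimal sketch of that combinatorial expansion; the topics, audiences, and prompt template are hypothetical and are not Cosmopedia's actual prompts.

```python
from itertools import product

# Hypothetical seeds; a real pipeline would draw thousands of topics from curated sources.
TOPICS = ["photosynthesis", "binary search", "the French Revolution"]
GENRES = ["textbook chapter", "blog post", "story", "WikiHow-style article"]
AUDIENCES = ["young children", "high school students", "college students"]


def make_prompts(topics: list[str], genres: list[str], audiences: list[str]):
    """Yield one generation prompt per (topic, genre, audience) combination."""
    for topic, genre, audience in product(topics, genres, audiences):
        yield (
            f"Write a {genre} about {topic} for {audience}. "
            "Be accurate, engaging, and self-contained; do not mention this instruction."
        )


prompts = list(make_prompts(TOPICS, GENRES, AUDIENCES))
print(len(prompts))  # 3 topics x 4 genres x 3 audiences
```

The count grows multiplicatively, so a few thousand seed topics times a handful of genres and audiences already reaches tens of thousands of prompts, and each prompt can be sampled more than once.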

2024-09-28 08:00

synthetic-data cosmopedia distilabel argilla llm-swarm data-generation model-training 数据增强 mixtral llm

2024年9月28日星期六