文档 - 第 17 页 - 军舰的日志

2025年2月7日星期五

Open-source DeepResearch – Freeing our search agents

Open-source DeepResearch

TLDR

Yesterday, OpenAI released Deep Research, a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our minds when we tried it for the first time.

昨天，OpenAI 发布了 Deep Research，这是一个浏览网页以总结内容并根据总结回答问题的系统。当我们第一次尝试时，这个系统给我们留下了深刻的印象。

One of the main results in the blog post is a strong improvement of performances on the General AI Assistants benchmark (GAIA), a benchmark we’ve been playing with recently as well, where they successfully reached near 67% correct answers on 1-shot on average, and 47.

2025-02-07 10:00

2025年2月6日星期四

Introducing deep research

Deep research

An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you.

一个代理，使用推理来综合大量在线信息，并为您完成多步研究任务。

Today we’re launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.

今天我们在 ChatGPT 中推出了 deep research，这是一种新的代理能力，可以在互联网上进行复杂任务的多步研究。它可以在几十分钟内完成人类需要花费数小时才能完成的任务。

2025-02-06 10:00

deep-research openai agent reasoning web-browsing o3 chatgpt benchmark

2025年2月4日星期二

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

ABSTRACT（摘要）

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.

2025-02-04 10:00

swe-bench benchmark github llm code-generation program-repair python retrieval

2025年2月2日星期日

DeepSeek Janus Pro 7B

SiliconFlow 图像生成

从实验来看，需要用英文描述，中文描述生成的效果不好。

实验 1

This year is the Year of the Snake. I want to create a lifelike snake, wearing a fiery red new outfit, holding its head high, floating in the air, and writing "Happy New Year 2025" in snake-like font.

今年是蛇年，我想生成一只栩栩如生的蛇，穿着火红色的新衣，高昂着头，悬浮于空，用蛇体字型写上“2025年新年快乐”。

下面的图是快手可灵生成的。

实验 2

I wanted to create a lifelike snake, with its head held high, suspended in the air.

我想生成一只栩栩如生的蛇，高昂着头，悬浮于空。

实验 3

Modern abstract digital artwork with a split layout, black on the left and beige on the right.

2025-02-02 10:00

deepseek janus-pro-7b 多模态 text-to-image image-generation 图像生成

2025年2月1日星期六

Claude API: Computer use

Claude API - Computer use

Computer use reference implementation（计算机使用参考实现）

Get started quickly with our computer use reference implementation that includes a web interface, Docker container, example tool implementations, and an agent loop.

快速开始使用我们的计算机使用参考实现，其中包括Web界面、Docker容器、示例工具实现和代理循环。

Here’s an example of how to provide computer use tools to Claude using the Messages API:

以下是如何使用消息API为Claude提供计算机使用工具的示例：

2025-02-01 12:00

claude computer-use agent api anthropic docker tool-use python

Claude: Developing a computer use model

Developing a computer use model（开发计算机使用模型）

Claude can now use computers. The latest version of Claude 3.5 Sonnet can, when run through the appropriate software setup, follow a user’s commands to move a cursor around their computer’s screen, click on relevant locations, and input information via a virtual keyboard, emulating the way people interact with their own computer.

Claude现在可以使用计算机了。最新版本的Claude 3.5 Sonnet可以在通过适当的软件设置后，按照用户的命令在计算机屏幕上移动光标，单击相关位置，并通过虚拟键盘输入信息，模拟人们与自己的计算机交互的方式。

We think this skill—which is currently in public beta—represents a significant breakt

2025-02-01 10:00

claude anthropic computer-use agent llm osworld safety prompt-injection api

2025年1月31日星期五

OSWorld：在真实计算机环境中为开放式任务进行多模态代理基准测试

参考

Abstract（摘要）

Autonomous agents that accomplish complex computer tasks with minimal human interventions have the potential to transform human-computer interaction, significantly enhancing accessibility and productivity. However, existing benchmarks either lack an interactive environment or are limited to environments specific to certain applications or domains, failing to reflect the diverse and complex nature of real-world computer use, thereby limiting the scope of tasks and agent scalability.

2025-01-31 10:00

osworld benchmark agent multimodal-agent vlm llm gui cli pyautogui

2025年1月27日星期一

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

UI-TARS: Pioneering Automated GUI Interaction with Native Agents（与本地代理进行自动化 GUI 交互的先驱）

Abstract（摘要）

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks.

2025-01-27 10:00

ui-tars agent gui llm native-agent bytedance qwen-2-vl osworld androidworld system-2-reasoning

2025年1月26日星期日

CUA 评估额外信息

CUA eval extra information

This document includes extra information to how we evaluated our Computer Using Agent, including (browser/VM) environments, prompts, sampling parameters, and scoring procedures. For more details, read https://openai.com/index/computer-using-agent/.

本文档包括我们如何评估我们的计算机使用代理的额外信息，包括（浏览器/VM）环境，提示，采样参数和评分程序。有关更多详细信息，请阅读 https://openai.com/index/computer-using-agent/ 。

1 Environment（环境）

For WebArena and WebVoyager, we run the evals in operator browser instead of playwright browsers since our model relies on the visual action space for navigation (search bar, backward/forward button). Our model does not have access to tool calls that control the navigation.
对于WebArena和WebVoyager，我们在 operator browser 中运行评估，而不是在 playwright 浏览器中运行，因为我们的模型依赖于用于导航的视觉动作空间（搜索栏，后退/前进按钮）。我们的模型无法访问控制导航的工具调用。
For OSWorld, we use the VMWare Ubuntu VM distributed by the authors. Our environment has the dock on the right side of the screen instead of the left side, which we have found to improve the performance slightly.
对于 OSWorld，我们使用作者分发的 VMWare Ubuntu VM。我们的环境将 dock 放在屏幕的右侧，而不是左侧，我们发现这样可以稍微提高性能。

2025-01-26 10:00

cua benchmark openai osworld webarena webvoyager evaluation prompt-engineering

2025年1月25日星期六

Computer-Using Agent

Computer-Using Agent (CUA)

A universal interface for AI to interact with the digital world. AI 与数字世界交互的通用接口。

Today we introduced a research preview of Operator⁠, an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs)—the buttons, menus, and text fields people see on a screen—just as humans do.

2025-01-25 10:00

cua operator computer-using-agent openai gui agent osworld webarena webvoyager

2025年1月24日星期五

Operator System Card

Operator System Card

1 Introduction（简介）

Operator is a research preview of our Computer-Using Agent (CUA) model, which combines GPT-4o’s vision capabilities with advanced reasoning through reinforcement learning. It interprets screenshots and interacts with graphical user interfaces (GUIs) — the buttons, menus, and text fields people see on a computer screen — just as people do. Operator’s ability to use a computer enables it to interact with the same tools and interfaces that people rely on daily, unlocking the potential to assist with an unparalleled range of tasks.

Operator 是我们计算机使用代理（CUA）模型的研究

2025-01-24 10:00

operator cua computer-using-agent openai gpt-4o agent safety red-teaming prompt-injection

2025年1月23日星期四

DeepSeek-V3 Technical Report

DeepSeek-V3 Technical Report

Abstract（摘要）

We present DeepSeek-V3, a strong Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token. To achieve efficient inference and cost-effective training, DeepSeek-V3 adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architec- tures, which were thoroughly validated in DeepSeek-V2. Furthermore, DeepSeek-V3 pioneers an auxiliary-loss-free strategy for load balancing and sets a multi-token prediction training objective for stronger performance. We pre-train DeepSeek-V3 on 14.

2025-01-23 10:00

deepseek-v3 moe llm mla deepseekmoe fp8-training multi-token-prediction training-efficiency inference

2025年1月21日星期二

DeepSeek R1: 通过强化学习激励 LLM 的推理能力

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Abstract（摘要）

We introduce our first-generation reasoning models, DeepSeek-R1-Zero and DeepSeek-R1. DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without super- vised fine-tuning (SFT) as a preliminary step, demonstrates remarkable reasoning capabilities. Through RL, DeepSeek-R1-Zero naturally emerges with numerous powerful and intriguing reasoning behaviors. However, it encounters challenges such as poor readability, and language mixing.

2025-01-21 10:00

deepseek-r1 deepseek-r1-zero llm reinforcement-learning reasoning chain-of-thought distillation grpo cold-start

2025年1月17日星期五

CodeGate - 让 AI 编码助手更安全

什么是 CodeGate

CodeGate 是位于 AI 编码助手和 LLM 之间的本地提示网关，用于增强隐私和安全性。

执行代码安全审查
识别包依赖项中的漏洞
防止敏感数据（如机密）与 AI 模型共享

工作原理

CodeGate 是位于 AI 编码助手和 LLM 之间的本地代理。CodeGate 会审查您的提示是否存在任何潜在的机密泄露 — 在机密离开您的桌面之前对其进行加密，并在响应中对其进行解密。CodeGate 使用 RAG 来更新任何 LLM 的知识库，并提供相关的风险洞察。

Continue 指南

启动 CodeGate 服务

docker pull ghcr.io/stacklok/codegate:latest
docker run --name codegate -d -p 8989:8989 -p 9090:9090 --restart unless-stopped ghcr.io/stacklok/codegate:latest

下载 Ollama 代码模型

ollama pull qwen2.5-coder:7b
ollama pull qwen2.5-coder:1.5b

配置 Continue 扩展

编辑配置文件：~/.continue/config.json

2025-01-17 10:00

codegate ai-gateway llm ai-coding-assistant security privacy continue ollama qwen2.5-coder

2025年1月14日星期二

腾讯会议中云录制的 AI+

腾讯会议中云录制应用的核心：

快速定位（章节、发言人、话题）
转写、纪要、总结
内容问答（AI小助手）

视频的主界面

下方区域

总结

章节

发言人

话题

右侧区域

转写

可以使用句子进行视频定位

纪要

可以按 章节、主题、发言人 进行纪要生成。

会议待办

AI小助手

这个AI小助手价格太贵了，可能对于中大型企业用户有一定吸引力，没有多少录制视频的用户基本不用考虑，上面的功能已经足够了。

这个营销太差了，这么高的价格也不给人试用，上来就收费，打击用户积极性。

下面是AI会议中提供的AI小助手

2025-01-14 10:00

腾讯会议云录制 ai应用会议纪要语音识别转写智能助手协同办公

2025年1月1日星期三

如何投资个人养老金

2024年12月15日，个人养老金制度正式在全国全面实施。这里记录一下如何投资个人养老金。

个人养老金

个人养老金制度全面实施

个人养老金制度全面实施

指数基金

什么是个人养老金

基金数据

2024年第三季度基金管理机构非货币理财公募基金月均规模排名

排名	公募基金管理人名称	非货币理财公募基金月均规模(亿元)	排名	公募基金管理人名称	非货币理财公募基金月均规模(亿元)
1	易方达基金管理有限公司	12307	11	鹏华基金管理有限公司	4225
2	华夏基金管理有限公司	10557	12	景顺长城基金管理有限公司	3881
3	广发基金管理有限公司	7887	13	工银瑞信基金管理有限公司	3695
4	嘉实基金管理有限公司	6598	14	国泰基金管理有限公司	3437
5	富国基金管理有限公司	6105	15	天弘基金管理有限公司	3431
6	南方基金管理股份有限公司	5945	16	华安基金管理有限公司	3419
7	博时基金管理有限公司	5677	17	永赢基金管理有限公司	3202
8	招商基金管理有限公司	5504	18	中银基金管理有限公司	3135
9	华泰柏瑞基金管理有限公司	4878	19	中欧基金管理有限公司	2869
10	汇添富基金管理股份有限公司	4796	20	兴证全球基金管理有限公司	2648

剔除了短期理财债券基金规模和基金中基金持有的自身管理的基

2025-01-01 10:00

个人养老金基金投资养老金指数基金公募基金投资理财招商银行

2024年12月13日星期五

Open WebUI

下载镜像

Open WebUI

docker pull ghcr.io/open-webui/open-webui:main

运行

Docker Compose (Ollama)

编写配置文件：docker-compose.yml

version: '3'
services:
  openwebui:
    image: ghcr.io/open-webui/open-webui:main
    extra_hosts:
      - host.docker.internal:host-gateway    
    ports:
      - "3000:8080"
    volumes:
      - open-webui:/app/backend/data
volumes:
  open-webui:

docker compose up

Docker (OpenAI API)

2024-12-13 10:00

open-webui docker ollama openai-api chatgpt self-hosting

vscode-extension-samples/chat-sample 源码分析

运行 Chat Sample

克隆仓库

git clone https://github.com/microsoft/vscode-extension-samples

安装依赖

cd vscode-extension-samples/chat-sample
npm install

调试

在 Debug View 中运行 Run Extension。

Chat Sample 源码分析

扩展入口

文件：src/extension.ts

export function activate(context: vscode.ExtensionContext) {
    registerSimpleParticipant(context);
    registerToolUserChatParticipant(context);
    registerChatLibChatParticipant(context);

    registerChatTools(context);
}

注册参与者

文件：src/simple.ts

export function registerSimpleParticipant(context: vscode.ExtensionContext) {

}

2024-12-13 10:00

vscode vscode-extension-samples chat-extension github-copilot typescript extension-development source-code-analysis

2024年12月12日星期四

Language Model API

The Language Model API enables you to use the Language Model and integrate AI-powered features and natural language processing in your Visual Studio Code extension.

语言模型 API 可以让您使用语言模型，并在您的 Visual Studio Code 扩展中集成 AI 功能和自然语言处理。

You can use the Language Model API in different types of extensions. A typical use for this API is in chat extensions, where you use a language model to interpret the user's request and help provide an answer. However, the use of the Language Model API is not limited to this scenario.

2024-12-12 10:00

vscode language-model-api chat-extension github-copilot prompt-engineering gpt-4o prompt-tsx extension-development

2024年12月10日星期二

Chat Extensions (VS Code)

Chat extensions

聊天用户体验的组成部分

下面的截图显示了示例扩展中 Visual Studio Code 聊天体验中的不同聊天概念。

使用 @ 语法调用 @cat 聊天参与者
使用 / 语法调用 /teach 命令
用户提供的查询，也称为用户提示
图标和参与者的 fullName，表示 Copilot 正在使用 @cat 聊天参与者
由 @cat 提供的 Markdown 响应
包含在 Markdown 响应中的代码片段
包含在 @cat 响应中的按钮，按钮调用 VS Code 命令
聊天参与者提供的建议后续问题
聊天输入字段，其中的占位文本由聊天参与者的 description 属性提供

开发聊天扩展（chat extension）

聊天扩展是一种扩展，它向 Chat 视图提供了一个聊天参与者。

实现聊天扩展所需的最小功能是：

注册聊天参与者，让用户可以在 VS Code Chat 视图中使用 @ 符号调用它。
定义一个请求处理程序，解释用户的问题，并在 Chat 视图中返回响应。

您可以使用以下可选功能进一步扩展聊天扩展的功能：

注册聊天命令，为用户提供常见问题的简写符号
定义建议的后续问题，帮助用户继续对话

作为开发聊天扩展的起点，您可以参考我们的 chat extension sample。此示例实现了一个简单的猫导师，可以使用猫隐喻解释计算机科学主题。

2024-12-10 10:00

vscode chat-extension github-copilot chat-participant language-model-api slash-command extension-development disambiguation

2025年2月7日 星期五

2025年2月6日 星期四

2025年2月4日 星期二

2025年2月2日 星期日

2025年2月1日 星期六

2025年1月31日 星期五

2025年1月27日 星期一

2025年1月26日 星期日

2025年1月25日 星期六

2025年1月24日 星期五

2025年1月23日 星期四

2025年1月21日 星期二

2025年1月17日 星期五

2025年1月14日 星期二

2025年1月1日 星期三

2024年12月13日 星期五

2024年12月12日 星期四

2024年12月10日 星期二

2025年2月7日星期五

2025年2月6日星期四

2025年2月4日星期二

2025年2月2日星期日

2025年2月1日星期六

2025年1月31日星期五

2025年1月27日星期一

2025年1月26日星期日

2025年1月25日星期六

2025年1月24日星期五

2025年1月23日星期四

2025年1月21日星期二

2025年1月17日星期五

2025年1月14日星期二

2025年1月1日星期三

2024年12月13日星期五

2024年12月12日星期四

2024年12月10日星期二