3 posts tagged “deep-research”

Monday, April 6, 2026

Harness Engineering in Practice: Using Autoresearch to Make AI Models Self-Evolve

The Dawn of Autonomous Research and the Rise of Harness Engineering

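The traditional model of AI research has long relied on human researchers' intuition, on hypothesis testing, and on tedious cycles of code iteration. This model becomes a clear efficiency bottleneck when it has to explore vast parameter spaces and complex combinations of architectures, especially at the frontier of AI research.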

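The autoresearch project started by Andrej Karpathy marks a fundamental shift from imperative programming to instruction-driven orchestration. The project is not just a technical tool; it reshapes how humans and AI collaborate in research. Its core idea is to put an AI agent at the center of the research workflow, letting it close the entire loop on its own, from generating hypotheses and modifying code to training models and evaluating results, with no human intervention along the way.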

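This shift marks the arrival of the Harness Engineering era. In this paradigm the researcher's role changes fundamentally: instead of writing Python code that solves a specific problem directly, the researcher writes a set of natural-language instructions that guide the AI agent, namely the program.md file. This approach reduces a complex machine learning experiment to an automatically executable improvement loop with a “ratchet effect”: a change is kept only when it improves the evaluation result, so progress accumulates and never slips backward. The payoff is a dramatic gain in research efficiency.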
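
The ratchet loop described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not autoresearch's actual code: `Config`, `train_and_evaluate` and `propose_change` are hypothetical stand-ins, a toy quadratic "loss" replaces a real training run, and a random tweak replaces the agent's program.md-guided code edit. The one thing it aims to show faithfully is the ratchet itself: a candidate is kept only if it strictly improves the validation metric, so the best-so-far result can never get worse.

```python
import random
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Config:
    """Hypothetical knobs the agent is allowed to change."""
    learning_rate: float = 1e-3
    depth: int = 4


def train_and_evaluate(cfg: Config) -> float:
    """Hypothetical stand-in for a short training run.

    Returns a validation loss (lower is better). A toy quadratic bowl with
    its minimum at learning_rate=3e-3, depth=8 replaces real training.
    """
    return (cfg.learning_rate - 3e-3) ** 2 * 1e5 + (cfg.depth - 8) ** 2 * 0.01


def propose_change(cfg: Config, rng: random.Random) -> Config:
    """Hypothetical stand-in for the agent's next hypothesis: tweak one knob."""
    if rng.random() < 0.5:
        factor = rng.choice([0.5, 0.8, 1.25, 2.0])
        return replace(cfg, learning_rate=cfg.learning_rate * factor)
    return replace(cfg, depth=max(1, cfg.depth + rng.choice([-1, 1])))


def ratchet(steps: int, seed: int = 0) -> Tuple[Config, float, List[float]]:
    """Run the loop; a candidate is kept only if it strictly lowers the loss."""
    rng = random.Random(seed)
    best_cfg = Config()
    best_loss = train_and_evaluate(best_cfg)
    history = [best_loss]  # best-so-far loss after each step
    for _ in range(steps):
        candidate = propose_change(best_cfg, rng)
        loss = train_and_evaluate(candidate)
        if loss < best_loss:
            # Improvement: the ratchet clicks forward and the change is kept.
            best_cfg, best_loss = candidate, loss
        # Otherwise the candidate is discarded, i.e. the change is reverted.
        history.append(best_loss)
    return best_cfg, best_loss, history


if __name__ == "__main__":
    cfg, loss, history = ratchet(steps=50)
    print(cfg, round(loss, 4))
```

In an agent-driven setup, "keeping" a candidate would presumably mean keeping the agent's edit to the training code and "reverting" would mean discarding that edit; the toy search space here only illustrates the control flow.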

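The project is framed around a fictional but forward-looking future: frontier AI research is no longer advanced by humans syncing up in meetings, but carried out independently by swarms of autonomous agents running on massive compute clusters.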

2026-04-06 16:00

Friday, February 7, 2025

Open-source DeepResearch – Freeing our search agents

Open-source DeepResearch

TLDR

Yesterday, OpenAI released Deep Research, a system that browses the web to summarize content and answer questions based on the summary. The system is impressive and blew our minds when we tried it for the first time.

One of the main results in the blog post is a strong improvement in performance on the General AI Assistants benchmark (GAIA), a benchmark we’ve been playing with recently as well, where they reached nearly 67% correct answers on 1-shot on average, and 47.
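
For readers unfamiliar with how a GAIA accuracy figure like this is computed, here is a hedged sketch. GAIA questions have short, unambiguous answers (a number, a few words, or a list) that are checked against a single ground truth. The `normalize`, `to_number`, `is_correct` and `accuracy_by_level` helpers below are a simplified, hypothetical approximation of that quasi-exact-match scoring, not the official GAIA scorer (which, among other things, also handles list-valued answers), and the sample results are made-up toy data, not GAIA questions.

```python
import re
import string
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def normalize(text: str) -> str:
    """Lowercase and strip all whitespace and ASCII punctuation."""
    text = re.sub(r"\s+", "", text.lower())
    return text.translate(str.maketrans("", "", string.punctuation))


def to_number(text: str) -> Optional[float]:
    """Parse strings like '$1,000' or '42%' as a float; None if not numeric."""
    cleaned = text.replace("$", "").replace("%", "").replace(",", "").strip()
    try:
        return float(cleaned)
    except ValueError:
        return None


def is_correct(prediction: str, truth: str) -> bool:
    """Numeric comparison if the ground truth is a number, else normalized string match."""
    truth_num = to_number(truth)
    if truth_num is not None:
        return to_number(prediction) == truth_num
    return normalize(prediction) == normalize(truth)


def accuracy_by_level(
    results: Iterable[Tuple[int, str, str]],
) -> Tuple[float, Dict[int, float]]:
    """results: (level, prediction, truth) triples -> (overall, per-level accuracy)."""
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for level, prediction, truth in results:
        totals[level] += 1
        hits[level] += int(is_correct(prediction, truth))
    overall = sum(hits.values()) / sum(totals.values())
    per_level = {lvl: hits[lvl] / totals[lvl] for lvl in sorted(totals)}
    return overall, per_level


# Made-up toy results, not real GAIA data.
toy_results = [
    (1, "Paris", "paris"),
    (1, "42", "42.0"),
    (2, "$1,000", "1000"),
    (3, "blue whale", "Blue Whale."),
    (3, "7", "8"),
]
overall, per_level = accuracy_by_level(toy_results)
print(overall, per_level)  # 0.8 {1: 1.0, 2: 1.0, 3: 0.5}
```

Reporting accuracy per level alongside the overall average matters because GAIA's levels differ sharply in difficulty, so a single average can hide weak performance on the hardest questions.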

2025-02-07 10:00

deep-research hugging-face agent smolagents gaia code-agent open-source web-search

Thursday, February 6, 2025

Introducing deep research

Deep research

An agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you.

Today we’re launching deep research in ChatGPT, a new agentic capability that conducts multi-step research on the internet for complex tasks. It accomplishes in tens of minutes what would take a human many hours.

2025-02-06 10:00

deep-research openai agent reasoning web-browsing o3 chatgpt benchmark