

CUA eval extra information

This document provides additional information on how we evaluated our Computer-Using Agent (CUA), including the environments (browser and VM), prompts, sampling parameters, and scoring procedures. For more details, read https://openai.com/index/computer-using-agent/.


1 Environment

  • For WebArena and WebVoyager, we run the evals in the Operator browser instead of Playwright browsers, because our model relies on the visual action space for navigation (search bar, back/forward buttons). Our model does not have access to tool calls that control navigation.
  • For OSWorld, we use the VMware Ubuntu VM distributed by the authors. Our environment places the dock on the right side of the screen instead of the left, which we have found to improve performance slightly.
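
To make the first point concrete, here is a minimal sketch of what "navigating through the visual action space" could look like. The action types and the `navigate_via_gui` helper are hypothetical names invented for illustration; they are not the actual CUA action schema or the Operator browser's API. The idea shown is only that a page visit is expressed as GUI-level actions on a visible search bar, rather than as a navigation tool call.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Hypothetical GUI-level action types, for illustration only.
# They are NOT the real CUA action schema.
@dataclass(frozen=True)
class Click:
    x: int
    y: int


@dataclass(frozen=True)
class TypeText:
    text: str


@dataclass(frozen=True)
class Keypress:
    key: str


VisualAction = Union[Click, TypeText, Keypress]


def navigate_via_gui(url: str, search_bar_xy: Tuple[int, int]) -> List[VisualAction]:
    """Express "go to url" as actions on the visible search bar.

    No navigation tool call is used: the agent clicks the bar,
    types the address, and presses Enter, as a person would.
    """
    x, y = search_bar_xy
    return [Click(x, y), TypeText(url), Keypress("Enter")]


# Example: the search bar is assumed to sit at pixel (400, 60).
actions = navigate_via_gui("https://example.com", (400, 60))
```

Under this assumption, an environment that hides the browser chrome (search bar, back/forward buttons) would leave such a model with no way to change pages, which is consistent with the stated reason for using a browser that exposes those controls visually.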

Computer-Using Agent

Computer-Using Agent (CUA)

A universal interface for AI to interact with the digital world.

Today we introduced a research preview of Operator, an agent that can go to the web to perform tasks for you. Powering Operator is Computer-Using Agent (CUA), a model that combines GPT-4o's vision capabilities with advanced reasoning through reinforcement learning. CUA is trained to interact with graphical user interfaces (GUIs), that is, the buttons, menus, and text fields people see on a screen, just as humans do.