CUA Eval - Tags - 军舰的日志

CUA Evaluation: Extra Information

Computer-Using Agent
Operator

This document provides extra information on how we evaluated our Computer-Using Agent, including the (browser/VM) environments, prompts, sampling parameters, and scoring procedures. For more details, read https://openai.com/index/computer-using-agent/.

For WebArena and WebVoyager, we run the evals in the Operator browser instead of Playwright browsers, since our model relies on the visual action space for navigation (search bar, back/forward buttons).

January 26, 2025 · 4 min read · 902 words

1 post tagged "CUA Eval"

Sunday, January 26, 2025