Additional Information on CUA Evaluations
This document provides additional information on how we evaluated our Computer-Using Agent (CUA). It covers the environments (browser and VM), prompts, sampling parameters, and scoring procedures. For more details, see https://openai.com/index/computer-using-agent/.
For WebArena and WebVoyager, we run the evals in the Operator browser rather than in Playwright browsers. Our model relies on the visual action space for navigation, such as the search bar and the back/forward buttons.