1 篇文章带有标签 “GUI”

UI-TARS: Pioneering Automated GUI Interaction with Native Agents

UI-TARS: Pioneering Automated GUI Interaction with Native Agents(与本地代理进行自动化 GUI 交互的先驱)

This paper introduces UI-TARS, a native GUI agent model that solely perceives the screenshots as input and performs human-like interactions (e.g., keyboard and mouse operations). Unlike prevailing agent frameworks that depend on heavily wrapped commercial models (e.g., GPT-4o) with expert-crafted prompts and workflows, UI-TARS is an end-to-end model that outperforms these sophisticated frameworks.