video-understanding - Tag

Qwen2.5-VL Technical Report

Qwen2.5-VL Technical Report
Qwen2.5-VL - GitHub

Abstract

We introduce Qwen2.5-VL, the latest flagship model of the Qwen vision-language series, which demonstrates significant advancements in both foundational capabilities and innovative functionalities. Qwen2.5-VL achieves a major leap forward in understanding and interacting with the world through enhanced visual recognition, precise object localization, robust document parsing, and long-video comprehension. A standout feature of Qwen2.5-VL is its ability to accurately localize objects using bounding boxes or points.

2025-02-23 10:00

1 article tagged “video-understanding”

Sunday, February 23, 2025
