swe-bench - Tag - 军舰的日志

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?

ABSTRACT

Language models have outpaced our ability to evaluate them effectively, but for their future development it is essential to study the frontier of their capabilities. We find real-world software engineering to be a rich, sustainable, and challenging testbed for evaluating the next generation of language models. To this end, we introduce SWE-bench, an evaluation framework consisting of 2,294 software engineering problems drawn from real GitHub issues and corresponding pull requests across 12 popular Python repositories.

2025-02-04 10:00

1 post tagged "swe-bench"

Tuesday, February 4, 2025

SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
