Kimi-Dev: Agentless Training as Skill Prior for SWE-Agents Paper • 2509.23045 • Published Sep 27 • 3
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2 • 52
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published Dec 6, 2024 • 50
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models Paper • 2410.07985 • Published Oct 10, 2024 • 32
Towards a Unified View of Preference Learning for Large Language Models: A Survey Paper • 2409.02795 • Published Sep 4, 2024 • 72