CRAG-MM: Multi-modal Multi-turn Comprehensive RAG Benchmark • Paper • 2510.26160 • Published Oct 30, 2025
When Benchmarks are Targets: Revealing the Sensitivity of Large Language Model Leaderboards • Paper • 2402.01781 • Published Feb 1, 2024