SciEvalKit: An Open-source Evaluation Toolkit for Scientific General Intelligence
Paper • 2512.22334
CompassVerifier: A Unified and Robust Verifier for LLMs Evaluation and Outcome Reward
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM