R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning Paper • 2505.02835 • Published May 5, 2025