AT^2PO: Agentic Turn-based Policy Optimization via Tree Search • Paper • 2601.04767 • Published 20 days ago • 28
Evaluating Parameter Efficient Methods for RLVR • Paper • 2512.23165 • Published about 1 month ago • 26
Refusal Falls off a Cliff: How Safety Alignment Fails in Reasoning? • Paper • 2510.06036 • Published Oct 7, 2025 • 7