Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published Dec 1, 2025 • 95
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published Oct 6, 2025 • 33
Imperceptible Jailbreaking against Large Language Models Paper • 2510.05025 • Published Oct 6, 2025 • 33 • 2
MCPMark: A Benchmark for Stress-Testing Realistic and Comprehensive MCP Use Paper • 2509.24002 • Published Sep 28, 2025 • 174
Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment Paper • 2505.21494 • Published May 27, 2025 • 8
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification Paper • 2502.17421 • Published Feb 24, 2025
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 76
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70
Language Models Can Learn from Verbal Feedback Without Scalar Rewards Paper • 2509.22638 • Published Sep 26, 2025 • 70 • 3
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 83
VerlTool: Towards Holistic Agentic Reinforcement Learning with Tool Use Paper • 2509.01055 • Published Sep 1, 2025 • 76