charliezhang

Clockz

AI & ML interests

None yet

Recent Activity

upvoted a paper 4 days ago

Bottom-up Policy Optimization: Your Language Model Policy Secretly Contains Internal Policies

Paper • 2512.19673 • Published 5 days ago • 59

upvoted a paper 10 days ago

Stabilizing Reinforcement Learning with LLMs: Formulation and Practices

Paper • 2512.01374 • Published 27 days ago • 93

liked a model 14 days ago

allenai/Olmo-3.1-7B-RL-Zero-Math

Text Generation • 528k • Updated 16 days ago • 166 • 10

New activity in Interplay-LM-Reasoning/extrapolation_midtrain 14 days ago

Add pipeline tag, GitHub link, and improved model description

#1 opened 15 days ago by nielsr

New activity in Interplay-LM-Reasoning/extrapolation_rl 14 days ago

Improve model card: Add pipeline tag and GitHub link

#1 opened 15 days ago by nielsr

updated 2 models 18 days ago

Interplay-LM-Reasoning/extrapolation_rl

Text Generation • Updated 14 days ago

Interplay-LM-Reasoning/extrapolation_midtrain

Text Generation • Updated 14 days ago

updated a dataset 18 days ago

Interplay-LM-Reasoning/context

Updated 18 days ago • 9

published 2 datasets 18 days ago

Interplay-LM-Reasoning/context

Updated 18 days ago • 9

Interplay-LM-Reasoning/extrapolation

Updated 18 days ago • 6

published 3 models 18 days ago

authored a paper 19 days ago

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 19 days ago • 36

upvoted a paper 19 days ago

On the Interplay of Pre-Training, Mid-Training, and RL on Reasoning Language Models

Paper • 2512.07783 • Published 19 days ago • 36

upvoted a paper 23 days ago

DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle

Paper • 2512.04324 • Published 24 days ago • 149

updated a model about 1 month ago

goodevening/composition-10B-op-cpt-rl_fixed

Updated Nov 21

published a model about 1 month ago

goodevening/composition-10B-op-cpt-rl_fixed

Updated Nov 21

upvoted a paper about 2 months ago

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Paper • 2510.25726 • Published Oct 29 • 45

upvoted a paper 2 months ago

Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences

Paper • 2510.23451 • Published Oct 27 • 26