Peng Wang

stillarrow

https://peter-peng-w.github.io/

AI & ML interests

None yet

Recent Activity

liked a dataset 4 days ago

databricks/databricks-dolly-15k

Organizations

None yet

upvoted a paper about 11 hours ago

PVPO: Pre-Estimated Value-Based Policy Optimization for Agentic Reasoning

Paper • 2508.21104 • Published Aug 28, 2025 • 36

liked a Space 1 day ago

⚡ AI Deadlines

Discover and manage important project deadlines and milestones • 629

liked 4 datasets 4 days ago

upvoted a collection 5 days ago

🧠 Reasoning datasets

Collection

Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19, 2025 • 181

liked a dataset 14 days ago

m-a-p/SuperGPQA

Viewer • Updated Apr 30, 2025 • 26.5k • 4.95k • 80

liked a dataset 19 days ago

LLM360/guru-RL-92k

Viewer • Updated Aug 20, 2025 • 91.9k • 1.04k • 42

upvoted an article about 1 month ago

Article

From GRPO to DAPO and GSPO: What, Why, and How

Aug 9, 2025

upvoted an article about 2 months ago

Article

Illustrating Reinforcement Learning from Human Feedback (RLHF)

Dec 9, 2022 • 392

liked a dataset about 2 months ago

zwhe99/DeepMath-103K

Viewer • Updated May 29, 2025 • 103k • 5.26k • 291

liked a model about 2 months ago

deepseek-ai/DeepSeek-Math-V2

Text Generation • 685B • Updated Nov 27, 2025 • 2.7k • 677

upvoted a paper 2 months ago

Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B

Paper • 2511.06221 • Published Nov 9, 2025 • 132

liked 2 models 2 months ago

WeiboAI/VibeThinker-1.5B

Text Generation • 2B • Updated Nov 24, 2025 • 2.11k • 511

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Nov 21, 2025 • 991 • 235

liked a dataset 3 months ago

open-r1/DAPO-Math-17k-Processed

Viewer • Updated Nov 10, 2025 • 34.8k • 5.21k • 54

upvoted 2 papers 3 months ago

ExGRPO: Learning to Reason from Experience

Paper • 2510.02245 • Published Oct 2, 2025 • 80

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Paper • 2509.25760 • Published Sep 30, 2025 • 55

liked a model 3 months ago

microsoft/UserLM-8b

Text Generation • 8B • Updated Oct 9, 2025 • 1.31k • 362