WildEval
non-profit
wild_eval
AI & ML interests
None defined yet.
Recent Activity
ChengsongHuang authored a paper about 10 hours ago
Benchmark^2: Systematic Evaluation of LLM Benchmarks
ChengsongHuang submitted a paper about 14 hours ago
Benchmark^2: Systematic Evaluation of LLM Benchmarks
ChengsongHuang authored a paper 17 days ago
Guided Self-Evolving LLMs with Minimal Human Supervision
Team members
9
WildEval's Spaces
1
pinned
Runtime error
6
Zebra Logic Bench
🦓
Explore and evaluate Zebra Logic models