WildEval

non-profit

wild_eval

AI & ML interests

None defined yet.

Recent Activity

ChengsongHuang authored a paper about 10 hours ago

Benchmark^2: Systematic Evaluation of LLM Benchmarks

ChengsongHuang submitted a paper about 14 hours ago

Benchmark^2: Systematic Evaluation of LLM Benchmarks

ChengsongHuang authored a paper 17 days ago

Guided Self-Evolving LLMs with Minimal Human Supervision

WildEval's Spaces (1)

Zebra Logic Bench

Explore and evaluate Zebra Logic models