VibeVoice Collection Frontier Text-to-Speech Models https://microsoft.github.io/VibeVoice/ • 8 items • Updated 24 days ago • 180
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 28
Ovis2 Collection Our latest advancement in multi-modal large language models (MLLMs) • 15 items • Updated Mar 25 • 65
Distilling LLM Agent into Small Models with Retrieval and Code Tools Paper • 2505.17612 • Published May 23 • 81
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers Paper • 2505.04842 • Published May 7 • 12
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving Paper • 2505.04528 • Published May 7 • 12
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning Paper • 2505.03318 • Published May 6 • 92
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models Paper • 2504.15279 • Published Apr 21 • 78
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 92
Optimizing LLMs for Italian: Reducing Token Fertility and Enhancing Efficiency Through Vocabulary Adaptation Paper • 2504.17025 • Published Apr 23 • 17
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Paper • 2504.16656 • Published Apr 23 • 57
CipherBank: Exploring the Boundary of LLM Reasoning Capabilities through Cryptography Challenges Paper • 2504.19093 • Published Apr 27 • 18