Understand, Solve and Translate: Bridging the Multilingual Mathematical Reasoning Gap
Paper: arXiv:2501.02448
Training configuration:

```
block_size = 20000
gradient_checkpointing = True
```
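For context, `block_size` typically controls how tokenized training examples are packed into fixed-length chunks before fine-tuning a causal LM. A minimal sketch of that grouping step (only the 20000-token block size comes from the config; the function and field names are illustrative, not the author's actual code):

```python
# Pack tokenized examples into fixed-length blocks, as is common when
# fine-tuning causal LMs. block_size = 20000 matches the config above;
# everything else here is an assumption for illustration.
from typing import Dict, List

def group_texts(examples: Dict[str, List[List[int]]],
                block_size: int = 20000) -> Dict[str, List[List[int]]]:
    # Concatenate all sequences, then split into block_size chunks,
    # dropping the ragged remainder at the end.
    concatenated = {k: sum(examples[k], []) for k in examples}
    total_len = (len(concatenated["input_ids"]) // block_size) * block_size
    return {
        k: [t[i:i + block_size] for i in range(0, total_len, block_size)]
        for k, t in concatenated.items()
    }

# Tiny usage example with a toy block size so the mechanics are visible.
toy = {"input_ids": [[1, 2, 3], [4, 5, 6, 7], [8, 9]]}
print(group_texts(toy, block_size=4))  # -> {'input_ids': [[1, 2, 3, 4], [5, 6, 7, 8]]}
```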
Results were generated and evaluated with my own code; accuracy may differ from the officially reported numbers.
| Model | GSM8K | KSM | MATH | OMNI_MATH |
|---|---|---|---|---|
| Qwen2.5-32B-s1.1-Ko-Native | 89.92 | 39.85 | 87.73 | 42.06 |
| *GPT-4o | 91.21 | 22.83 | 74.45 | 30.75 |
| *GPT-4o-mini | 87.57 | 19.40 | 70.68 | 26.45 |
| EXAONE-3.5-7.8B-Stratos-Ko | 83.02 | 15.97 | 67.49 | 24.62 |
| Qwen2.5-7B-s1.1-Ko-Native | 76.27 | 15.48 | 66.45 | 23.57 |
| EXAONE-3.5-7.8B-Instruct | 81.58 | 14.71 | 63.50 | 21.69 |
| *Qwen2.5-14B-Instruct | 66.34 | 15.55 | 53.38 | 20.64 |
| *Llama-3.1-8B-Instruct | 77.79 | 7.21 | 49.01 | 15.92 |
| *Qwen2.5-7B-Instruct | 58.38 | 13.10 | 48.04 | 16.55 |
| *EXAONE-3.0-7.8B-Instruct | 72.33 | 7.98 | 46.79 | 15.35 |
| *Ko-R1-1.5B-preview | 43.3 | ? | 73.1 | 29.8 |
\* Reported by the HRM8K authors
Sampling configuration:

```
temperature = 0.7
top_p = 0.95
max_tokens = 8192
```

If the generation does not emit `</think>` tokens within this budget, a `</think>` token is appended and up to 512 additional tokens are generated.

Why Qwen? Why can't EXAONE?
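The end-of-thinking forcing described in the decoding setup can be sketched as a wrapper around an arbitrary generation function (the `generate(text, max_new_tokens)` signature and the stub generator are stand-ins for illustration, not a real library API):

```python
def force_end_of_thinking(generate, prompt: str,
                          max_tokens: int = 8192,
                          extra_tokens: int = 512) -> str:
    """If the first pass never emits </think>, append one and let the
    model produce a short final answer (512 extra tokens, per the setup
    above). The generate() signature here is assumed, not a library API."""
    out = generate(prompt, max_new_tokens=max_tokens)
    if "</think>" not in out:
        # Close the reasoning block ourselves, then sample the answer.
        out = out + "</think>" + generate(prompt + out + "</think>",
                                          max_new_tokens=extra_tokens)
    return out

# Stub generator that never closes its reasoning, to exercise the fallback.
def stub_generate(text, max_new_tokens):
    return " reasoning..." if "</think>" not in text else " Final answer: 42"

print(force_end_of_thinking(stub_generate, "<think>"))
```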