---
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
datasets:
- VanishD/DualDistill
language:
- en
license: mit
pipeline_tag: text-generation
library_name: transformers
---
# Agentic-R1: Distilled Dual-Strategy Reasoning
This repository hosts the **Agentic-R1** model, an implementation of the paper [**Agentic-R1: Distilled Dual-Strategy Reasoning**](https://huggingface.co/papers/2507.05707).
**Code**: https://github.com/StigLidu/DualDistill
## Abstract
Current long chain-of-thought (long-CoT) models excel at mathematical reasoning but rely on slow and error-prone natural language traces. Tool-augmented agents address arithmetic via code execution, but often falter on complex logical tasks. We introduce a fine-tuning framework, DualDistill, that distills complementary reasoning strategies from multiple teachers into a unified student model. Using this approach, we train Agentic-R1, which dynamically selects the optimal strategy for each query, invoking tools for arithmetic and algorithmic problems, and using text-based reasoning for abstract ones. Our method improves accuracy across a range of tasks, including both computation-intensive and standard benchmarks, demonstrating the effectiveness of multi-strategy distillation in achieving robust and efficient reasoning.
## Key Features
- **Efficient Training**: Integrates tool use into long chain-of-thought (long-CoT) reasoning using only 4 × A6000 GPUs
- **Unified Reasoning**: Fuses heterogeneous reasoning traces from multiple teacher models into a single student model
<div align="center">
<img src="https://github.com/StigLidu/DualDistill/raw/main/fig/overview.png" alt="Overview of DualDistill" width="500">
<p><em>Overview of DualDistill methodology</em></p>
</div>
## Datasets
| Dataset | Description | Link |
| :------------ | :-------------------------------------------- | :--------------------------------------------------- |
| **Training Set** | Complete training dataset with teacher trajectories | [🤗 HuggingFace](https://huggingface.co/datasets/VanishD/DualDistill) |
| **Test Set** | Evaluation benchmarks | `dataset/test/` |
## Results
<div align="center">
<img src="https://github.com/StigLidu/DualDistill/raw/main/fig/result.png" alt="Performance comparison of Agentic-R1 models" width="700">
</div>
- **Agentic-R1** demonstrates significant performance gains on **DeepMath-L** and **Combinatorics300**, where both complex reasoning and tool use are crucial for success.
- **Agentic-R1-SD** (Self-Distilled) further enhances performance through our self-distillation approach, consistently outperforming baseline models across nearly all evaluation tasks.
## Quick Start
### Installation
1. **Clone the repository**:
```bash
git clone https://github.com/StigLidu/DualDistill.git
cd DualDistill
```
2. **Create environment** (optional but recommended):
```bash
conda create -n dualdistill python=3.11
conda activate dualdistill
```
3. **Install dependencies**:
```bash
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```
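After installation, it can help to confirm that the key packages are visible to the interpreter before running anything heavier. The snippet below is a minimal sketch, not part of the repository: `check_packages` is a hypothetical helper, and the module names `torch`, `transformers`, and `flash_attn` are assumptions about what the install steps above provide.

```python
import importlib.util


def check_packages(names):
    """Return {module_name: bool} telling whether each top-level module can be located.

    Uses importlib.util.find_spec, so nothing is actually imported.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed module names; adjust to match the contents of requirements.txt.
    for name, found in check_packages(["torch", "transformers", "flash_attn"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

A `MISSING` entry only means the module is not importable in the active environment; it does not say why the install failed.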
### Sample Usage
Here's how to perform inference with the `Agentic-R1` model using the Hugging Face `transformers` library:
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "VanishD/Agentic-R1"  # Or "VanishD/Agentic-R1-SD" for the self-distilled version

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # Use bfloat16 for better performance and memory if supported
    device_map="auto",
    trust_remote_code=True,
).eval()  # Set model to evaluation mode

# Prepare a simple user message
messages = [{"role": "user", "content": "What is 123 + 456?"}]

# Apply the chat template to format the prompt correctly for the model.
# `add_generation_prompt=True` appends the assistant marker so the model starts its response.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Encode the prompt
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

# Generate response
output_ids = model.generate(
    input_ids,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_p=0.95,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,  # Often EOS token is used as PAD token for LLMs
)

# Decode and print the generated text, excluding the input prompt
generated_text = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True).strip()
print(f"Generated Text:\n{generated_text}")
```
## ⚠️ Important Notes
- **Code Execution Safety**: The evaluation scripts execute model-generated code locally. Only run them with models you trust.
- **Inference Config**: If you are using a recent version of vLLM and encounter an error about the maximum context length, you may need to modify `model_max_length` in `tokenizer_config.json`.
- **Self-Distillation Warning**: The self-distillation step requires sampling many trajectories and can be time-consuming.
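
For the `model_max_length` adjustment mentioned in the notes above, one option is to edit the JSON file programmatically instead of by hand. This is a minimal sketch under stated assumptions: `set_model_max_length` is a hypothetical helper, and the path and length shown in the usage comment are placeholders, not values recommended by the authors.

```python
import json
from pathlib import Path


def set_model_max_length(config_path, new_length):
    """Rewrite `model_max_length` in a tokenizer_config.json; return the previous value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    previous = config.get("model_max_length")  # None if the key was absent
    config["model_max_length"] = new_length
    # All other keys are written back unchanged.
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return previous


# Example call (placeholder path and value, adjust to your local checkpoint):
# set_model_max_length("path/to/Agentic-R1/tokenizer_config.json", 32768)
```

The function returns the old value so it can be restored later if the change does not resolve the vLLM error.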
## License
This project is licensed under the MIT License - see the [LICENSE](https://github.com/StigLidu/DualDistill/blob/main/LICENSE) file for details.
## Acknowledgments
We thank the following open-source projects for their foundational contributions:
- [OpenHands](https://github.com/All-Hands-AI/OpenHands) - Agent framework
- [DeepMath-103K](https://huggingface.co/datasets/zwhe99/DeepMath-103K) - Mathematical reasoning dataset
- [vLLM](https://github.com/vllm-project/vllm) - High-performance inference engine
## Contact
For questions or support, please contact:
- **Weihua Du**: [[email protected]](mailto:[email protected])
## Citation
If you find our work useful, please consider citing:
```bibtex
@article{du2025agentic,
title={Agentic-R1: Distilled Dual-Strategy Reasoning},
author={Du, Weihua and Aggarwal, Pranjal and Welleck, Sean and Yang, Yiming},
journal={arXiv preprint arXiv:2507.05707},
year={2025}
}
```