# DAGGER-4B-SFT

## Model Description

DAGGER-4B-SFT is a 4B-parameter model supervised fine-tuned for computational graph generation. It serves both as the initialization for GRPO training and as a lightweight baseline.

## Model Overview

| Attribute | Value |
|-----------|-------|
| Base Model | Gemma-3-4B-Instruct |
| Training | Supervised Fine-Tuning |
| Parameters | 4B |
| LoRA Rank | 64 |

## Performance

| Dataset | Original | +Distractor | Drop |
|---------|----------|-------------|------|
| MGSM | 40.4 | 25.1 | 15.3 |
| MSVAMP | 65.0 | 42.4 | 22.7 |
| Weighted Avg | – | – | 44.3 |

## Improvement from GRPO

| Model | Weighted Avg |
|-------|--------------|
| dagger-4B_SFT | 44.3 |
| dagger-4B_SFT_GRPO | 47.3 (+3.0) |

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "dipta007/dagger-4B_SFT"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto",
)

# Bangla: "Mina has 100 pens. Each pen costs 5 taka."
question = "মিনার কাছে ১০০টি কলম আছে। প্রতিটি কলমের দাম ৫ টাকা।"

messages = [
    {"role": "system", "content": "You are an expert Bangla Math Reasoner. Solve by constructing a Computational Graph."},
    {"role": "user", "content": question},
]

# Build the chat prompt and tokenize it
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)

outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, skipping the prompt
response = tokenizer.decode(outputs[0][len(inputs.input_ids[0]):], skip_special_tokens=True)
print(response)
```
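
For evaluating several problems at once, batched generation also works. The sketch below is illustrative and reuses `model` and `tokenizer` from the Quickstart; the left-padding setting and the question list are assumptions, not part of the official recipe:

```python
# Illustrative batched inference, reusing `model` and `tokenizer` from above.
questions = [
    "মিনার কাছে ১০০টি কলম আছে। প্রতিটি কলমের দাম ৫ টাকা।",
    # ... add more Bangla math problems here
]

system = "You are an expert Bangla Math Reasoner. Solve by constructing a Computational Graph."
prompts = [
    tokenizer.apply_chat_template(
        [{"role": "system", "content": system}, {"role": "user", "content": q}],
        tokenize=False,
        add_generation_prompt=True,
    )
    for q in questions
]

tokenizer.padding_side = "left"  # left-pad so every prompt ends where generation starts
batch = tokenizer(prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**batch, max_new_tokens=1024)

# Strip the (padded) prompt tokens before decoding each completion
for row in outputs:
    print(tokenizer.decode(row[batch.input_ids.shape[1]:], skip_special_tokens=True))
```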

## Training Configuration

| Parameter | Value |
|-----------|-------|
| LoRA Rank / Alpha | 64 / 128 |
| Global Batch Size | 256 |
| Epochs | 4 |
| Learning Rate | 1e-5 → 1e-6 |
| Precision | BF16 |
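
These hyperparameters map onto a standard PEFT setup roughly as shown below. This is a sketch under assumptions, not the authors' training script: the card does not specify `target_modules`, how the global batch of 256 is split, or the exact LR schedule, so those values are guesses.

```python
from peft import LoraConfig
from transformers import TrainingArguments

# Rank 64 / alpha 128, per the table above; target_modules is an assumption.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules="all-linear",  # assumption: not stated in the card
    task_type="CAUSAL_LM",
)

# Global batch 256 = per_device_batch x grad_accum x num_gpus (split assumed).
training_args = TrainingArguments(
    output_dir="dagger-4B_SFT",
    num_train_epochs=4,
    per_device_train_batch_size=8,   # assumption
    gradient_accumulation_steps=32,  # assumption: 8 x 32 = 256 on one GPU
    learning_rate=1e-5,              # decays toward 1e-6 per the card
    lr_scheduler_type="cosine",      # assumption: exact schedule not stated
    bf16=True,
)
```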

## When to Use This Model

- **GRPO initialization**: Starting point for policy optimization (see the sketch after this list)
- **Lightweight baseline**: When 12B models are too large
- **Ablation studies**: Comparing SFT vs. GRPO contributions
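
Since the card reports a +3.0 gain from GRPO on top of this checkpoint, a natural next step is to use it as the policy in a GRPO loop. The sketch below uses TRL's `GRPOTrainer`; the actual reward function and training data are not described in this card, so both are placeholders.

```python
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder reward: NOT the paper's reward function, just where one plugs in.
def reward_fn(completions, **kwargs):
    return [float(len(c) > 0) for c in completions]

# Placeholder prompts; the real GRPO training data is not listed here.
train_dataset = Dataset.from_dict({"prompt": ["মিনার কাছে ১০০টি কলম আছে।"]})

trainer = GRPOTrainer(
    model="dipta007/dagger-4B_SFT",  # this SFT checkpoint as the policy init
    reward_funcs=reward_fn,
    args=GRPOConfig(output_dir="dagger-4B_SFT_GRPO"),
    train_dataset=train_dataset,
)
trainer.train()
```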

## Related Models

| Model | Training | Weighted Avg |
|-------|----------|--------------|
| dagger-4B_SFT | SFT | 44.3 |
| dagger-4B_SFT_GRPO | SFT → GRPO | 47.3 |
| dagger-4B_GRPO | Base → GRPO | 29.3 |

## Citation

Will be updated.