🎯 RC-Competition-DeepSeek-R1-Distill-Llama-70B-bnb-4bit-SFTOnly-ChatML-CoT-v2-20251211_0619

📋 Model Description

This model is trained in the ChatML format and uses DataCollatorForCompletionOnlyLM for precise loss masking.

🆕 v2.2 Improvements

  • ✅ Uses a string-mode response template to avoid token boundary issues (see the sketch after this list)
  • ✅ No loss is computed on the user portion
  • ✅ Only the assistant portion (reasoning process + answer) is learned
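A minimal sketch of this setup, assuming TRL's DataCollatorForCompletionOnlyLM with a string-mode response template; the exact marker string and tokenizer loading of the original run are not shown in this card and are assumptions here.

from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

# Assumption: the tokenizer is loaded directly here for illustration; the
# original run loads it through Unsloth's FastLanguageModel.from_pretrained.
tokenizer = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit")

# String-mode response template (assumed marker): everything before the
# assistant turn is masked to -100, so the user portion contributes no loss
# and only the assistant's <think> reasoning and answer are learned.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant\n",
    tokenizer=tokenizer,
)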

🎭 Training Format

<|im_start|>user
[passage content]              ← ❌ no loss computed
<|im_end|>
<|im_start|>assistant
<think>
[reasoning process]           ← ✅ loss computed
</think>
答案: X                        ← ✅ loss computed ("答案" = "Answer")
<|im_end|>
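The block above shows how one training example looks after formatting. Below is a minimal sketch of assembling such an example; the field names (passage, question, options, reasoning, answer) are hypothetical and not taken from the original data pipeline.

# Hypothetical helper: renders one example into the ChatML layout shown above.
def format_example(passage, question, options, reasoning, answer):
    user_turn = f"{passage}\n問題: {question}\n選項: {options}"
    assistant_turn = f"<think>\n{reasoning}\n</think>\n答案: {answer}"
    return (
        f"<|im_start|>user\n{user_turn}\n<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_turn}\n<|im_end|>"
    )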

📊 Training Parameters

Parameter                  Value
Base Model                 unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit
LoRA Rank                  256
LoRA Alpha                 256
Learning Rate              2e-05
Effective Batch Size       32
Epochs                     1
Max Sequence Length        4096
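A hedged sketch of how these parameters map onto an Unsloth + TRL run; the target_modules list, the per-device batch size / gradient accumulation split, and the output_dir are assumptions, not values from the original training script.

from unsloth import FastLanguageModel
from trl import SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit",  # Base Model
    max_seq_length=4096,                                # Max Sequence Length
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=256,           # LoRA Rank
    lora_alpha=256,  # LoRA Alpha
    # Assumed target modules; not listed in the table above.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
training_args = SFTConfig(
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=4,  # 4 x 8 accumulation = effective batch 32 (assumed split)
    gradient_accumulation_steps=8,
    output_dir="outputs",
)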

💻 Usage

from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit.
model, tokenizer = FastLanguageModel.from_pretrained(
    "kunhsiang/RC-Competition-DeepSeek-R1-Distill-Llama-70B-bnb-4bit-SFTOnly-ChatML-CoT-v2-20251211_0619",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Prompt layout: passage, then "問題:" (question), then "選項:" (numbered options).
messages = [{"role": "user", "content": "[文章內容]\n問題: ...\n選項: 1. ... 2. ... 3. ... 4. ..."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
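The raw generation still contains the <think> block and ChatML tokens. A small, hypothetical post-processing helper (not part of the original card) for pulling out the final choice:

import re

# Hypothetical helper: drop the <think> block and read the digit after "答案:".
def extract_answer(decoded_text):
    tail = decoded_text.split("</think>")[-1]
    match = re.search(r"答案[::]\s*(\d)", tail)
    return match.group(1) if match else None

answer = extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=False))
print(answer)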

📅 Version Info

  • Training date: 20251211_0619
  • Training framework: Unsloth + TRL