🎯 RC-Competition-DeepSeek-R1-Distill-Llama-70B-bnb-4bit-SFTOnly-ChatML-CoT-v2-20251211_0619

📋 Model Description

This model is trained in the ChatML format and uses DataCollatorForCompletionOnlyLM for precise loss masking.

🆕 v2.2 Improvements

  • ✅ Uses a string-mode response template to avoid token boundary issues (see the sketch after this list)
  • ✅ No loss is computed on the user portion
  • ✅ Only the assistant portion (reasoning process + answer) is learned
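A minimal sketch of this setup, assuming TRL's DataCollatorForCompletionOnlyLM with a string-mode response template; the exact marker string and tokenizer loading of the original run are not shown in this card and are assumptions here.

from transformers import AutoTokenizer
from trl import DataCollatorForCompletionOnlyLM

# Assumption: the tokenizer is loaded directly here for illustration; the
# original run loads it through Unsloth's FastLanguageModel.from_pretrained.
tokenizer = AutoTokenizer.from_pretrained("unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit")

# String-mode response template (assumed marker): everything before the
# assistant turn is masked to -100, so the user portion contributes no loss
# and only the assistant's <think> reasoning and answer are learned.
collator = DataCollatorForCompletionOnlyLM(
    response_template="<|im_start|>assistant\n",
    tokenizer=tokenizer,
)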

🎭 Training Format

<|im_start|>user
[passage content]              ← ❌ no loss computed
<|im_end|>
<|im_start|>assistant
<think>
[reasoning process]           ← ✅ loss computed
</think>
答案: X                        ← ✅ loss computed ("答案" = "Answer")
<|im_end|>
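The block above shows how one training example looks after formatting. Below is a minimal sketch of assembling such an example; the field names (passage, question, options, reasoning, answer) are hypothetical and not taken from the original data pipeline.

# Hypothetical helper: renders one example into the ChatML layout shown above.
def format_example(passage, question, options, reasoning, answer):
    user_turn = f"{passage}\n問題: {question}\n選項: {options}"
    assistant_turn = f"<think>\n{reasoning}\n</think>\n答案: {answer}"
    return (
        f"<|im_start|>user\n{user_turn}\n<|im_end|>\n"
        f"<|im_start|>assistant\n{assistant_turn}\n<|im_end|>"
    )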

📊 Training Parameters

Parameter                  Value
Base Model                 unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit
LoRA Rank                  256
LoRA Alpha                 256
Learning Rate              2e-05
Effective Batch Size       32
Epochs                     1
Max Sequence Length        4096
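A hedged sketch of how these parameters map onto an Unsloth + TRL run; the target_modules list, the per-device batch size / gradient accumulation split, and the output_dir are assumptions, not values from the original training script.

from unsloth import FastLanguageModel
from trl import SFTConfig

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/DeepSeek-R1-Distill-Llama-70B-bnb-4bit",  # Base Model
    max_seq_length=4096,                                # Max Sequence Length
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=256,           # LoRA Rank
    lora_alpha=256,  # LoRA Alpha
    # Assumed target modules; not listed in the table above.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
training_args = SFTConfig(
    learning_rate=2e-5,
    num_train_epochs=1,
    per_device_train_batch_size=4,  # 4 x 8 accumulation = effective batch 32 (assumed split)
    gradient_accumulation_steps=8,
    output_dir="outputs",
)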

💻 Usage

from unsloth import FastLanguageModel

# Load the fine-tuned model in 4-bit.
model, tokenizer = FastLanguageModel.from_pretrained(
    "kunhsiang/RC-Competition-DeepSeek-R1-Distill-Llama-70B-bnb-4bit-SFTOnly-ChatML-CoT-v2-20251211_0619",
    max_seq_length=4096,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)

# Prompt layout: passage, then "問題:" (question), then "選項:" (numbered options).
messages = [{"role": "user", "content": "[文章內容]\n問題: ...\n選項: 1. ... 2. ... 3. ... 4. ..."}]
inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt").to("cuda")
# do_sample=True is required for temperature/top_p to take effect.
outputs = model.generate(inputs, max_new_tokens=1024, do_sample=True, temperature=0.7, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
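The raw generation still contains the <think> block and ChatML tokens. A small, hypothetical post-processing helper (not part of the original card) for pulling out the final choice:

import re

# Hypothetical helper: drop the <think> block and read the digit after "答案:".
def extract_answer(decoded_text):
    tail = decoded_text.split("</think>")[-1]
    match = re.search(r"答案[::]\s*(\d)", tail)
    return match.group(1) if match else None

answer = extract_answer(tokenizer.decode(outputs[0], skip_special_tokens=False))
print(answer)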

📅 Version Info

  • Training date: 20251211_0619
  • Training framework: Unsloth + TRL