🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged

A reasoning-focused model trained on 5,000 chain-of-thought examples

🚀 Try Demo • 📊 Dataset ReTrace • 📊 Dataset OpenO1

📋 Model Description

This is Qwen2.5-3B-Instruct fine-tuned with LoRA on 5,000 reasoning samples (500 ReTrace + 4,500 OpenO1-SFT), with the adapter weights merged back into the base model. The model writes its step-by-step reasoning inside <Thought> tags and its final answer inside <Output> tags.

🎯 Key Features

✅ Fully Merged: Ready-to-use model (no adapter loading needed)
✅ Structured Reasoning: Outputs thinking in <Thought> tags, final answer in <Output> tags
✅ 5K Training Samples: 500 ReTrace + 4,500 OpenO1-SFT examples
✅ Multi-Domain: Math, logic, word problems, and general reasoning
✅ Production Ready: FP16 weights, about 6 GB model size

📊 Training Loss

📈 Training Statistics

Metric	Value
Initial Loss	1.3374
Final Loss	0.6798
Best Loss	0.6662 (Step 240)
Improvement	49.2% ↓
Total Steps	310
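
The improvement figure in the table can be reproduced from the initial and final loss. The sketch below does that arithmetic. The perplexity line assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state.

```python
import math

initial_loss = 1.3374
final_loss = 0.6798

# Relative reduction from the initial to the final loss
improvement_pct = (initial_loss - final_loss) / initial_loss * 100
print(f"Improvement: {improvement_pct:.1f}%")

# Assumption: loss is mean token-level cross-entropy in nats,
# so exp(loss) approximates perplexity on the training data.
final_perplexity = math.exp(final_loss)
print(f"Approx. final perplexity: {final_perplexity:.2f}")
```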

⚙️ Training Configuration

# Model
BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
MAX_SEQ_LENGTH = 4096

# LoRA
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05

# Training
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 2
WARMUP_STEPS = 50

# Datasets
# - 500 samples from ReTrace501-v1
# - 4,500 samples from OpenO1-SFT
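
For context, the batch settings above imply an effective batch size and an approximate optimizer step count. This is a rough sketch: it assumes a single device, no sequence packing, and that all 5,000 samples are used in each epoch, none of which the card confirms.

```python
import math

BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
NUM_EPOCHS = 2
NUM_SAMPLES = 500 + 4500  # ReTrace501-v1 + OpenO1-SFT

# Samples consumed per optimizer step
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION

# Round up so the last partial batch of each epoch still counts as a step
steps_per_epoch = math.ceil(NUM_SAMPLES / effective_batch)
estimated_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch)   # 32
print(estimated_steps)   # 314
```

The estimate lands close to the 310 steps reported in the training statistics. The small gap could come from filtered samples or from how the trainer rounds partial batches.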

🚀 Usage

Installation

pip install torch transformers accelerate

Quick Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# =========================
# Load model and tokenizer
# =========================
model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# =========================
# LLM question function
# =========================
def ask_llm(question: str):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
            )
        },
        {
            "role": "user",
            "content": question
        }
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

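    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,  # temperature/top_p only take effect when sampling is enabled
        temperature=0.7,
        top_p=0.9
    )

    # Decode only the newly generated tokens, skipping the prompt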
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response


# =========================
# Edit only this block to ask a different question
# =========================
question = """
A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
"""

print(ask_llm(question))
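
Because the answer arrives wrapped in tags, downstream code often needs the reasoning and the final answer separately. Below is a minimal sketch of one way to split them using only the standard library. `split_reasoning` is a hypothetical helper, not part of the model or of Transformers, and it falls back to the raw text when the <Output> tags are missing.

```python
import re


def split_reasoning(response: str) -> dict:
    """Split a model response into its <Thought> and <Output> parts."""
    thought = re.search(r"<Thought>(.*?)</Thought>", response, re.DOTALL)
    output = re.search(r"<Output>(.*?)</Output>", response, re.DOTALL)
    return {
        "thought": thought.group(1).strip() if thought else "",
        # If the model omitted the <Output> tags, fall back to the full text
        "output": output.group(1).strip() if output else response.strip(),
    }


sample = "<Thought>\n2 + 2 = 4\n</Thought>\n<Output>\n4\n</Output>"
parts = split_reasoning(sample)
print(parts["output"])
```

With this helper, `split_reasoning(ask_llm(question))["output"]` would return only the final answer.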

Example Output

Question

A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.

<Thought>
Let's define the events:
- \( D \): The event that the item is defective.
- \( D^c \): The event that the item is not defective.
- \( T \): The event that the test reports the item as defective.

Given probabilities:
- \( P(D) = 0.04 \) (4% defective)
- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)

We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.

Using Bayes' theorem:
\[
P(D|T) = \frac{P(T|D)P(D)}{P(T)}
\]

First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
\[
P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
\]

Calculate each term:
\[
P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
\]
\[
P(T|D^c) = 0.03
\]
\[
P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
\]

Now, substitute back into Bayes' theorem:
\[
P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
\]

So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.

</Thought>
<Output>
The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
</Output>
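
The arithmetic in this example is easy to verify independently. The short check below recomputes the posterior with Bayes' theorem from the three probabilities given in the question. It is a sanity check on the numbers, not part of the model's output. Carried to three decimals, the exact value is about 0.569 (56.9%), slightly below the 0.572 shown in the sample response.

```python
p_defective = 0.04          # P(D)
p_flag_given_def = 0.95     # P(T | D)
p_flag_given_good = 0.03    # P(T | D^c)

# Law of total probability: P(T)
p_flag = p_flag_given_def * p_defective + p_flag_given_good * (1 - p_defective)

# Bayes' theorem: P(D | T)
posterior = p_flag_given_def * p_defective / p_flag

print(f"P(T)   = {p_flag:.4f}")
print(f"P(D|T) = {posterior:.3f}")
```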

📚 Training Datasets

ReTrace501-v1 (500 samples)

High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.

Source: nnsohamnn/ReTrace501-v1

OpenO1-SFT (4,500 samples)

Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.

Source: O1-OPEN/OpenO1-SFT

🔧 Technical Details

Component	Specification
Architecture	Qwen2.5 Transformer
Parameters	3.09 Billion
Context Length	4096 tokens (training sequence length)
Precision	FP16
Training Framework	Unsloth + HuggingFace Transformers
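
The FP16 footprint follows directly from the parameter count. This is a back-of-the-envelope sketch that counts weights only; activations and the KV cache add more at inference time.

```python
NUM_PARAMS = 3.09e9      # parameter count from the table above
BYTES_PER_PARAM = 2      # FP16 stores each weight in 2 bytes

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

print(f"{weight_gb:.2f} GB")
print(f"{weight_gib:.2f} GiB")
```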

📖 Citation

@misc{qwen25-retrace-openo1-merged,
  author = {nnsohamnn},
  title = {Qwen2.5-3B ReTrace-OpenO1 Merged},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
}

🔗 Related Resources

LoRA Adapters: nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA
Base Model: Qwen/Qwen2.5-3B-Instruct
Demo Space: Try it live!
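
For readers who prefer to start from the adapters rather than this merged checkpoint, the sketch below shows one common way to fold LoRA weights into a base model with the PEFT library. It is an illustration, not the author's actual merge script: `merge_adapter` and the output directory are hypothetical, it requires `pip install peft`, and it assumes the adapter repository is a standard PEFT adapter for the listed base model.

```python
def merge_adapter(base_id: str, adapter_id: str, out_dir: str) -> None:
    """Load a base model, apply a LoRA adapter, and save the merged weights."""
    # Imported lazily so defining the function does not require the libraries
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, adapter_id)

    # Fold the LoRA deltas into the base weights and drop the adapter wrappers
    merged = model.merge_and_unload()
    merged.save_pretrained(out_dir)

    # Assumption: the adapter uses the base model's tokenizer unchanged
    AutoTokenizer.from_pretrained(base_id).save_pretrained(out_dir)


# Example call (downloads both repositories, so it is left commented out):
# merge_adapter(
#     "Qwen/Qwen2.5-3B-Instruct",
#     "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-5k-QLoRA",
#     "./merged-model",  # hypothetical output path
# )
```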

🙏 Acknowledgments

Qwen Team for the excellent base model
Unsloth AI for efficient training tools
OpenO1 community for the high-quality dataset

📝 License

Apache 2.0 - See LICENSE for details.

Made with ❤️ by nnsohamnn

⭐ Star this repo if you find it useful!

Report Issues • Discussions
