🧠 Qwen2.5-3B-Instruct ReTrace-OpenO1 Merged


A reasoning-focused model trained on 5,000 chain-of-thought examples



📋 Model Description

This is a fully merged model of Qwen2.5-3B-Instruct, fine-tuned with LoRA on 5,000 reasoning samples (500 ReTrace + 4,500 OpenO1-SFT). The model generates structured reasoning with explicit <Thought> and <Output> tags and is tuned for step-by-step problem solving.
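"Merged" means the LoRA update has been folded into the base weights, so inference needs only the single checkpoint. In standard LoRA (assuming the usual α/r scaling rather than the rank-stabilized variant; the card does not say which modules were adapted), each adapted weight matrix W_0 becomes the following, using r = 64 and α = 128 from the Training Configuration:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},\quad
\frac{\alpha}{r} = \frac{128}{64} = 2
```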

🎯 Key Features

  • Fully Merged: Ready-to-use model (no adapter loading needed)
  • Structured Reasoning: Outputs thinking in <Thought> tags, final answer in <Output> tags
  • 5K Training Samples: 500 ReTrace + 4,500 OpenO1-SFT examples
  • Multi-Domain: Math, logic, word problems, and general reasoning
  • Compact Checkpoint: FP16 weights, about 6 GB on disk

📊 Training Loss

[Figure: training loss curve]

📈 Training Statistics

Metric          Value
Initial Loss    1.3374
Final Loss      0.6798
Best Loss       0.6662 (step 240)
Improvement     49.2% ↓
Total Steps     310
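The Improvement row appears to be the relative drop from the initial to the final logged loss (our reading; the card does not state the formula). A quick check using only the numbers in the table:

```python
# Loss values copied from the Training Statistics table.
initial_loss = 1.3374
final_loss = 0.6798

# Relative reduction from the first logged loss to the last one.
improvement = (initial_loss - final_loss) / initial_loss
print(f"Improvement: {improvement:.1%}")  # Improvement: 49.2%
```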

⚙️ Training Configuration

# Model
BASE_MODEL = "Qwen/Qwen2.5-3B-Instruct"
MAX_SEQ_LENGTH = 4096

# LoRA
LORA_R = 64
LORA_ALPHA = 128
LORA_DROPOUT = 0.05

# Training
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
LEARNING_RATE = 2e-4
NUM_EPOCHS = 2
WARMUP_STEPS = 50

# Datasets
# - 500 samples from ReTrace501-v1
# - 4,500 samples from OpenO1-SFT
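The batch settings determine roughly how many optimizer steps two epochs should take. A back-of-the-envelope sketch, assuming a single GPU and that a partial gradient-accumulation window at the end of an epoch is dropped (trainer versions differ on this detail):

```python
import math

# Values taken from the Training Configuration.
BATCH_SIZE = 8
GRADIENT_ACCUMULATION = 4
NUM_EPOCHS = 2
NUM_SAMPLES = 500 + 4_500
NUM_DEVICES = 1  # assumption: single-GPU run (not stated in the card)

# Samples consumed per optimizer step.
effective_batch = BATCH_SIZE * GRADIENT_ACCUMULATION * NUM_DEVICES

# Micro-batches per epoch, then whole optimizer steps per epoch
# (a trailing partial accumulation window is discarded here).
micro_batches = math.ceil(NUM_SAMPLES / (BATCH_SIZE * NUM_DEVICES))
steps_per_epoch = micro_batches // GRADIENT_ACCUMULATION
total_steps = steps_per_epoch * NUM_EPOCHS

print(effective_batch, steps_per_epoch, total_steps)  # 32 156 312
```

The estimate of about 312 steps is close to the reported 310; the card does not say what accounts for the small gap (samples filtered out for exceeding MAX_SEQ_LENGTH would be one possible cause).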

🚀 Usage

Installation

pip install torch transformers accelerate

Quick Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# =========================
# Load model and tokenizer
# =========================
model_name = "nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged"

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    trust_remote_code=True
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True
)

# =========================
# LLM question function
# =========================
def ask_llm(question: str):
    messages = [
        {
            "role": "system",
            "content": (
                "You are a helpful AI assistant. When solving problems, show your detailed reasoning process inside <Thought> tags, then provide your final answer inside <Output> tags and explain the final answer from reasoning in short. Break down complex problems step-by-step."
            )
        },
        {
            "role": "user",
            "content": question
        }
    ]

    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=True,  # temperature/top_p only take effect when sampling
        temperature=0.7,
        top_p=0.9
    )

    # Slice off the prompt tokens so only the newly generated text is decoded
    prompt_len = inputs["input_ids"].shape[1]
    response = tokenizer.decode(
        outputs[0][prompt_len:],
        skip_special_tokens=True
    )

    return response


# =========================
# Edit this block to ask your own question
# =========================
question = """
A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.
"""

print(ask_llm(question))
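Because the reasoning and the answer are wrapped in tags, downstream code can separate them with a regular expression. The sketch below uses a hypothetical helper, split_reasoning (not part of transformers or this repository), and assumes each tag pair appears at most once. With max_new_tokens=1024 a long derivation can be cut off before its closing tag, in which case that field comes back as None.

```python
import re

# Hypothetical helper: split a response into its <Thought> and <Output> parts.
# A part whose opening/closing tag pair is incomplete is returned as None.
_TAG_PATTERNS = {
    "thought": re.compile(r"<Thought>(.*?)</Thought>", re.DOTALL),
    "output": re.compile(r"<Output>(.*?)</Output>", re.DOTALL),
}


def split_reasoning(response: str) -> dict:
    parts = {}
    for name, pattern in _TAG_PATTERNS.items():
        match = pattern.search(response)
        parts[name] = match.group(1).strip() if match else None
    return parts


example = "<Thought>\n0.038 / 0.0668\n</Thought>\n<Output>\nAbout 57%.\n</Output>"
print(split_reasoning(example))
# {'thought': '0.038 / 0.0668', 'output': 'About 57%.'}
```

With the model loaded, the same helper applies to a live response, e.g. split_reasoning(ask_llm(question))["output"].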

Example Output (sampled with temperature=0.7, so wording and numbers can vary between runs)

Question

A machine produces items where 4% of the output is defective. A quality control test correctly identifies a defective item with probability 0.95 and incorrectly labels a good item as defective with probability 0.03. If an item is selected at random and the test reports it as defective, determine the probability that the item is actually defective.

<Thought>
Let's define the events:
- \( D \): The event that the item is defective.
- \( D^c \): The event that the item is not defective.
- \( T \): The event that the test reports the item as defective.

Given probabilities:
- \( P(D) = 0.04 \) (4% defective)
- \( P(T|D) = 0.95 \) (Test correctly identifies defective items)
- \( P(T|D^c) = 0.03 \) (Test incorrectly labels good items as defective)

We need to find \( P(D|T) \), the probability that the item is defective given that the test reports it as defective.

Using Bayes' theorem:
\[
P(D|T) = \frac{P(T|D)P(D)}{P(T)}
\]

First, we need to find \( P(T) \), the total probability that the test reports a defective item. This can be found using the law of total probability:
\[
P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
\]

Calculate each term:
\[
P(D^c) = 1 - P(D) = 1 - 0.04 = 0.96
\]
\[
P(T|D^c) = 0.03
\]
\[
P(T) = (0.95)(0.04) + (0.03)(0.96) = 0.038 + 0.0288 = 0.0668
\]

Now, substitute back into Bayes' theorem:
\[
P(D|T) = \frac{(0.95)(0.04)}{0.0668} = \frac{0.038}{0.0668} \approx 0.572
\]

So, the probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.

</Thought>
<Output>
The probability that the item is actually defective given that the test reports it as defective is approximately 57.2%.
</Output>
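The arithmetic in the sample can be checked exactly with Python's fractions module. Exact evaluation gives P(T) = 0.0668 and P(D|T) = 95/167 ≈ 0.569 (about 56.9%), so the setup and P(T) in the sampled output are correct, while its final division (0.572) is slightly off.

```python
from fractions import Fraction

# Inputs from the question, as exact fractions.
p_defective = Fraction(4, 100)               # P(D)
p_flag_given_defective = Fraction(95, 100)   # P(T | D)
p_flag_given_good = Fraction(3, 100)         # P(T | D^c)

# Law of total probability: P(T) = P(T|D)P(D) + P(T|D^c)P(D^c)
p_flag = (p_flag_given_defective * p_defective
          + p_flag_given_good * (1 - p_defective))

# Bayes' theorem: P(D|T) = P(T|D)P(D) / P(T)
posterior = p_flag_given_defective * p_defective / p_flag

print(p_flag, f"{float(p_flag):.4f}")        # 167/2500 0.0668
print(posterior, f"{float(posterior):.4f}")  # 95/167 0.5689
```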

📚 Training Datasets

ReTrace501-v1 (500 samples)

High-quality chain-of-thought reasoning examples focusing on mathematical problem-solving with explicit reasoning steps.

Source: nnsohamnn/ReTrace501-v1

OpenO1-SFT (4,500 samples)

Diverse reasoning dataset covering multiple domains including logic, math, science, and general problem-solving.

Source: O1-OPEN/OpenO1-SFT


🔧 Technical Details

Component            Specification
Architecture         Qwen2.5 Transformer
Parameters           3.09 Billion
Context Length       4096 tokens (MAX_SEQ_LENGTH used for fine-tuning)
Precision            FP16
Training Framework   Unsloth + HuggingFace Transformers
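The roughly 6 GB checkpoint size follows from the parameter count times the bytes per parameter. This sketch estimates the weights only; activations and the KV cache during generation need additional memory that it does not account for.

```python
# Rough weight-only footprint of the FP16 checkpoint.
NUM_PARAMS = 3.09e9    # parameter count from the Technical Details table
BYTES_PER_PARAM = 2    # FP16 stores each weight in 16 bits

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
print(f"{weight_bytes / 1e9:.2f} GB ({weight_bytes / 2**30:.2f} GiB)")
# 6.18 GB (5.76 GiB)
```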

📖 Citation

@misc{qwen25-retrace-openo1-merged,
  author = {nnsohamnn},
  title = {Qwen2.5-3B ReTrace-OpenO1 Merged},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/nnsohamnn/Qwen2.5-3B-ReTrace-OpenO1-Merged}
}


🙏 Acknowledgments

  • Qwen Team for the excellent base model
  • Unsloth AI for efficient training tools
  • OpenO1 communities for high-quality datasets

📝 License

Apache 2.0 - See LICENSE for details.


Made with ❤️ by nnsohamnn

⭐ Star this repo if you find it useful!
