SMOLM2Prover - GGUF Format

GGUF quantized version of the SMOLM2Prover model for use with llama.cpp and compatible runtimes.

Model Details

  • Original Model: reaperdoesntknow/SMOLM2Prover
  • Architecture: LlamaForCausalLM
  • Context Length: 8192 tokens
  • Embedding Dimension: 960
  • Layers: 32
  • Head Count: 15 (query), 5 (key/value) - grouped-query attention (GQA)
  • Parameters: ~0.4B
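The GQA layout (5 KV heads instead of 15) directly shrinks the KV cache. A back-of-the-envelope sketch, assuming head_dim = 960 / 15 = 64 and an F16 (2-byte) cache:

```python
# KV-cache size at the full 8192-token context.
# Assumption: head_dim = embedding_dim / query_heads = 960 / 15 = 64.
layers, kv_heads, head_dim = 32, 5, 64
context, bytes_per_value = 8192, 2  # F16 cache entries

# Factor of 2 for the separate K and V tensors in every layer.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at full context: {kv_cache_bytes / 2**20:.0f} MiB")  # → 320 MiB

# With full multi-head attention (15 KV heads) the cache would be 3x larger.
mha_bytes = 2 * layers * 15 * head_dim * context * bytes_per_value
print(f"Same cache without GQA:  {mha_bytes / 2**20:.0f} MiB")  # → 960 MiB
```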

Available Files

File                       Size   Quantization   Quality
SMOLM2Prover.gguf          692M   F16            Original (no quantization)
SMOLM2Prover-Q4_K_M.gguf   258M   Q4_K_M         Recommended (good quality/size balance)

Usage

With llama.cpp

# Run with the quantized model
./llama-cli -m SMOLM2Prover-Q4_K_M.gguf -p "Your prompt here" -n 256

With Ollama

Create a Modelfile:

FROM ./SMOLM2Prover-Q4_K_M.gguf

Then:

ollama create smolm2prover -f Modelfile
ollama run smolm2prover
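The one-line Modelfile above works as-is. A slightly fuller sketch using standard Modelfile directives (the parameter values are illustrative, not tuned for this model):

```
FROM ./SMOLM2Prover-Q4_K_M.gguf

# Match the model's native context length
PARAMETER num_ctx 8192

# Illustrative sampling settings; adjust to taste
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```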

With LM Studio

  1. Download SMOLM2Prover-Q4_K_M.gguf
  2. Place in LM Studio models folder
  3. Load and chat!

Quantization Details

The Q4_K_M quantization uses:

  • Q4_K for most weights
  • Q5_0 fallback for tensors whose row sizes are not a multiple of 256 (the k-quant block size)
  • Q6_K/Q8_0 for some critical layers

  • Size reduction: 692M → 258M (63% smaller)
  • Bits per weight (BPW): 5.94
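These two figures are easy to sanity-check. A quick sketch, assuming roughly 364M parameters (the card's "0.4B" is a rounded value, not read from the GGUF header):

```python
# Sanity-check the quantization stats quoted above.
f16_mib, q4_mib = 692, 258  # file sizes from the table

reduction = 1 - q4_mib / f16_mib
print(f"Size reduction: {reduction:.0%}")  # → 63%

# BPW = total file bits / parameter count. Quantized files also carry
# per-block scales and metadata, which is why BPW exceeds the nominal 4 bits.
params = 364e6  # assumption, not taken from the model file
bpw = q4_mib * 2**20 * 8 / params
print(f"Approximate BPW: {bpw:.2f}")  # ≈ 5.95, close to the stated 5.94
```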

License

Same as the original model.
