SMOLM2Prover - GGUF Format

GGUF quantized version of the SMOLM2Prover model for use with llama.cpp and compatible runtimes.

Model Details

  • Original Model: reaperdoesntknow/SMOLM2Prover
  • Architecture: LlamaForCausalLM
  • Context Length: 8192 tokens
  • Embedding Dimension: 960
  • Layers: 32
  • Head Count: 15 (query), 5 (key/value) - grouped-query attention (GQA)
  • Parameters: ~0.4B
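The GQA layout (5 KV heads instead of 15) directly shrinks the KV cache. A back-of-the-envelope sketch, assuming head_dim = 960 / 15 = 64 and an F16 (2-byte) cache:

```python
# KV-cache size at the full 8192-token context.
# Assumption: head_dim = embedding_dim / query_heads = 960 / 15 = 64.
layers, kv_heads, head_dim = 32, 5, 64
context, bytes_per_value = 8192, 2  # F16 cache entries

# Factor of 2 for the separate K and V tensors in every layer.
kv_cache_bytes = 2 * layers * kv_heads * head_dim * context * bytes_per_value
print(f"KV cache at full context: {kv_cache_bytes / 2**20:.0f} MiB")  # → 320 MiB

# With full multi-head attention (15 KV heads) the cache would be 3x larger.
mha_bytes = 2 * layers * 15 * head_dim * context * bytes_per_value
print(f"Same cache without GQA:  {mha_bytes / 2**20:.0f} MiB")  # → 960 MiB
```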

Available Files

File                       Size   Quantization   Quality
SMOLM2Prover.gguf          692M   F16            Original (no quantization)
SMOLM2Prover-Q4_K_M.gguf   258M   Q4_K_M         Recommended (good quality/size balance)

Usage

With llama.cpp

# Run with the quantized model
./llama-cli -m SMOLM2Prover-Q4_K_M.gguf -p "Your prompt here" -n 256

With Ollama

Create a Modelfile:

FROM ./SMOLM2Prover-Q4_K_M.gguf

Then:

ollama create smolm2prover -f Modelfile
ollama run smolm2prover
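The one-line Modelfile above works as-is. A slightly fuller sketch using standard Modelfile directives (the parameter values are illustrative, not tuned for this model):

```
FROM ./SMOLM2Prover-Q4_K_M.gguf

# Match the model's native context length
PARAMETER num_ctx 8192

# Illustrative sampling settings; adjust to taste
PARAMETER temperature 0.7
PARAMETER top_p 0.9
```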

With LM Studio

  1. Download SMOLM2Prover-Q4_K_M.gguf
  2. Place in LM Studio models folder
  3. Load and chat!

Quantization Details

The Q4_K_M quantization uses:

  • Q4_K for most weights
  • Q5_0 fallback for tensors whose row sizes are not a multiple of 256 (the k-quant block size)
  • Q6_K/Q8_0 for some critical layers

  • Size reduction: 692M → 258M (63% smaller)
  • Bits per weight (BPW): 5.94
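These two figures are easy to sanity-check. A quick sketch, assuming roughly 364M parameters (the card's "0.4B" is a rounded value, not read from the GGUF header):

```python
# Sanity-check the quantization stats quoted above.
f16_mib, q4_mib = 692, 258  # file sizes from the table

reduction = 1 - q4_mib / f16_mib
print(f"Size reduction: {reduction:.0%}")  # → 63%

# BPW = total file bits / parameter count. Quantized files also carry
# per-block scales and metadata, which is why BPW exceeds the nominal 4 bits.
params = 364e6  # assumption, not taken from the model file
bpw = q4_mib * 2**20 * 8 / params
print(f"Approximate BPW: {bpw:.2f}")  # ≈ 5.95, close to the stated 5.94
```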

License

Same as the original model.
