# SMOLM2Prover - GGUF Format
GGUF quantized version of the SMOLM2Prover model for use with llama.cpp and compatible runtimes.
## Model Details
- Original Model: reaperdoesntknow/SMOLM2Prover
- Architecture: LlamaForCausalLM
- Context Length: 8192 tokens
- Embedding Dimension: 960
- Layers: 32
- Head Count: 15 query heads, 5 key/value heads (grouped-query attention, GQA)
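The dimensions above also determine the KV-cache footprint: each of the 32 layers stores keys and values for 5 KV heads of dimension 960 / 15 = 64 per token. A back-of-envelope sketch of the F16 KV-cache size at full context (an estimate from the listed dimensions, not a measured figure):

```python
# Rough F16 KV-cache size for the dimensions listed above.
n_layers = 32
n_kv_heads = 5
head_dim = 960 // 15      # embedding dim / query head count = 64
ctx_len = 8192
bytes_per_elem = 2        # F16

# K and V each store n_kv_heads * head_dim values per layer per token.
bytes_per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
total_mib = bytes_per_token * ctx_len / 2**20
print(f"{bytes_per_token} bytes/token, ~{total_mib:.0f} MiB at {ctx_len} ctx")
```

Quantized KV-cache options in llama.cpp can shrink this further, at some quality cost.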
## Available Files

| File | Size | Quantization | Quality |
|---|---|---|---|
| SMOLM2Prover.gguf | 692M | F16 | Original (no quantization) |
| SMOLM2Prover-Q4_K_M.gguf | 258M | Q4_K_M | Recommended (good quality/size balance) |
## Usage

### With llama.cpp

```shell
# Run with the quantized model
./llama-cli -m SMOLM2Prover-Q4_K_M.gguf -p "Your prompt here" -n 256
```
### With Ollama

Create a `Modelfile`:

```
FROM ./SMOLM2Prover-Q4_K_M.gguf
```

Then:

```shell
ollama create smolm2prover -f Modelfile
ollama run smolm2prover
```
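Beyond the minimal `FROM` line, a Modelfile can pin runtime options with `PARAMETER` directives. A sketch that sets the context window to match the model's 8192-token limit (the temperature value is an arbitrary example):

```
FROM ./SMOLM2Prover-Q4_K_M.gguf
PARAMETER num_ctx 8192
PARAMETER temperature 0.7
```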
### With LM Studio

1. Download `SMOLM2Prover-Q4_K_M.gguf`
2. Place it in your LM Studio models folder
3. Load the model and chat!
## Quantization Details
The Q4_K_M quantization uses:
- Q4_K for most weights
- Q5_0 fallback for tensors not divisible by 256
- Q6_K/Q8_0 for some critical layers
- Size reduction: 692M → 258M (~63% smaller)
- BPW: 5.94 bits per weight
## License

Same as the original model.
## Model Tree

Model tree for reaperdoesntknow/SMOLM2Prover-GGUF:

- Base model: HuggingFaceTB/SmolLM2-360M
- Quantized: HuggingFaceTB/SmolLM2-360M-Instruct
- Finetuned: prithivMLmods/SmolLM2-CoT-360M