view article Article GGML and llama.cpp join HF to ensure the long-term progress of Local AI +4 ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 506
Magic Quant Collection MagicQuant is a benchmark-driven GGUF evaluation and hybrid-discovery system. https://github.com/magiccodingman/MagicQuant-Wiki • 4 items • Updated 17 days ago • 33
YAQA Collection YAQA hessians (Sketch B) and models with the QTIP quantizer. See https://github.com/Cornell-RelaxML/yaqa/tree/main for more details. • 9 items • Updated Jun 6, 2025 • 3
Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated Mar 12 • 218
view article Article Comparing sub 50GB Llama 4 Scout quants (KLD/Top P) bartowski • Apr 9, 2025 • 45
FP8 LLMs for vLLM Collection Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 42 items • Updated Mar 2 • 80
view article Article Open-R1: a fully open reproduction of DeepSeek-R1 +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
view article Article The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... srinivasbilla • Jan 20, 2025 • 77
Qwen2.5-1M Collection The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 127
Llama 3.2 3B & 1B GGUF Quants Collection Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 47
Llama 3.1 GPTQ, AWQ, and BNB Quants Collection Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & VLLM 🤗 • 9 items • Updated Sep 26, 2024 • 57
abliterated-v3 Collection Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3, 2024 • 139