John Leimgruber III PRO

ubergarm

https://blog.aifoundry.org/p/adventures-in-model-quantization

AI & ML interests

Open LLMs and Astrophotography image processing.

Recent Activity

new activity 2 days ago

llmfan46/Qwen3.6-27B-uncensored-heretic-v2-Native-MTP-Preserved-GGUF:please make some ik_llama ggufs

new activity 3 days ago

ubergarm/Qwen3.6-27B-GGUF:Vision support

new activity 3 days ago

ubergarm/Qwen3.6-27B-GGUF:How to use MTP in GGUF?

upvoted an article 3 months ago

Article

GGML and llama.cpp join HF to ensure the long-term progress of Local AI • ggerganov, ngxson, allozaur, lysandre, victor, julien-c • Feb 20 • 506

upvoted a collection 6 months ago

Magic Quant

MagicQuant is a benchmark-driven GGUF evaluation and hybrid-discovery system. https://github.com/magiccodingman/MagicQuant-Wiki • 4 items • Updated 17 days ago • 33

upvoted a collection 7 months ago

Draft Models

Tiny "draft" models for speculative decoding. • 14 items • Updated Mar 2 • 6

upvoted a collection 12 months ago

YAQA

YAQA hessians (Sketch B) and models with the QTIP quantizer. See https://github.com/Cornell-RelaxML/yaqa/tree/main for more details. • 9 items • Updated Jun 6, 2025 • 3

upvoted 5 collections about 1 year ago

EXL3 models

55 items • Updated 15 days ago • 51

Qwen3

84 items • Updated Dec 31, 2025 • 1.79k

SkyReels-V2

Infinite-length Film Generative Model • 17 items • Updated Jun 14, 2025 • 78

Gemma 3 QAT

Quantization Aware Trained (QAT) Gemma 3 checkpoints. The model preserves similar quality as half precision while using 3x less memory • 15 items • Updated Mar 12 • 218

GLM-4-0414

GLM-4-0414 series model • 6 items • Updated Mar 2 • 135

upvoted 2 articles about 1 year ago

Article

Introduction to ggml • ngxson, ggerganov, slaren • Aug 13, 2024 • 286

Article

Comparing sub 50GB Llama 4 Scout quants (KLD/Top P) • bartowski • Apr 9, 2025 • 45

upvoted a collection about 1 year ago

FP8 LLMs for vLLM

Accurate FP8 quantized models by Neural Magic, ready for use with vLLM! • 42 items • Updated Mar 2 • 80

upvoted 2 articles over 1 year ago

Article

Open-R1: a fully open reproduction of DeepSeek-R1 • eliebak, lvwerra, lewtun • Jan 28, 2025 • 889

Article

The SOTA Text-to-speech and Zero Shot Voice cloning model that no one knows about... • srinivasbilla • Jan 20, 2025 • 77

upvoted 5 collections over 1 year ago

Qwen2.5-1M

The long-context version of Qwen2.5, supporting 1M-token context lengths • 3 items • Updated Dec 31, 2025 • 127

Qwen2.5-VL

Vision-language model series based on Qwen2.5 • 10 items • Updated Mar 2 • 561

Llama 3.2 3B & 1B GGUF Quants

Llama.cpp compatible quants for Llama 3.2 3B and 1B Instruct models. • 4 items • Updated Sep 26, 2024 • 47

Llama 3.1 GPTQ, AWQ, and BNB Quants

Optimised Quants for high-throughput deployments! Compatible with Transformers, TGI & VLLM 🤗 • 9 items • Updated Sep 26, 2024 • 57

Qwen2-VL

Vision-language model series based on Qwen2 • 15 items • Updated Mar 2 • 231

upvoted a collection almost 2 years ago

abliterated-v3

Latest gen of the abliterated models I've produced • 17 items • Updated Jun 3, 2024 • 139