John Leimgruber III PRO

ubergarm

https://www.paypal.com/donate/?hosted_button_id=HU59345BZVSUA

AI & ML interests

Open LLMs and Astrophotography image processing.

Recent Activity

new activity 6 days ago

ubergarm/GLM-4.7-GGUF:Stable run on 2x RTX 5090 and 2 Xeon E5 2696 V4 and DDR4 with ik_llama.cpp - 6.1 t/s on IQ4_K and 5.1 t/s on IQ5_K, opencode works with this

new activity 8 days ago

ubergarm/GLM-4.7-Flash-GGUF:question about mxfp4

new activity 8 days ago

zai-org/GLM-4.7-Flash:Why does the KV cache occupy so much GPU memory?

New activity in ubergarm/GLM-4.7-GGUF 6 days ago

Stable run on 2x RTX 5090 and 2 Xeon E5 2696 V4 and DDR4 with ik_llama.cpp - 6.1 t/s on IQ4_K and 5.1 t/s on IQ5_K, opencode works with this

👍 1

#5 opened about 1 month ago by

martossien

New activity in ubergarm/GLM-4.7-Flash-GGUF 8 days ago

question about mxfp4

#3 opened 8 days ago by

koifish12

New activity in zai-org/GLM-4.7-Flash 8 days ago

Why does the KV cache occupy so much GPU memory?

#21 opened 9 days ago by

yyg201708

New activity in ubergarm/GLM-4.7-Flash-GGUF 8 days ago

Re-cooking imatrix and quants with updated ik/llama.cpp PR

🚀 1

#1 opened 8 days ago by

ubergarm

updated a model 8 days ago

ubergarm/GLM-4.7-Flash-GGUF

Text Generation • 30B • Updated 8 days ago • 6.69k • 13

New activity in zai-org/GLM-4.7-Flash 9 days ago

Cannot run vLLM on DGX Spark: ImportError: libcudart.so.12

#18 opened 9 days ago by

yyg201708

Performance Discussion

👀 2

#1 opened 10 days ago by

IndenScale

Enormous KV-cache size?

👍 ➕ 6

#3 opened 10 days ago by

nephepritou

New activity in noctrex/GLM-4.7-Flash-MXFP4_MOE-GGUF 9 days ago

Feedback from running in LM Studio 0.39.3 with v1.103.2 of llama.cpp

#1 opened 9 days ago by

spanspek

liked a model 9 days ago

noctrex/GLM-4.7-Flash-MXFP4_MOE-GGUF

Text Generation • 30B • Updated 4 days ago • 9.53k • 19

published a model 10 days ago

ubergarm/GLM-4.7-Flash-GGUF

Text Generation • 30B • Updated 8 days ago • 6.69k • 13

liked 2 models 10 days ago

ngxson/GLM-4.7-Flash-GGUF

30B • Updated 9 days ago • 11.2k • 20

zai-org/GLM-4.7-Flash

Text Generation • 31B • Updated about 7 hours ago • 609k • 1.32k

liked a model 11 days ago

ArtusDev/requests-exl

Updated Oct 13, 2025 • 6

New activity in ArtusDev/requests-exl 11 days ago

[QUANTING UPDATE]

❤️ 👍 4

#28 opened 14 days ago by

ArtusDev

New activity in ubergarm/Devstral-Small-2-24B-Instruct-2512-GGUF 11 days ago

Mistral 3 large quant

👍 1

#1 opened 11 days ago by

facedwithahug

New activity in ubergarm/DeepSeek-V3.2-Speciale-GGUF 11 days ago

QuIP - 2 bit quantised as good as 16 bit

#5 opened 16 days ago by

infinityai

New activity in msievers/gemma-3-1b-it-qat-q4_0-gguf 15 days ago

Thanks for sharing your work!

❤️ 2

#1 opened 16 days ago by

ubergarm

New activity in ubergarm/DeepSeek-V3.2-Speciale-GGUF 15 days ago

Say Whattt?!

🔥 👍 4

#1 opened 20 days ago by

mtcl

New activity in ubergarm/Devstral-2-123B-Instruct-2512-GGUF 16 days ago

Decent PPL with 100% IQ4_KSS

🔥 1

#3 opened about 2 months ago by

sokann
