2X Qwen 3.5 9B SOMPOA Heresy

#2336

by redaihf - opened 19 days ago

https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy
https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy-MTP

redaihf

19 days ago

Thanks @MuXodious !

MuXodious

19 days ago

•

edited 19 days ago

Oh, wait. MTP needs the support PR first, and I need to see if it works at all with that PR after my frankensteining attempt.

RichardErkhov

19 days ago

Remind me when it's merged so I can update llama.cpp and queue. Right now I'm not queueing.

MuXodious

19 days ago

•

edited 19 days ago

Remind me when it's merged so I can update llama.cpp and queue. Right now I'm not queueing.

You can queue the non-MTP version. I'll let y'all know when the MTP PR gets the go signal. MTP may not be much use for the 9B and below, given the standard 98GB VRAM under everyone's hands these days, but it should provide a good speed boost to larger models.

P.S. I got a nice speed bump with this thing on, not bad: ~75 t/s -> ~100 t/s.

redaihf

17 days ago

•

edited 17 days ago

@RichardErkhov can we please have the non-MTP model queued? Sorry for the confusion.

https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy

RichardErkhov

17 days ago

•

edited 17 days ago

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-GGUF for quants to appear.

Please don't forget to remind me when the MTP PR finally merges =)

MuXodious

17 days ago

Please don't forget to remind me when the MTP PR finally merges =)

It will take some time. They are currently making adjustments to the scaffolding code prior to finalising the MTP PR. I'll follow up with a notice once they green-light the PR. THANKS for the quants as always.

MuXodious

15 days ago

The preliminary PRs are merged. Work on the MTP support itself should now continue. I'll post updates as things progress.

MuXodious

11 days ago

•

edited 11 days ago

PR22673 (MTP support) is merged! With the latest update, speed went up from ~77.11 t/s to ~110.15 t/s at Q8_0 (Qwen 3.5 9B).
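For a sense of scale, here is a minimal sketch that turns the throughput figures quoted in this thread into a speedup ratio. The `speedup` helper and the labels are illustrative names, not part of llama.cpp, and the inputs are the approximate single-user numbers reported above, not a benchmark.

```python
def speedup(baseline_tps: float, mtp_tps: float) -> tuple[float, float]:
    """Return (ratio, percent gain) between two tokens-per-second figures."""
    ratio = mtp_tps / baseline_tps
    return ratio, (ratio - 1.0) * 100.0


# Approximate figures quoted in this thread for Qwen 3.5 9B.
reports = [
    ("first report", 75.0, 100.0),
    ("after merge, Q8_0", 77.11, 110.15),
]

for label, before, after in reports:
    ratio, gain = speedup(before, after)
    print(f"{label}: {ratio:.2f}x ({gain:.0f}% faster)")
# first report: 1.33x (33% faster)
# after merge, Q8_0: 1.43x (43% faster)
```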

Don't forget to run llama.cpp with the arguments --spec-type draft-mtp --spec-draft-n-max 3 or add the following lines to each MTP-supported model in your preset file.

spec-type = draft-mtp 
spec-draft-n-max = 3
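As a rough sketch of how the two forms above relate, the snippet below builds both the command-line arguments and the preset lines from one set of options. The option names and values are the ones quoted in the post. The `llama-server` binary name and the GGUF filename are assumptions for illustration only, so substitute your own binary and model path.

```python
import shlex

# Options quoted in the post above (the poster's suggested values).
MTP_OPTIONS = {"spec-type": "draft-mtp", "spec-draft-n-max": "3"}


def cli_args(options: dict[str, str]) -> list[str]:
    """Turn {"name": "value"} into ["--name", "value", ...]."""
    args: list[str] = []
    for name, value in options.items():
        args += [f"--{name}", value]
    return args


def preset_lines(options: dict[str, str]) -> list[str]:
    """Render the same options as "name = value" preset lines."""
    return [f"{name} = {value}" for name, value in options.items()]


# Hypothetical binary and model filename, for illustration only.
command = ["llama-server", "-m", "model-MTP.Q8_0.gguf", *cli_args(MTP_OPTIONS)]

print(shlex.join(command))
print("\n".join(preset_lines(MTP_OPTIONS)))
```

Keeping the options in one place makes it harder for the command line and the preset file to drift apart.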

nicoboss

11 days ago

PR22673 (MTP support) is merged! With the latest update, speed went up from ~77.11 t/s to ~110.15 t/s at Q8_0.

@RichardErkhov I updated llama.cpp on nico1 in case you want to give it a try. Please keep in mind that the latest update also includes https://github.com/ggml-org/llama.cpp/pull/17114, which was a massive pain to merge into our llama.cpp fork, so if convert fails you know why.

RichardErkhov

11 days ago

It's queued with priority 6969 =)

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-MTP-GGUF for quants to appear.

RichardErkhov

11 days ago

@nicoboss can you update rich1 as well? Don't forget the internet restart in 10 minutes.

RichardErkhov

11 days ago

👀 👀 👀 👀 👀
