2X Qwen 3.5 9B SOMPOA Heresy

#2336
by redaihf - opened

Thanks @MuXodious !

Oh, wait. MTP needs the support PR, and I need to see whether it works at all with the PR after my frankensteining attempt.

Remind me when it's merged so I can update llama.cpp and queue it; right now I'm not queueing.

You can queue the non-MTP version. I'll let y'all know when the MTP PR gets the go signal. MTP may not be very useful for the 9B and below, given the standard 98GB VRAM under everyone's hands these days, but it should provide a good speed boost to larger models.

P.S. I got a nice speed bump with this thing on, not bad: ~75 t/s -> ~100 t/s.

@RichardErkhov can we please have the non-MTP model queued? Sorry for the confusion.

https://huggingface.co/MuXodious/Qwen3.5-9B-SOMPOA-heresy

It's queued!

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-GGUF for quants to appear.

Please don't forget to remind me when the MTP finally merges =)

It will take some time. They are currently making adjustments to the scaffolding code prior to finalising the MTP PR. I'll follow up with the notice once they green light the PR. THANKS for the quants as always.

Preliminary PRs are merged. Work on MTP support should continue. I'll post updates as things progress.

PR22673 MTP Support is merged! With the latest update, speed went up from ~77.11 t/s to ~110.15 t/s at Q8_0 (Qwen 3.5 9B).

Don't forget to run llama.cpp with the arguments --spec-type draft-mtp --spec-draft-n-max 3 or add the following lines to each MTP-supported model in your preset file.

spec-type = draft-mtp
spec-draft-n-max = 3
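
For anyone scripting their launches, here is a minimal Python sketch that builds both forms (command-line arguments and preset lines) from one set of options, so the two can't drift apart. The two option names and values are the ones quoted above; the model filename is a hypothetical placeholder, and using `llama-server` with `-m` is an assumption about your setup.

```python
import shlex

# Speculative-decoding settings quoted in this thread.
MTP_OPTIONS = {"spec-type": "draft-mtp", "spec-draft-n-max": "3"}


def cli_args(options):
    """Turn {"name": "value"} into ["--name", "value", ...]."""
    args = []
    for name, value in options.items():
        args += [f"--{name}", value]
    return args


def preset_lines(options):
    """Render the same options as "name = value" preset-file lines."""
    return [f"{name} = {value}" for name, value in options.items()]


# Hypothetical model path; replace with your own MTP-enabled GGUF file.
command = ["llama-server", "-m", "model-MTP.Q8_0.gguf"] + cli_args(MTP_OPTIONS)

if __name__ == "__main__":
    # Print a copy-pasteable shell command, then the preset-file lines.
    print(shlex.join(command))
    print("\n".join(preset_lines(MTP_OPTIONS)))
```

Pass `command` to `subprocess.run` if you want to launch from the script instead of copy-pasting.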

@RichardErkhov I updated llama.cpp on nico1 in case you want to give it a try. Please keep in mind that the latest update also includes https://github.com/ggml-org/llama.cpp/pull/17114, which was a massive pain to merge into our llama.cpp fork, so if the conversion fails, you know why.

It's queued with priority 6969 =)

You can check for progress at http://hf.tst.eu/status.html or regularly check the model
summary page at https://hf.tst.eu/model#Qwen3.5-9B-SOMPOA-heresy-MTP-GGUF for quants to appear.

@nicoboss can you update rich1 as well? Don't forget the internet restart in 10 minutes.
