mxfp4 produces garbage (solved)

#2
by diggingAI - opened

After 30-40 lines of output the model produces garbage. Settings: temperature: 1.0, top_p: 0.95, top_k: 40. Latest ml-explore/mlx-lm,

python3 -m mlx_lm.server --model nightmedia/Qwen3-Coder-Next-mxfp4-mlx --temp 1.0 --top-p 0.95 --top-k 40

Other models work flawlessly.

Interesting, did you try it in LMStudio? also, update the local MLX tools, they keep changing every day now. I was not able to replicate issues with my setup, and I use the latest codebase from main

I did a fresh git clone with a venv and all latest packages. I did not try with LMStudio as it wanted to re-download again as it expects the models in ~/.lmstudio. Would a symlink do it? I may try again one of the next days, currently I am downloading the mlx-community quant.

Owner
β€’
edited Feb 4

yeah, symlink should work, I do that often.

I went through the pyenv nightmare a few times. Hard to stabilize an environment on mlx if you try to do more than one thing at a time. I ended up with a merge environment, a quant environment, etc... it gets crazy after a while

LMStudio is always stable with running quants because it embeds a safe mlx version

It does indeed work with LMStudio and it also works with pip install mlx==0.30.1 mlx-metal==0.30.1. And also the mlx-community quant produces similar garbage, so definitely not your quant the reason. Will see if I find some time the next days to pinpoint the problem or at least write a decent but report for mlx-lm.
For info on M1 64GB with mxfp4 I get ~300Tok/s PP and ~40Tok/s TG with 5000 context.

Excellent, thank you for confirming it. Should also check out the qx64-hi I just uploaded, about same performance level

Looks like an M1/M2 issue and a fix is in the pipeline for merge with main: https://github.com/ml-explore/mlx/pull/3099
Once it is merged I'll give it a test run.

diggingAI changed discussion title from mxfp4 produces garbage to mxfp4 produces garbage (solved)

Sign up or log in to comment