mxfp4 produces garbage (solved)
After 30-40 lines of output, the model starts producing garbage. Settings: temperature: 1.0, top_p: 0.95, top_k: 40. I'm running the latest ml-explore/mlx-lm, started with:
python3 -m mlx_lm.server --model nightmedia/Qwen3-Coder-Next-mxfp4-mlx --temp 1.0 --top-p 0.95 --top-k 40
Other models work flawlessly.
Interesting, did you try it in LMStudio? Also, update your local MLX tools; they change almost daily now. I was not able to reproduce the issue on my setup, and I use the latest codebase from main.
I did a fresh git clone into a venv with all the latest packages. I did not try LMStudio because it wanted to download the model again, as it expects models under ~/.lmstudio. Would a symlink do it? I may try again in the next few days; currently I am downloading the mlx-community quant.
yeah, symlink should work, I do that often.
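For anyone who wants to script the symlink instead of re-downloading, here is a minimal sketch. The helper name `link_model` is made up for illustration, and the `<models root>/<publisher>/<model>` layout is an assumption about how LM Studio organizes its folder; the thread only says it expects models under ~/.lmstudio, so check your own install first.

```python
from pathlib import Path


def link_model(src: Path, models_root: Path, publisher: str, name: str) -> Path:
    """Symlink an already-downloaded model directory into another tool's model folder.

    Hypothetical helper: the publisher/name layout under models_root is an
    assumption, not something confirmed in this thread.
    """
    src = src.expanduser().resolve()
    if not src.is_dir():
        raise FileNotFoundError(f"model directory not found: {src}")

    dest = models_root.expanduser() / publisher / name
    dest.parent.mkdir(parents=True, exist_ok=True)

    # Leave anything that already exists at the destination untouched.
    if dest.is_symlink() or dest.exists():
        return dest

    dest.symlink_to(src, target_is_directory=True)
    return dest
```

A call would look something like `link_model(Path("/path/to/Qwen3-Coder-Next-mxfp4-mlx"), Path("~/.lmstudio/models"), "nightmedia", "Qwen3-Coder-Next-mxfp4-mlx")`, where the source path is a placeholder for wherever the model files actually live.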
I went through the pyenv nightmare a few times. It is hard to keep an mlx environment stable if you try to do more than one thing with it. I ended up with a merge environment, a quant environment, etc. It gets crazy after a while.
LMStudio is reliably stable for running quants because it bundles a known-good mlx version.
It does indeed work with LMStudio, and it also works with pip install mlx==0.30.1 mlx-metal==0.30.1. The mlx-community quant produces similar garbage as well, so your quant is definitely not the cause. I will see if I can find some time in the next few days to pinpoint the problem, or at least write a decent bug report for mlx-lm.
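To check which versions a given venv actually has before comparing results, something like the following sketch may help. It only uses the standard library; the function names are made up for illustration, and `0.30.1` is simply the version reported as working in this thread, not an official recommendation.

```python
from importlib.metadata import PackageNotFoundError, version

KNOWN_GOOD = "0.30.1"  # version reported as working in this thread


def installed_versions(packages=("mlx", "mlx-metal", "mlx-lm")):
    """Return {package: version string, or None if it is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def matches_known_good(versions, pinned=("mlx", "mlx-metal"), known_good=KNOWN_GOOD):
    """True only if every pinned package is installed at exactly known_good."""
    return all(versions.get(name) == known_good for name in pinned)


if __name__ == "__main__":
    current = installed_versions()
    for name, ver in current.items():
        print(f"{name}: {ver or 'not installed'}")
    print("matches known-good pin:", matches_known_good(current))
```

Including this output in a bug report makes it easier for others to tell whether they are on the same mlx build.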
For info: on an M1 with 64GB, mxfp4 gives me ~300 tok/s prompt processing (PP) and ~40 tok/s token generation (TG) with 5000 tokens of context.
Excellent, thank you for confirming it. You should also check out the qx64-hi I just uploaded; it is at about the same performance level.
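To put those rates in terms of wall-clock time, here is a rough back-of-the-envelope sketch. It assumes both rates stay constant over the whole request, which is only an approximation, and the 1000-token output length in the usage note is a made-up example, not a measurement from this thread.

```python
def estimate_seconds(prompt_tokens, output_tokens, pp_tok_s=300.0, tg_tok_s=40.0):
    """Estimate (prefill, decode, total) seconds from constant throughput rates.

    Defaults are the approximate M1 64GB figures quoted in this thread.
    """
    prefill = prompt_tokens / pp_tok_s   # time to process the prompt
    decode = output_tokens / tg_tok_s    # time to generate the reply
    return prefill, decode, prefill + decode


if __name__ == "__main__":
    prefill, decode, total = estimate_seconds(5000, 1000)
    print(f"prefill {prefill:.1f}s, decode {decode:.1f}s, total {total:.1f}s")
```

With a 5000-token prompt, prefill alone works out to roughly 5000 / 300, or about 17 seconds, before the first generated token appears.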
Looks like an M1/M2 issue and a fix is in the pipeline for merge with main: https://github.com/ml-explore/mlx/pull/3099
Once it is merged I'll give it a test run.