Voxtral Mini 3B 2507 — Quantized (MLX)

MLX quantizations of Voxtral-Mini-3B-2507 for Apple Silicon, based on the BF16 weights from mlx-community/Voxtral-Mini-3B-2507-bf16.

Variants

Folder	Quantization	Description
`mlx-mxfp4-mixed/`	MXFP4 mixed precision	Language model in MXFP4 (4-bit); encoder and projector in 8-bit affine. Better quality than 4-bit affine. Requires MLX >= 0.30.0.
`mlx-q4/`	4-bit affine	Most compact
`mlx-q5/`	5-bit affine	Good size/quality trade-off
`mlx-q6/`	6-bit affine	High quality
`mlx-q8/`	8-bit affine	Close to BF16
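
To make the size/quality trade-off in the table more concrete, here is a rough sketch of the storage cost per weight for each scheme. The group sizes and metadata widths are assumptions, not values stated in this README: affine quantization is taken to use groups of 64 weights with a 16-bit scale and a 16-bit bias per group, and MXFP4 is taken to use blocks of 32 weights with one 8-bit shared scale. The parameter count in the example is a hypothetical round number, and the estimate ignores the layers left unquantized.

```python
# Rough storage cost per weight for the quantization schemes in the table.
# ASSUMPTIONS (not taken from this README):
#   - affine: groups of 64 weights, one 16-bit scale + one 16-bit bias per group
#   - MXFP4: blocks of 32 weights, one 8-bit shared scale per block


def affine_bits_per_weight(bits: int, group_size: int = 64, meta_bits: int = 16) -> float:
    """Element bits plus the per-group scale and bias, amortized per weight."""
    return bits + 2 * meta_bits / group_size


def mxfp4_bits_per_weight(block_size: int = 32, scale_bits: int = 8) -> float:
    """4-bit elements plus one shared scale per block, amortized per weight."""
    return 4 + scale_bits / block_size


def size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size in GiB of n_params weights at the given bit width."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    n_params = 3e9  # hypothetical round figure, not the measured parameter count
    schemes = [("mxfp4", mxfp4_bits_per_weight())]
    schemes += [(f"q{b}", affine_bits_per_weight(b)) for b in (4, 5, 6, 8)]
    for name, bpw in schemes:
        print(f"{name}: {bpw:.2f} bits/weight, ~{size_gib(n_params, bpw):.2f} GiB")
```

Under these assumptions the 4-bit affine variant costs about 4.5 bits per weight and MXFP4 about 4.25, so the real folders will differ from a naive "bits times parameters" estimate.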

Only the inference weights are quantized.
The embeddings and lm_head are NOT quantized, to preserve quality.
MXFP4 (microscaling FP4) is the native format on Apple M5 and later, and performs better than classic 4-bit affine quantization.
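
As an illustration of what "microscaling" means, the sketch below quantizes one block of values in the style of the OCP microscaling convention: each element is a 4-bit E2M1 float, and the whole block shares a single power-of-two scale. This is a simplified pure-Python model with plain nearest-value rounding, written for explanation only; it is not MLX's kernel and the function names are hypothetical.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (the sign is a separate bit).
E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_MAX_EXP = 2  # largest element exponent: 4.0 = 1.0 * 2**2


def quantize_block(block):
    """Return (shared_exp, elements) for one block of floats.

    shared_exp is the exponent of the shared power-of-two scale;
    elements are signed E2M1 values. Illustrative sketch only.
    """
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0, [0.0] * len(block)
    # Pick the scale so the largest magnitude lands in E2M1's top binade.
    shared_exp = math.floor(math.log2(amax)) - E2M1_MAX_EXP
    scale = 2.0 ** shared_exp
    elements = []
    for x in block:
        mag = min(abs(x) / scale, E2M1_VALUES[-1])  # clamp to the largest value
        nearest = min(E2M1_VALUES, key=lambda v: abs(v - mag))
        elements.append(math.copysign(nearest, x))
    return shared_exp, elements


def dequantize_block(shared_exp, elements):
    """Reconstruct approximate floats from the shared exponent and elements."""
    return [v * 2.0 ** shared_exp for v in elements]


if __name__ == "__main__":
    exp, q = quantize_block([1.0, -0.5, 0.25, 3.0])
    print(exp, q, dequantize_block(exp, q))
```

Affine quantization, by contrast, stores integer codes with a scale and an offset per group, so its levels are evenly spaced, whereas the FP4 levels are denser near zero.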

from mlx_lm import load, generate

model, tokenizer = load("NeoRoth/voxtral-mini-3b-2507-mlx", sub_folder="mlx-mxfp4-mixed")
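
A possible alternative is to download only one variant folder and load it from a local path. The sketch below assumes that `huggingface_hub` is installed and that the installed `mlx_lm` version can load this model from a local directory; neither is confirmed by this README. `variant_pattern` and `load_variant` are hypothetical helper names.

```python
import os

# Folder names taken from the variants table.
VARIANTS = ("mlx-mxfp4-mixed", "mlx-q4", "mlx-q5", "mlx-q6", "mlx-q8")


def variant_pattern(variant: str) -> str:
    """Glob pattern matching only the files of one variant folder."""
    if variant not in VARIANTS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"{variant}/*"


def load_variant(variant: str, repo_id: str = "NeoRoth/voxtral-mini-3b-2507-mlx"):
    """Download one variant folder and load it from the local snapshot.

    ASSUMPTION: mlx_lm.load accepts the resulting local directory for this model.
    """
    # Imported lazily so the helper above works without these packages installed.
    from huggingface_hub import snapshot_download
    from mlx_lm import load

    local_root = snapshot_download(
        repo_id=repo_id,
        allow_patterns=[variant_pattern(variant)],
    )
    return load(os.path.join(local_root, variant))


if __name__ == "__main__":
    model, tokenizer = load_variant("mlx-mxfp4-mixed")
```

Restricting the download with `allow_patterns` avoids fetching all five variants when only one is needed.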

Apache-2.0. See LICENSE.txt.
