ACE-Step 1.5 MLX (4-bit Quantized)

4-bit quantized MLX weights for ACE-Step/ACE-Step1.5.

Decoder and encoder quantized to 4-bit (group_size=64)
VAE, tokenizer, and detokenizer kept in full precision
2.2 GB main model + 0.7 GB VAE + 2.4 GB text encoder (5.3 GB total)
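To make the "4-bit (group_size=64)" note concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It is a generic illustration of the technique, not MLX's actual implementation: the helper names (`quantize_group`, `dequantize_group`, `quantize`) are hypothetical, and the min/max scaling scheme is an assumption about how such a quantizer typically works.

```python
import random

GROUP_SIZE = 64            # weights per group, matching group_size=64 above
BITS = 4                   # 4-bit codes
LEVELS = (1 << BITS) - 1   # highest code value (15)


def quantize_group(weights):
    """Map one group of floats to integer codes plus a per-group scale and offset."""
    lo, hi = min(weights), max(weights)
    # Fall back to 1.0 when the group is constant, to avoid dividing by zero.
    scale = (hi - lo) / LEVELS or 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from the codes of one group."""
    return [c * scale + lo for c in codes]


def quantize(weights, group_size=GROUP_SIZE):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[i:i + group_size])
        for i in range(0, len(weights), group_size)
    ]


# Hypothetical stand-in for a weight tensor: 256 small random values.
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]

groups = quantize(weights)
restored = [w for group in groups for w in dequantize_group(*group)]

max_error = max(abs(a - b) for a, b in zip(weights, restored))
worst_step = max(scale for _, scale, _ in groups)
print(f"groups={len(groups)} max_error={max_error:.6f} worst_step={worst_step:.6f}")
```

Because every group carries its own scale and offset, an outlier weight only coarsens the 64 values that share its group, which is the usual motivation for quantizing in small groups rather than per tensor.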

Usage

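from mlx_audio.tts import load

# Load the 4-bit weights by repository ID.
model = load("mlx-community/ACE-Step1.5-MLX-4bit")

# generate() yields result objects that carry the audio and its sample rate.
for result in model.generate(
    text="upbeat electronic dance music with energetic synthesizers",
    duration=30.0,  # clip length in seconds
):
    audio = result.audio  # [samples, 2] stereo @ 48kHz
    sample_rate = result.sample_rate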
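The loop above leaves the audio in memory. As a minimal sketch of one way to save it, the standard-library `wave` module can write 16-bit PCM stereo. The helper `write_stereo_wav` is hypothetical, and the sketch assumes the audio has already been converted to rows of two floats in [-1.0, 1.0] (for example with `numpy.asarray(result.audio).tolist()`, which is itself an assumption about the result type). A synthetic tone stands in for `result.audio` so the snippet runs on its own.

```python
import math
import os
import struct
import tempfile
import wave


def write_stereo_wav(path, frames, sample_rate):
    """Write (left, right) float pairs in [-1.0, 1.0] as 16-bit PCM stereo."""

    def to_int16(x):
        # Clamp first so out-of-range samples cannot overflow the 16-bit range.
        x = max(-1.0, min(1.0, x))
        return int(round(x * 32767))

    payload = b"".join(
        struct.pack("<hh", to_int16(left), to_int16(right))
        for left, right in frames
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)       # stereo, matching the [samples, 2] layout
        wav.setsampwidth(2)       # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(payload)


# Stand-in for result.audio: one second of a 440 Hz tone on both channels.
sample_rate = 48000
frames = [
    (0.5 * math.sin(2 * math.pi * 440 * n / sample_rate),) * 2
    for n in range(sample_rate)
]

wav_path = os.path.join(tempfile.gettempdir(), "ace_step_demo.wav")
write_stereo_wav(wav_path, frames, sample_rate)
```

With a real result, `frames` would be the converted audio rows and `sample_rate` would come from `result.sample_rate`.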

With Vocals

for result in model.generate(
    text="English pop song with clear female vocals, catchy melody",
    lyrics="""[verse]
Dance with me tonight
Under the neon lights

[chorus]
We're alive, we're on fire
Dancing higher and higher
""",
    duration=60.0,
    vocal_language="en",
):
    ...
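When lyrics come from structured data rather than a hand-written string, a small helper can assemble the bracket-tagged layout shown above. This is only a sketch: `build_lyrics` is a hypothetical helper, not part of `mlx_audio`, and it assumes nothing about the tag format beyond what the example uses.

```python
def build_lyrics(sections):
    """Join (tag, lines) pairs into '[tag]' blocks separated by blank lines."""
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks) + "\n"


lyrics = build_lyrics([
    ("verse", ["Dance with me tonight", "Under the neon lights"]),
    ("chorus", ["We're alive, we're on fire", "Dancing higher and higher"]),
])
print(lyrics)
```

The resulting string matches the `lyrics` argument in the example above and can be passed in its place.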

By default, the model runs a 5Hz language-model planner (use_lm=True), which generates a song blueprint before the diffusion transformer runs.
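For a rough sense of scale, the sketch below compares how many planner steps and output samples a clip involves. It assumes that "5Hz" means the planner emits five steps per second of audio, which the text above does not state explicitly. The 48 kHz output rate is taken from the usage example, and `clip_budget` is a hypothetical helper.

```python
PLANNER_RATE_HZ = 5          # assumed: planner steps per second of audio
AUDIO_SAMPLE_RATE = 48_000   # output sample rate noted in the usage example


def clip_budget(duration_s):
    """Return (planner_steps, audio_samples_per_channel) for a clip length in seconds."""
    planner_steps = round(duration_s * PLANNER_RATE_HZ)
    audio_samples = round(duration_s * AUDIO_SAMPLE_RATE)
    return planner_steps, audio_samples


for duration in (30.0, 60.0):
    steps, samples = clip_budget(duration)
    print(f"{duration:>5.1f}s -> {steps} planner steps, {samples} samples per channel")
```

Under that assumption, the blueprint is several orders of magnitude shorter than the waveform it guides.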

Safetensors model size: 0.6B params
Tensor types: F32, U32
