# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"

huggingface-cli download --local-dir ACE-Step1.5-MLX-4bit mlx-community/ACE-Step1.5-MLX-4bit

ACE-Step 1.5 MLX (4-bit Quantized)

4-bit quantized MLX weights for ACE-Step/ACE-Step1.5.

  • Decoder and encoder quantized to 4-bit (group_size=64)
  • VAE, tokenizer, and detokenizer kept in full precision
  • Approximate size: 2.2 GB main model + 0.7 GB VAE + 2.4 GB text encoder (about 5.3 GB total)
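
To make the "4-bit (group_size=64)" setting concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It illustrates the general technique only and is not MLX's actual kernel or storage layout. The helper names (`quantize_group`, `dequantize_group`, `effective_bits`) are hypothetical, and the 16-bit scale and bias width is an assumption.

```python
import math


def quantize_group(weights, bits=4):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    levels = (1 << bits) - 1            # 15 steps for 4-bit codes
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo             # `lo` acts as the per-group bias


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float weights."""
    return [c * scale + bias for c in codes]


def effective_bits(bits=4, group_size=64, meta_bits=16):
    """Storage cost per weight, counting one scale and one bias per group.

    Assumption: scale and bias are each stored in `meta_bits` bits.
    """
    return bits + 2 * meta_bits / group_size


# Illustrative data: one group of 64 synthetic weights.
weights = [math.sin(i) for i in range(64)]
codes, scale, bias = quantize_group(weights)
restored = dequantize_group(codes, scale, bias)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

Under these assumptions each weight costs slightly more than 4 bits, because every group of 64 weights also carries its own scale and bias. The reconstruction error per weight is bounded by half of the group's scale.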

Usage

from mlx_audio.tts import load

model = load("mlx-community/ACE-Step1.5-MLX-4bit")

for result in model.generate(
    text="upbeat electronic dance music with energetic synthesizers",
    duration=30.0,
):
    audio = result.audio  # shape [samples, 2]: stereo audio at 48 kHz
    sample_rate = result.sample_rate
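
The loop above only holds the audio in memory. One way to save it is a small standard-library helper like the sketch below. The function name `write_stereo_wav` is hypothetical, and the sketch assumes the samples are floats in the range [-1, 1], which the source does not state. A synthetic tone stands in for `result.audio` so the block runs without the model.

```python
import math
import os
import struct
import tempfile
import wave


def write_stereo_wav(path, frames, sample_rate=48000):
    """Write (left, right) float pairs as a 16-bit PCM stereo WAV file.

    Assumption: samples are floats in [-1, 1]; values outside are clipped.
    """
    pcm = bytearray()
    for left, right in frames:
        for value in (left, right):
            clipped = max(-1.0, min(1.0, float(value)))
            pcm += struct.pack("<h", int(round(clipped * 32767)))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(2)          # stereo
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(pcm))


# Stand-in for `result.audio`: one second of a 440 Hz tone on both channels.
tone = []
for n in range(48000):
    sample = 0.3 * math.sin(2 * math.pi * 440 * n / 48000)
    tone.append((sample, sample))

demo_path = os.path.join(tempfile.gettempdir(), "ace_step_demo.wav")
write_stereo_wav(demo_path, tone)
```

With a real result, a call such as `write_stereo_wav("song.wav", result.audio.tolist(), result.sample_rate)` might work, assuming `result.audio` can be converted to nested Python lists.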

With Vocals

for result in model.generate(
    text="English pop song with clear female vocals, catchy melody",
    lyrics="""[verse]
Dance with me tonight
Under the neon lights

[chorus]
We're alive, we're on fire
Dancing higher and higher
""",
    duration=60.0,
    vocal_language="en",
):
    ...
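
For longer songs it can be easier to assemble the `lyrics` string from structured data than to hand-edit a triple-quoted literal. The sketch below uses a hypothetical helper, `format_lyrics`, that reproduces the tagged layout shown above. Only the `[verse]` and `[chorus]` tags appear in the source; whether other tags are accepted is not confirmed here.

```python
def format_lyrics(sections):
    """Join (tag, lines) pairs into a tagged lyrics string.

    Each section becomes a "[tag]" header followed by its lines,
    with a blank line between sections and a trailing newline.
    """
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks) + "\n"


lyrics = format_lyrics([
    ("verse", ["Dance with me tonight", "Under the neon lights"]),
    ("chorus", ["We're alive, we're on fire", "Dancing higher and higher"]),
])
```

The resulting string can then be passed as the `lyrics` argument in the call above.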

By default (use_lm=True), the model runs a 5 Hz language-model planner that generates a song blueprint before the diffusion transformer runs.
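
As a rough sense of scale, the sketch below estimates how many planner steps and output samples a given duration implies. It assumes that "5 Hz" means five planner steps per second of audio, which is an interpretation and not something the source spells out. The 48 kHz rate comes from the usage example, and both function names are hypothetical.

```python
def planner_steps(duration_s, planner_hz=5):
    """Estimated planner steps, assuming `planner_hz` steps per second."""
    return round(duration_s * planner_hz)


def output_samples(duration_s, sample_rate=48000):
    """Samples per channel in the generated audio at `sample_rate`."""
    return round(duration_s * sample_rate)


# The two durations used in the examples above.
steps_30s = planner_steps(30.0)
steps_60s = planner_steps(60.0)
samples_30s = output_samples(30.0)
```

Under that assumption, the planner works on a sequence several orders of magnitude shorter than the final waveform.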
