metadata
library_name: mlx
tags:
  - mlx
  - music-generation
  - ace-step
  - audio
  - text-to-music
base_model: ACE-Step/ACE-Step1.5

ACE-Step 1.5 MLX (4-bit Quantized)

4-bit quantized MLX weights for ACE-Step/ACE-Step1.5.

  • Decoder and encoder quantized to 4-bit (group_size=64)
  • VAE, tokenizer, and detokenizer kept in full precision
  • Approximate download size: 2.2 GB main model + 0.7 GB VAE + 2.4 GB text encoder (about 5.3 GB in total)
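To put the quantization settings above in perspective, here is a rough sketch of the storage cost per weight. It assumes the usual MLX affine scheme, where each group of `group_size` weights also stores one 16-bit scale and one 16-bit bias. That layout is an assumption and is not stated in this card, and the helper names are hypothetical.

```python
def effective_bits_per_weight(bits: int = 4, group_size: int = 64,
                              scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Average bits stored per weight, including per-group metadata.

    Assumption: one scale and one bias are stored per group of weights.
    """
    return bits + (scale_bits + bias_bits) / group_size


def quantized_size_gb(num_params: float, bits: int = 4, group_size: int = 64) -> float:
    """Estimated size in GB (10^9 bytes) of num_params quantized weights."""
    return num_params * effective_bits_per_weight(bits, group_size) / 8 / 1e9


# With bits=4 and group_size=64 the overhead is 32/64 = 0.5 bits per weight.
print(effective_bits_per_weight())      # 4.5
# Illustrative parameter count only, not the real size of this model.
print(quantized_size_gb(1_000_000_000))
```

Under these assumptions, a smaller group size would improve accuracy at the cost of more metadata per weight; 64 keeps the overhead to half a bit.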

Usage

from mlx_audio.tts import load

model = load("mlx-community/ACE-Step1.5-MLX-4bit")

for result in model.generate(
    text="upbeat electronic dance music with energetic synthesizers",
    duration=30.0,
):
    audio = result.audio  # [samples, 2] stereo @ 48kHz
    sample_rate = result.sample_rate
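A generated clip can be written to disk with the Python standard library. The sketch below is a minimal example, assuming `result.audio` converts to a float NumPy array of shape `[samples, channels]` with values in the range -1 to 1. The `save_wav` helper is hypothetical and not part of `mlx_audio`.

```python
import wave

import numpy as np


def save_wav(path: str, audio, sample_rate: int) -> None:
    """Write float audio shaped [samples, channels] as 16-bit PCM WAV."""
    samples = np.asarray(audio, dtype=np.float32)
    if samples.ndim == 1:
        # Treat a 1-D array as mono.
        samples = samples[:, None]
    # Clip to [-1, 1] and scale to the signed 16-bit range.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767.0).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(pcm.shape[1])
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit
        f.setframerate(int(sample_rate))
        # Row-major [samples, channels] is already interleaved frame order.
        f.writeframes(pcm.tobytes())


# Intended use inside the loop above (assumes the conversion described):
# save_wav("song.wav", result.audio, result.sample_rate)
```

If the array is in a dtype NumPy cannot read directly, cast it to float32 before calling the helper.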

With Vocals

for result in model.generate(
    text="English pop song with clear female vocals, catchy melody",
    lyrics="""[verse]
Dance with me tonight
Under the neon lights

[chorus]
We're alive, we're on fire
Dancing higher and higher
""",
    duration=60.0,
    vocal_language="en",
):
    ...
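The lyrics string above follows a simple layout: a bracketed section tag, the lines of that section, and a blank line between sections. For longer songs it can be convenient to assemble that string from structured data. The `build_lyrics` helper below is a hypothetical sketch that reproduces the layout used in the example; it is not part of the library.

```python
def build_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Join (tag, lines) pairs into a tagged lyrics string.

    Each section becomes "[tag]" followed by its lines, and sections
    are separated by one blank line, matching the example above.
    """
    blocks = []
    for tag, lines in sections:
        blocks.append("\n".join([f"[{tag}]", *lines]))
    return "\n\n".join(blocks) + "\n"


lyrics = build_lyrics([
    ("verse", ["Dance with me tonight", "Under the neon lights"]),
    ("chorus", ["We're alive, we're on fire", "Dancing higher and higher"]),
])
print(lyrics)
```

The resulting string can be passed as the `lyrics` argument in the call shown above.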

The model uses a 5Hz Language Model planner by default (use_lm=True), which generates a song blueprint before the diffusion transformer runs.
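As a rough guide to how much work the planner does, the sketch below estimates the blueprint length for a given duration. It assumes that "5Hz" means the planner emits five steps per second of audio, which is an interpretation and not something this card states. The `planner_steps` helper is hypothetical.

```python
def planner_steps(duration_s: float, rate_hz: float = 5.0) -> int:
    """Estimated number of planner steps for a clip of duration_s seconds.

    Assumption: the planner produces rate_hz steps per second of audio.
    """
    return round(duration_s * rate_hz)


print(planner_steps(30.0))  # 150 steps for the 30-second example
print(planner_steps(60.0))  # 300 steps for the 60-second example

# Since use_lm defaults to True, passing use_lm=False to generate()
# presumably skips the planner; that behavior is an assumption here.
```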