
Kokoro-82M CoreML

End-to-end CoreML export of hexgrad/Kokoro-82M at FP16, optimized for Apple Neural Engine. Requires iOS 18+ / macOS 15+.

A single kokoro_5s.mlmodelc runs the full pipeline (BERT → duration prediction → fixed-shape alignment → prosody → decoder) in one CoreML call. G2P (grapheme-to-phoneme) is a separate pair of CoreML models.

Looking for a smaller variant? See aufklarer/Kokoro-82M-CoreML-INT8: INT8 k-means palettized, 83 MB versus 325 MB here, with a log-spectral distance of 0.42 from this FP16 reference on a validation utterance.

Model

Parameter	Value
Parameters	82M
Precision	FP16
Max audio length	5 s (200 frames @ 40 fps)
Sample rate	24 kHz
Style dimension	256
Max phonemes per pass	128
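
The limits in the table imply a fixed output budget per call. The sketch below works through that arithmetic and shows one naive way to split a long phoneme sequence into passes. `KokoroLimits` and `passes(for:limit:)` are hypothetical names for illustration, not part of speech-swift, and the sketch assumes the 128-phoneme limit counts every token in the sequence (whether padding or boundary tokens are included is not stated here).

```swift
/// Values copied from the Model table; the derived values are plain arithmetic.
enum KokoroLimits {
    static let sampleRate = 24_000          // Hz
    static let framesPerSecond = 40
    static let maxFrames = 200
    static let maxPhonemesPerPass = 128

    /// 200 frames / 40 fps = 5 s
    static let maxSeconds = Double(maxFrames) / Double(framesPerSecond)
    /// 24 000 Hz / 40 fps = 600 samples per frame
    static let samplesPerFrame = sampleRate / framesPerSecond
    /// 200 frames * 600 samples = 120 000 samples per call
    static let maxSamples = maxFrames * samplesPerFrame
}

/// Hypothetical helper: splits phoneme IDs into consecutive chunks of at most
/// `limit` tokens. It cuts at fixed offsets and ignores word boundaries.
func passes(for phonemeIDs: [Int],
            limit: Int = KokoroLimits.maxPhonemesPerPass) -> [[Int]] {
    precondition(limit > 0, "limit must be positive")
    return stride(from: 0, to: phonemeIDs.count, by: limit).map { start in
        let end = min(start + limit, phonemeIDs.count)
        return Array(phonemeIDs[start..<end])
    }
}
```

A real pipeline would more plausibly split at sentence or word boundaries, and each pass is also bounded by the 5 s audio limit, so a chunk that fits the phoneme limit may still need to be shortened.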

Files

File	Size	Description
`kokoro_5s.mlmodelc`	325 MB	Pre-compiled end-to-end model; loads directly on-device
`G2PEncoder.mlmodelc`	0.7 MB	Grapheme-to-phoneme encoder
`G2PDecoder.mlmodelc`	0.8 MB	Grapheme-to-phoneme decoder
`voices/`	0.5 MB	54 preset voice embeddings (10 languages)
`vocab_index.json`	4 KB	Phoneme vocabulary
`g2p_vocab.json`	4 KB	G2P vocabulary
`us_gold.json`, `us_silver.json`	6 MB	English pronunciation dictionaries
`pipeline_config.json`	4 KB	Swift pipeline config

Voices

54 preset voices across 10 languages (counting US and UK English separately): English (US/UK), Spanish, French, Hindi, Italian, Japanese, Korean, Portuguese, Chinese.

Usage

Add speech-swift to Package.swift:

.package(url: "https://github.com/soniqo/speech-swift", branch: "main")
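
For context, a complete manifest might look like the sketch below. The package and target name `MyApp` are placeholders, and the product name `KokoroTTS` is an assumption inferred from the `import KokoroTTS` line in the usage example; check the speech-swift manifest for the actual product names. The platform minimums mirror the iOS 18+ / macOS 15+ requirement stated above.

```swift
// swift-tools-version: 6.0
import PackageDescription

let package = Package(
    name: "MyApp",  // placeholder name
    // Matches the stated requirement: iOS 18+ / macOS 15+.
    platforms: [.iOS(.v18), .macOS(.v15)],
    dependencies: [
        .package(url: "https://github.com/soniqo/speech-swift", branch: "main")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Assumed product name, inferred from `import KokoroTTS`.
                .product(name: "KokoroTTS", package: "speech-swift")
            ]
        )
    ]
)
```

Tracking `branch: "main"` means the resolved code can change between builds; pinning to a specific revision is the usual way to keep builds reproducible.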

Then synthesize:

import KokoroTTS

let tts = try await KokoroTTSModel.fromPretrained(
    modelId: "aufklarer/Kokoro-82M-CoreML"
)
let audio = try await tts.synthesize(
    "Hello world, this is a Kokoro test.",
    voice: "af_heart"
)
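
The return type of `synthesize` is not stated here. Assuming the audio can be obtained as mono `Float` samples in the range -1...1 at the model's 24 kHz sample rate, a minimal sketch for encoding them as a 16-bit PCM WAV file could look like this. `wavData(samples:sampleRate:)` is a hypothetical helper, not part of speech-swift.

```swift
import Foundation

/// Hypothetical helper: encodes mono Float samples as a 16-bit PCM WAV file
/// held in memory. Non-finite samples are written as silence.
func wavData(samples: [Float], sampleRate: UInt32 = 24_000) -> Data {
    let bytesPerSample: UInt32 = 2
    let dataSize = UInt32(samples.count) * bytesPerSample
    var data = Data(capacity: 44 + Int(dataSize))

    // Appends an integer in little-endian byte order, as RIFF requires.
    func append<T: FixedWidthInteger>(_ value: T) {
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }

    data.append(contentsOf: Array("RIFF".utf8))
    append(UInt32(36) + dataSize)            // size of everything after this field
    data.append(contentsOf: Array("WAVE".utf8))
    data.append(contentsOf: Array("fmt ".utf8))
    append(UInt32(16))                       // fmt chunk size for PCM
    append(UInt16(1))                        // audio format: 1 = PCM
    append(UInt16(1))                        // channel count: mono
    append(sampleRate)
    append(sampleRate * bytesPerSample)      // byte rate
    append(UInt16(bytesPerSample))           // block align
    append(UInt16(16))                       // bits per sample
    data.append(contentsOf: Array("data".utf8))
    append(dataSize)

    for sample in samples {
        // Clamp to [-1, 1] before scaling to the Int16 range.
        let clamped = sample.isFinite ? max(-1, min(1, sample)) : 0
        append(Int16((clamped * 32767).rounded()))
    }
    return data
}
```

With a hypothetical `samples: [Float]` array in hand, `try wavData(samples: samples).write(to: URL(fileURLWithPath: "out.wav"))` would save the result to disk.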

CLI:

swift run audio kokoro "Hello world" --voice af_heart --output out.wav

Source

Base model: hexgrad/Kokoro-82M (Apache-2.0)
Dictionaries and G2P: Apache-2.0

License

Model weights: Apache-2.0
CoreML conversion: Apache-2.0

Model tree for aufklarer/Kokoro-82M-CoreML

Base model: yl4579/StyleTTS2-LJSpeech
Finetuned: hexgrad/Kokoro-82M
Finetuned: this model
