Text-to-Speech
KimiAudio
Safetensors
English
Chinese
audio
audio-language-model
speech-recognition
audio-understanding
audio-generation
chat
custom_code
Instructions to use moonshotai/Kimi-Audio-7B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- KimiAudio
How to use moonshotai/Kimi-Audio-7B with KimiAudio:
    # Example usage for KimiAudio
    # pip install git+https://github.com/MoonshotAI/Kimi-Audio.git
    import soundfile as sf  # required by sf.write below

    from kimia_infer.api.kimia import KimiAudio

    model = KimiAudio(model_path="moonshotai/Kimi-Audio-7B", load_detokenizer=True)

    sampling_params = {
        "audio_temperature": 0.8,
        "audio_top_k": 10,
        "text_temperature": 0.0,
        "text_top_k": 5,
    }

    # For ASR: a text instruction followed by the audio to transcribe
    asr_audio = "asr_example.wav"
    messages_asr = [
        {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
        {"role": "user", "message_type": "audio", "content": asr_audio},
    ]
    _, text = model.generate(messages_asr, **sampling_params, output_type="text")
    print(text)

    # For Q&A: audio in, both audio and text out
    qa_audio = "qa_example.wav"
    messages_conv = [{"role": "user", "message_type": "audio", "content": qa_audio}]
    wav, text = model.generate(messages_conv, **sampling_params, output_type="both")
    sf.write("output_audio.wav", wav.cpu().view(-1).numpy(), 24000)
    print(text)
- Notebooks
- Google Colab
- Kaggle
[Fix] Fix kimia_mimo_audiodelaytokens bug (#3)
- [Fix] Fix kimia_mimo_audiodelaytokens bug (6a61ca42c65570353280727946435ff7dd175487)
Co-authored-by: Mashiro <KamioMitsuzu@users.noreply.huggingface.co>
- config.json +1 -1
config.json
@@ -20,7 +20,7 @@
   "kimia_audio_output_vocab": 16896,
   "kimia_media_begin": 151661,
   "kimia_media_end": 151663,
-  "kimia_mimo_audiodelaytokens":
+  "kimia_mimo_audiodelaytokens": 6,
   "kimia_mimo_layers": 6,
   "kimia_mimo_transformer_from_layer_index": 21,
   "kimia_text_output_vocab": 152064,
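A quick way to confirm that the fixed field is well-formed is to check that each key in this hunk holds an integer. The sketch below is illustrative only: `find_bad_int_keys` and `EXPECTED_INT_KEYS` are hypothetical names, not part of the Kimi-Audio code base, and the values are copied from the post-fix side of the diff. The pre-fix value of `kimia_mimo_audiodelaytokens` is not visible in the diff as shown, so it is not reproduced here.

```python
import json

# Keys taken from the diff hunk; each is assumed to hold an integer.
EXPECTED_INT_KEYS = [
    "kimia_audio_output_vocab",
    "kimia_media_begin",
    "kimia_media_end",
    "kimia_mimo_audiodelaytokens",
    "kimia_mimo_layers",
    "kimia_mimo_transformer_from_layer_index",
    "kimia_text_output_vocab",
]


def find_bad_int_keys(config, keys=EXPECTED_INT_KEYS):
    """Return the keys that are missing or do not hold a plain integer."""
    bad = []
    for key in keys:
        value = config.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            bad.append(key)
    return bad


# Fragment of config.json with the values from the post-fix side of the diff.
fixed_config = json.loads("""
{
  "kimia_audio_output_vocab": 16896,
  "kimia_media_begin": 151661,
  "kimia_media_end": 151663,
  "kimia_mimo_audiodelaytokens": 6,
  "kimia_mimo_layers": 6,
  "kimia_mimo_transformer_from_layer_index": 21,
  "kimia_text_output_vocab": 152064
}
""")

print(find_bad_int_keys(fixed_config))  # prints []
```

To run the same check against a local copy of the file, replace the inline fragment with `json.load(open("config.json"))`. A key whose value is missing or `null` would then appear in the returned list.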