ECAPA-TDNN Language Identification (GGUF)
GGUF conversion of speechbrain/lang-id-voxlingua107-ecapa for use with CrispASR.
Model Details
- Architecture: ECAPA-TDNN (SE-Res2Net + Attentive Statistical Pooling)
- Parameters: 21M
- Size: 43 MB (F16)
- Languages: 107 (VoxLingua107 dataset)
- License: Apache 2.0
- Training: SpeechBrain on VoxLingua107 (6,628 hours of YouTube speech)
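As a quick sanity check on the figures above, the F16 file size can be estimated from the parameter count. This is a rough sketch: the function name `f16_size_mb` is hypothetical, and the estimate ignores GGUF metadata and any tensors stored at other precisions.

```python
def f16_size_mb(n_params: int) -> float:
    """Estimate weight storage in MB (10^6 bytes) at 2 bytes per F16 value."""
    return n_params * 2 / 1e6


# 21M parameters is the rounded count listed above.
print(f"{f16_size_mb(21_000_000):.0f} MB")
```

The estimate of about 42 MB is consistent with the listed 43 MB; the small gap is presumably due to the rounded parameter count and file metadata.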
Usage with CrispASR
# As LID pre-step for any ASR backend
crispasr -m whisper-large-v3.gguf --lid-backend ecapa -l auto -f audio.wav
# The LID model auto-downloads on first use; to use a local file instead, pass its path:
crispasr -m model.gguf --lid-backend ecapa --lid-model ecapa-lid-107-f16.gguf -l auto -f audio.wav
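For batch processing, the commands above can be wrapped in a small script. The following is a minimal sketch, assuming `crispasr` is on `PATH` and accepts exactly the flags shown above. The helper name `build_lid_command` is hypothetical and is not part of CrispASR.

```python
import shlex
from typing import List, Optional


def build_lid_command(asr_model: str, audio: str,
                      lid_model: Optional[str] = None) -> List[str]:
    """Assemble a crispasr argv list using only the flags shown above."""
    cmd = ["crispasr", "-m", asr_model, "--lid-backend", "ecapa"]
    if lid_model is not None:
        # Explicit path to the LID GGUF file; omitted to rely on auto-download.
        cmd += ["--lid-model", lid_model]
    cmd += ["-l", "auto", "-f", audio]
    return cmd


cmd = build_lid_command("whisper-large-v3.gguf", "audio.wav")
print(shlex.join(cmd))
# To actually run it (requires crispasr to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems with file names that contain spaces.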
Accuracy
Tested on a 12-language benchmark of edge-TTS synthesized speech (3 samples per language):
| Language | Accuracy | Confidence |
|---|---|---|
| English | 3/3 | p≥0.99 |
| German | 3/3 | p≥0.99 |
| French | 3/3 | p≥0.99 |
| Spanish | 3/3 | p≥0.96 |
| Japanese | 3/3 | p≥0.99 |
| Chinese | 3/3 | p≥0.99 |
| Korean | 3/3 | p≥0.99 |
| Russian | 3/3 | p≥0.99 |
| Arabic | 3/3 | p≥0.99 |
| Hindi | 3/3 | p≥0.99 |
| Portuguese | 3/3 | p≥0.99 |
| Italian | 3/3 | p≥0.99 |
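The per-language rows can be aggregated into an overall figure. The sketch below sums the table and, because no errors were observed, applies the "rule of three" as a rough 95% upper bound on the error rate. That bound is an added approximation that assumes independent samples; it is not part of the original benchmark.

```python
# (correct, total) per language, copied from the table above.
results = {
    "English": (3, 3), "German": (3, 3), "French": (3, 3), "Spanish": (3, 3),
    "Japanese": (3, 3), "Chinese": (3, 3), "Korean": (3, 3), "Russian": (3, 3),
    "Arabic": (3, 3), "Hindi": (3, 3), "Portuguese": (3, 3), "Italian": (3, 3),
}

correct = sum(c for c, _ in results.values())
total = sum(n for _, n in results.values())
print(f"overall accuracy: {correct}/{total} = {correct / total:.0%}")

if correct == total:
    # Rule of three: with zero errors in n trials, the approximate
    # 95% upper bound on the true error rate is 3 / n.
    upper = 3 / total
    print(f"approx. 95% upper bound on error rate: {upper:.1%}")
```

With 36 samples, a perfect score still leaves room for an error rate of several percent, so the table is best read as a smoke test on clean synthetic speech.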
Files
| File | Size | Description |
|---|---|---|
| ecapa-lid-107-f16.gguf | 43 MB | F16 weights (recommended) |
Conversion
python models/convert-ecapa-tdnn-lid-to-gguf.py \
--input speechbrain/lang-id-voxlingua107-ecapa \
--output ecapa-lid-107-f16.gguf
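After conversion, it can be useful to confirm that the output really is a GGUF file. The sketch below reads only the fixed-size header as laid out in the GGUF specification (versions 2 and 3): a 4-byte magic, a `uint32` version, then `uint64` tensor and metadata counts, all little-endian. The function name `read_gguf_header` is hypothetical, and the sketch does not parse metadata or tensors.

```python
import struct
from typing import Dict

_HEADER = struct.Struct("<4sIQQ")  # magic, version, tensor_count, metadata_kv_count


def read_gguf_header(path: str) -> Dict[str, int]:
    """Read and validate the fixed 24-byte GGUF header."""
    with open(path, "rb") as f:
        raw = f.read(_HEADER.size)
    if len(raw) < _HEADER.size:
        raise ValueError("file too short to contain a GGUF header")
    magic, version, n_tensors, n_kv = _HEADER.unpack(raw)
    if magic != b"GGUF":
        raise ValueError(f"bad magic {magic!r}, expected b'GGUF'")
    return {
        "version": version,
        "tensor_count": n_tensors,
        "metadata_kv_count": n_kv,
    }
```

For example, `read_gguf_header("ecapa-lid-107-f16.gguf")` returns the format version and the number of tensors and metadata entries, which gives a cheap check before loading the model.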
Architecture
Input: 16 kHz PCM → 60-dim mel filterbank (SpeechBrain STFT, n_fft=400)
→ Sentence-level mean normalization
→ Block0: Conv1d(60→1024, k=5) + ReLU + BN
→ Block1-3: SE-Res2Net (1024 channels, 8 sub-bands, dilations 2/3/4)
→ MFA: concatenate block1-3 outputs → Conv1d(3072→3072, k=1) + ReLU + BN
→ ASP: Attentive Statistical Pooling → [6144]
→ BN + FC(6144→256) → embedding
→ Classifier: BN → Linear(256→512) + BN + LeakyReLU → Linear(512→107)
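Two steps in the pipeline above are easy to illustrate in isolation: sentence-level mean normalization and the statistics computed by attentive pooling. The following is a pure-Python sketch on tiny inputs, not the SpeechBrain or CrispASR implementation. In particular, it takes the attention scores as a given input, whereas the real model computes them with a small learned network.

```python
import math
from typing import List

Matrix = List[List[float]]  # indexed as [channel][frame]


def mean_normalize(feats: Matrix) -> Matrix:
    """Subtract each channel's mean taken over the whole utterance."""
    out = []
    for row in feats:
        mu = sum(row) / len(row)
        out.append([v - mu for v in row])
    return out


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over one channel's frames."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]


def attentive_stats_pool(feats: Matrix, scores: Matrix) -> List[float]:
    """Return [weighted means..., weighted stds...], i.e. 2 * channels values."""
    means, stds = [], []
    for row, sc in zip(feats, scores):
        w = softmax(sc)  # attention weights over time, summing to 1
        mu = sum(a * x for a, x in zip(w, row))
        var = sum(a * x * x for a, x in zip(w, row)) - mu * mu
        means.append(mu)
        stds.append(math.sqrt(max(var, 1e-12)))  # clamp for numerical safety
    return means + stds


feats = [[1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 2.0, 2.0], [5.0, 5.0, 5.0, 5.0]]
uniform = [[0.0] * 4 for _ in feats]  # equal scores give plain mean and std
pooled = attentive_stats_pool(feats, uniform)
print(len(pooled))
```

Concatenating a mean and a standard deviation per channel doubles the channel count, which is why the 3072-channel MFA output becomes a 6144-dimensional vector after ASP.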
Citation
@inproceedings{ravanelli2021speechbrain,
title={SpeechBrain: A General-Purpose Speech Toolkit},
author={Ravanelli, Mirco and others},
booktitle={Proceedings of the 22nd Annual Conference of the International Speech Communication Association (INTERSPEECH)},
year={2021}
}