ECAPA-TDNN Language Identification (GGUF)

GGUF conversion of speechbrain/lang-id-voxlingua107-ecapa for use with CrispASR.

Model Details

  • Architecture: ECAPA-TDNN (SE-Res2Net + Attentive Statistical Pooling)
  • Parameters: 21.3M
  • Size: 43 MB (F16)
  • Languages: 107 (VoxLingua107 dataset)
  • License: Apache 2.0
  • Training: SpeechBrain on VoxLingua107 (6,628 hours of YouTube speech)
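
As a rough consistency check on the figures above: F16 stores two bytes per weight, so the file should be close to twice the parameter count in bytes. The sketch below assumes about 21.3M parameters, assumes every tensor is stored as F16, and ignores GGUF metadata overhead. `estimate_size_mb` is a hypothetical helper for illustration, not part of CrispASR.

```python
def estimate_size_mb(n_params: int, bytes_per_param: int = 2) -> float:
    """Approximate weight storage in MB (10**6 bytes), ignoring file metadata."""
    return n_params * bytes_per_param / 1e6


# Assumption: roughly 21.3M parameters, all stored as 2-byte F16 values.
size_mb = estimate_size_mb(21_300_000)
print(f"{size_mb:.1f} MB")  # prints "42.6 MB"
```

The estimate lands just under the listed 43 MB, which is consistent with a small amount of header and metadata on top of the weights.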

Usage with CrispASR

# Run language identification (LID) as a pre-step for any ASR backend
crispasr -m whisper-large-v3.gguf --lid-backend ecapa -l auto -f audio.wav

# The LID model auto-downloads on first use; alternatively, pass an explicit path:
crispasr -m model.gguf --lid-backend ecapa --lid-model ecapa-lid-107-f16.gguf -l auto -f audio.wav
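
When calling CrispASR from a script, the same invocations can be assembled programmatically. This is a minimal sketch that uses only the flags shown above. `build_crispasr_cmd` is a hypothetical helper name, and the sketch assumes `crispasr` is on `PATH` when the command is actually run.

```python
import shlex
from typing import List, Optional


def build_crispasr_cmd(asr_model: str, audio: str,
                       lid_model: Optional[str] = None) -> List[str]:
    """Build the argument list for an ASR run with ECAPA language ID."""
    cmd = ["crispasr", "-m", asr_model, "--lid-backend", "ecapa"]
    if lid_model is not None:
        # Explicit path to the LID GGUF file instead of the auto-download.
        cmd += ["--lid-model", lid_model]
    cmd += ["-l", "auto", "-f", audio]
    return cmd


cmd = build_crispasr_cmd("whisper-large-v3.gguf", "audio.wav")
print(shlex.join(cmd))

# To execute it (requires crispasr to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.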

Accuracy

Tested on a 12-language edge-TTS benchmark (3 samples per language, 36 clips in total):

Language     Accuracy   Confidence
English      3/3        p≥0.99
German       3/3        p≥0.99
French       3/3        p≥0.99
Spanish      3/3        p≥0.96
Japanese     3/3        p≥0.99
Chinese      3/3        p≥0.99
Korean       3/3        p≥0.99
Russian      3/3        p≥0.99
Arabic       3/3        p≥0.99
Hindi        3/3        p≥0.99
Portuguese   3/3        p≥0.99
Italian      3/3        p≥0.99
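
The per-language rows can be tallied into an overall figure. The sketch below simply transcribes the table into a dictionary of (correct, total, lower confidence bound) tuples. The variable names are illustrative and the numbers come only from the table above.

```python
# (correct, total, lower bound on reported confidence) per language
results = {
    "English": (3, 3, 0.99), "German": (3, 3, 0.99), "French": (3, 3, 0.99),
    "Spanish": (3, 3, 0.96), "Japanese": (3, 3, 0.99), "Chinese": (3, 3, 0.99),
    "Korean": (3, 3, 0.99), "Russian": (3, 3, 0.99), "Arabic": (3, 3, 0.99),
    "Hindi": (3, 3, 0.99), "Portuguese": (3, 3, 0.99), "Italian": (3, 3, 0.99),
}

correct = sum(c for c, _, _ in results.values())
total = sum(n for _, n, _ in results.values())
# Language with the weakest confidence bound in the table.
min_lang = min(results, key=lambda lang: results[lang][2])

print(f"{correct}/{total} correct; lowest bound: {min_lang} "
      f"(p>={results[min_lang][2]})")
```

With three clips per language, the total is small, so the result is better read as a smoke test than as an accuracy estimate for all 107 languages.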

Files

File                     Size    Description
ecapa-lid-107-f16.gguf   43 MB   F16 weights (recommended)

Conversion

python models/convert-ecapa-tdnn-lid-to-gguf.py \
    --input speechbrain/lang-id-voxlingua107-ecapa \
    --output ecapa-lid-107-f16.gguf
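
After conversion, a quick way to sanity-check the output is to read the fixed-size GGUF header. The sketch below assumes a little-endian GGUF file of version 2 or later, where the header is the 4-byte magic `GGUF`, a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. `parse_gguf_header` and `read_gguf_header` are hypothetical helper names and are not part of the conversion script.

```python
import struct
from typing import Tuple

GGUF_MAGIC = b"GGUF"
HEADER_SIZE = 24  # 4 (magic) + 4 (version) + 8 (tensor count) + 8 (KV count)


def parse_gguf_header(data: bytes) -> Tuple[int, int, int]:
    """Return (version, tensor_count, kv_count) from the first 24 bytes."""
    if len(data) < HEADER_SIZE or data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file (bad magic or truncated header)")
    # "<" selects little-endian with no padding: uint32, uint64, uint64.
    return struct.unpack("<IQQ", data[4:HEADER_SIZE])


def read_gguf_header(path: str) -> Tuple[int, int, int]:
    """Read and parse the header of the GGUF file at `path`."""
    with open(path, "rb") as f:
        return parse_gguf_header(f.read(HEADER_SIZE))


# Example (file name taken from the conversion command above):
# version, n_tensors, n_kv = read_gguf_header("ecapa-lid-107-f16.gguf")
# print(version, n_tensors, n_kv)
```

A failed magic check usually means the file is truncated or is not a GGUF file at all. The header does not validate the tensor data itself.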

Architecture

Input: 16 kHz PCM → 60-dim mel fbank (SpeechBrain STFT, n_fft=400)
       → Sentence-level mean normalization
       → Block0: Conv1d(60→1024, k=5) + ReLU + BN
       → Block1-3: SE-Res2Net (1024 channels, 8 sub-bands, dilations 2/3/4)
       → MFA: concatenate block1-3 outputs → Conv1d(3072→3072, k=1) + ReLU + BN
       → ASP: Attentive Statistical Pooling → [6144]
       → BN + FC(6144→256) → embedding
       → Classifier: BN → Linear(256→512) + BN + LeakyReLU → Linear(512→107)
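
The dimensions in the pipeline above can be traced stage by stage. This is a shape-only sketch, not the model: it assumes the convolutions preserve the number of time frames ("same" padding) and that Attentive Statistical Pooling concatenates a weighted mean and a weighted standard deviation per channel, which is why 3072 channels become a 6144-dim vector. `trace_shapes` is a hypothetical helper.

```python
from typing import List, Tuple

Shape = Tuple[int, ...]


def trace_shapes(n_frames: int) -> List[Tuple[str, Shape]]:
    """List (stage, shape) pairs for one utterance with n_frames frames."""
    n_mels, channels, emb, hidden, n_langs = 60, 1024, 256, 512, 107
    mfa = 3 * channels   # Block1-3 outputs concatenated along the channel axis
    pooled = 2 * mfa     # ASP: weighted mean + weighted std per channel
    return [
        ("fbank", (n_mels, n_frames)),
        ("block0", (channels, n_frames)),
        ("block1-3 (each)", (channels, n_frames)),
        ("mfa", (mfa, n_frames)),
        ("asp", (pooled,)),            # time axis is pooled away here
        ("embedding", (emb,)),
        ("classifier hidden", (hidden,)),
        ("logits", (n_langs,)),        # one score per VoxLingua107 language
    ]


for stage, shape in trace_shapes(n_frames=300):
    print(f"{stage:18s} {shape}")
```

Only the stages before ASP depend on the utterance length. Everything from the pooled vector onward has a fixed size, so clips of any duration map to the same 107 scores.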

Citation

@misc{ravanelli2021speechbrain,
  title={SpeechBrain: A General-Purpose Speech Toolkit},
  author={Ravanelli, Mirco and others},
  year={2021},
  eprint={2106.04624},
  archivePrefix={arXiv}
}