EuroBERT-210m-NorNER
A Norwegian named entity recognition model fine-tuned from EuroBERT/EuroBERT-210m on the NorNE dataset, covering both Bokmål and Nynorsk.
Model Details
- Author: Fransis Nyka Kolstø
- Base model: EuroBERT/EuroBERT-210m
- Language(s): Norwegian Bokmål (nb), Norwegian Nynorsk (nn)
- Task: Token classification / Named Entity Recognition
- Tagging scheme: IOB2
- License: Apache 2.0
Entity Types
The model predicts 9 entity types, encoded with the IOB2 scheme as described in the NbAiLab norne dataset.
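To make the IOB2 encoding concrete, the sketch below converts a tag sequence into entity spans. The helper name `iob2_to_spans` and the example tags are illustrative assumptions, not part of the model's code, and the tags are hand-written rather than real model output. Stray `I-` tags without a matching `B-` are simply dropped here, which is a simplification.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (entity_type, start, end) spans; `end` is exclusive."""
    spans = []
    current_type, start = None, 0
    # A trailing "O" sentinel flushes an entity that runs to the end of the sequence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and tag[2:] == current_type
        if current_type is not None and not continues:
            spans.append((current_type, start, i))
            current_type = None
        if tag.startswith("B-"):
            current_type, start = tag[2:], i
    return spans


# Hand-written tags for: Erna Solberg besøkte Universitetet i Oslo forrige uke .
tags = ["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "I-ORG", "O", "O", "O"]
spans = iob2_to_spans(tags)
print(spans)
```

Each span covers the `B-` token that opens the entity plus the `I-` tokens of the same type that follow it.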
Intended Use
The model is intended for named entity recognition on Norwegian text (Bokmål and Nynorsk), including news, blog posts, parliamentary proceedings, and government reports — reflecting the genre distribution of the NorNE data.
Training Procedure
Training was done in two phases on the NorNE dataset:
Phase 1: Optimal-step search. The model was trained on the train split, with the dev split used for evaluation and early stopping. Training proceeded through a curriculum of increasing input context lengths, allowing the model to adapt progressively from sentence-level to longer multi-sentence contexts.
Phase 2: Final training. The base model was re-initialized and trained on the combined train + dev splits, replaying the same curriculum and learning-rate trajectory as Phase 1 but stopping each stage at the best step identified in Phase 1. This lets the final model benefit from the additional dev data without re-tuning.
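The two-phase logic can be sketched in plain Python. Everything concrete here is an assumption made for illustration: the context lengths (128, 256, 512), the dev scores, the patience value, and the helper name `best_step` are invented and do not reflect the actual training configuration.

```python
from typing import Dict, List


def best_step(dev_scores: List[float], patience: int) -> int:
    """Return the 1-based evaluation step with the best dev score,
    stopping once `patience` consecutive evaluations fail to improve."""
    best, best_at, since_improved = float("-inf"), 0, 0
    for step, score in enumerate(dev_scores, start=1):
        if score > best:
            best, best_at, since_improved = score, step, 0
        else:
            since_improved += 1
            if since_improved >= patience:
                break
    return best_at


# Hypothetical dev-score curves per curriculum stage (context length -> scores).
phase1_dev_curves: Dict[int, List[float]] = {
    128: [0.60, 0.68, 0.71, 0.70, 0.69],
    256: [0.71, 0.73, 0.72, 0.72],
    512: [0.73, 0.74, 0.75, 0.74, 0.74],
}

# Phase 1: record the best step for each stage using the dev split.
step_budget = {ctx: best_step(curve, patience=2) for ctx, curve in phase1_dev_curves.items()}

# Phase 2: replay the stages in order on train + dev, each for a fixed step budget,
# since no held-out split is left for early stopping.
phase2_schedule = [(ctx, step_budget[ctx]) for ctx in sorted(step_budget)]
print(phase2_schedule)
```

The point of the design is that Phase 2 needs no validation signal: the stopping points are fixed in advance from Phase 1.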
Evaluation
Evaluated on the NorNE test split (Bokmål and Nynorsk combined), with entity-level metrics computed via seqeval:
| Metric | Score |
|---|---|
| Precision | 0.7540 |
| Recall | 0.7559 |
| F1 | 0.7550 |
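Entity-level scoring counts a prediction as correct only when both its boundaries and its type match a gold entity exactly. The sketch below reimplements that idea over sets of spans, assuming micro-averaging across all entity types. It is not seqeval's own code, and the function name `entity_prf` and the example spans are made up for illustration.

```python
from typing import Set, Tuple

# (sentence_id, entity_type, start, end), with `end` exclusive
Span = Tuple[int, str, int, int]


def entity_prf(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over exact span-and-type matches."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Illustrative spans: the ORG prediction has a wrong right boundary, so it counts as an error.
gold = {(0, "PER", 0, 2), (0, "ORG", 3, 6), (1, "GPE_LOC", 4, 5)}
pred = {(0, "PER", 0, 2), (0, "ORG", 3, 5), (1, "GPE_LOC", 4, 5)}
precision, recall, f1 = entity_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

As a sanity check on the table, the harmonic mean of the reported precision and recall, 2 · 0.7540 · 0.7559 / (0.7540 + 0.7559), is about 0.7549, which is consistent with the reported F1 of 0.7550 once rounding of the inputs is taken into account.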
Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

model_id = "fransis3/EuroBERT-210m-NorNER"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# EuroBERT ships custom model code, so trust_remote_code=True is required here.
model = AutoModelForTokenClassification.from_pretrained(model_id, trust_remote_code=True)

# aggregation_strategy="simple" groups adjacent tokens of the same entity into one span.
ner = pipeline("token-classification", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
print(ner("Erna Solberg besøkte Universitetet i Oslo forrige uke."))
```
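With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. A minimal post-processing sketch follows. The helper `group_entities`, the 0.5 threshold, and the mock results are assumptions for illustration. The scores and labels below are invented, not actual model output, and the helper assumes the tokenizer provides character offsets.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(text: str, results: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Group entity surface forms by type, dropping low-confidence predictions."""
    grouped = defaultdict(list)
    for ent in results:
        if ent["score"] >= min_score:
            # Slice the original text so the surface form keeps its exact casing and spacing.
            grouped[ent["entity_group"]].append(text[ent["start"]:ent["end"]])
    return dict(grouped)


text = "Erna Solberg besøkte Universitetet i Oslo forrige uke."
# Mock pipeline output in the aggregated format (values invented for the example).
mock_results = [
    {"entity_group": "PER", "score": 0.99, "word": "Erna Solberg", "start": 0, "end": 12},
    {"entity_group": "ORG", "score": 0.97, "word": "Universitetet i Oslo", "start": 21, "end": 41},
    {"entity_group": "LOC", "score": 0.30, "word": "uke", "start": 50, "end": 53},
]
entities = group_entities(text, mock_results)
print(entities)
```

In practice, `mock_results` would be replaced by the return value of the `ner(...)` call.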
Limitations
- Performance is reported on NorNE's test distribution (news, blogs, parliamentary text, government reports). Generalization to other domains (e.g., social media, clinical text, historical Norwegian) is not guaranteed.
- The model inherits any biases present in its pretraining data (EuroBERT) and in NorNE's source texts.
- The base model is loaded with `trust_remote_code=True`, as required by EuroBERT.
Dataset
NorNE is a named entity annotation layer over the Norwegian Dependency Treebank, covering both Bokmål and Nynorsk.
License
This model is released under the Apache 2.0 license, matching the base model. The NorNE annotations used for training are released under CC0 1.0.
Citation
If you use this model, please cite the underlying resources:
EuroBERT:
```bibtex
@misc{boizard2025eurobertscalingmultilingualencoders,
  title={EuroBERT: Scaling Multilingual Encoders for European Languages},
  author={Nicolas Boizard and Hippolyte Gisserot-Boukhlef and Duarte M. Alves and André Martins and Ayoub Hammal and Caio Corro and Céline Hudelot and Emmanuel Malherbe and Etienne Malaboeuf and Fanny Jourdan and Gabriel Hautreux and João Alves and Kevin El-Haddad and Manuel Faysse and Maxime Peyrard and Nuno M. Guerreiro and Patrick Fernandes and Ricardo Rei and Pierre Colombo},
  year={2025},
  eprint={2503.05500},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2503.05500},
}
```
NorNE:
```bibtex
@misc{jørgensen2020norneannotatingnamedentities,
  title={NorNE: Annotating Named Entities for Norwegian},
  author={Fredrik Jørgensen and Tobias Aasmoe and Anne-Stine Ruud Husevåg and Lilja Øvrelid and Erik Velldal},
  year={2020},
  eprint={1911.12146},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/1911.12146},
}
```
Evaluation results (self-reported, NorNE test set)
- Precision: 0.821
- Recall: 0.826
- Entity-level F1 (seqeval): 0.823