# SpanMarker-DistilRoBERTa for Climate Research NER
This model is a SpanMarker model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain, covering 28 distinct entity types. It uses distilbert/distilroberta-base as its underlying encoder.
## 📌 Model Details
- Model Type: SpanMarker
- Encoder: distilbert/distilroberta-base
- Maximum Sequence Length: 512 tokens
- Maximum Entity Length: 14 words
- Language: English
- License: cc-by-sa-4.0
### Model Labels
| Label | Examples |
|---|---|
| Asset | "raw material", "water resources", "mental health" |
| Body Part | "leaves", "plant leaves", "deep tissue compartment" |
| Body of Water | "rivers", "Dhaleshwari river", "peripheral rivers" |
| Chemical | "domoic acid", "marine algal toxin", "cathode materials" |
| Disease | "acute neurologic signs", "chronic epileptic syndrome", "seizures" |
| Ecosystem | "cloud forests", "Tropical montane cloud forest", "polluted environment" |
| Energy Source | "fossil fuels", "battery cells", "12-cell series battery-pack prototype" |
| Field of Study | "veterinary medicine", "study", "reference laboratory" |
| Geographical Feature | "mountainous regions", "low point", "heterogenous topography" |
| Intellectual Artefact | "Veterinary medical records", "Daily husbandry records", "data" |
| Location | "wild", "Westbrook", "beaches" |
| Mathematical Expression | "difference", "gradient", "Stepwise machine hour constraints" |
| Measuring Device | "EEG", "MRI scan", "station" |
| Meteorological Phenomenon | "rainfall", "climate change", "climatic variability" |
| Method | "clinical efficacy", "dosing", "serum monitoring" |
| Natural Disaster | "heavy metal contamination", "environmental pollution", "seasonal air pollution" |
| Natural Phenomenon | "changing ocean conditions", "algal blooms", "biochemical changes" |
| Organism | "California sea lions", "Zalophus californianus", "species" |
| Organization | "NOAA National Marine Fisheries Service", "long-term care facility", "reference laboratory" |
| Other | "reports", "marine mammal health", "normal eating" |
| Person | "Clinicians", "staff", "clinicians" |
| Physical Artefact | "paved east – west road", "EVs", "electric vehicle" |
| Physical Phenomenon | "seasonal changes", "normal food intake", "structural abnormalities" |
| Policy | "safety", "energy security", "pollution" |
| Quantity | ">", "200 mAhg − 1", "energy density" |
| Satellite | "satellites", "TRMM", "Tropical Rainfall Measuring Mission" |
| System | "system structure", "climate", "global overturning circulation" |
| Time Period | "several decades", "101 days", "periods of prolonged anorexia" |
## 🚀 Main Results (Selected Checkpoint)
This repository provides the best-performing checkpoint selected from 5 runs with different random seeds. While the internal training logs tracked performance on the validation split of CliReNERsilver, the final model selection and the metrics below are evaluated on the independent, expert-annotated CliReNERgold dataset.
| Metric | Score |
|---|---|
| Precision | 55.90 |
| Recall | 45.26 |
| F1 | 50.02 |
This checkpoint corresponds to the run with the highest strict F1 on the gold evaluation set (seed index 3, random seed 3012).
## 📊 Results Across Seeds
We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.
| Seed | Precision | Recall | Strict F1 |
|---|---|---|---|
| 1 | 51.83 | 45.02 | 48.19 |
| 2 | 54.55 | 38.73 | 45.29 |
| 3 | 55.90 | 45.26 | 50.02 |
| 4 | 50.34 | 39.38 | 44.19 |
| 5 | 52.62 | 42.20 | 46.84 |
Summary:
- F1: mean = 46.91, std = 2.31
- Precision: mean = 53.05, std = 2.20
- Recall: mean = 42.12, std = 3.05
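The summary statistics above can be reproduced directly from the per-seed table with Python's standard library; note that `statistics.stdev` uses the sample (n − 1) formula, which is what the reported values follow:

```python
from statistics import mean, stdev  # stdev = sample standard deviation (n - 1)

f1        = [48.19, 45.29, 50.02, 44.19, 46.84]
precision = [51.83, 54.55, 55.90, 50.34, 52.62]
recall    = [45.02, 38.73, 45.26, 39.38, 42.20]

# Reproduces the summary: F1 46.91 ± 2.31, Precision 53.05 ± 2.20, Recall 42.12 ± 3.05
for name, values in [("F1", f1), ("Precision", precision), ("Recall", recall)]:
    print(f"{name}: mean = {mean(values):.2f}, std = {stdev(values):.2f}")
```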
Model Selection Strategy: The uploaded checkpoint is the single best seed (highest strict F1 on the gold dataset), i.e., the run whose predictions align most closely with the expert-annotated consensus.
## 📂 Dataset & Evaluation
- Training Dataset: CliReNERsilver
  - Splits used: stratified 80:10:10 (train/validation/test); the 80% split was used for training.
- Evaluation Dataset: CliReNERgold
  - Splits used: all 192 sentences combined (expert-annotated via Weighted Expert Voting).
- Preprocessing:
  - Texts were tokenized with the tokenizer corresponding to the DistilRoBERTa encoder.
  - The dataset uses a flat NER schema: nested entities are excluded, and overlapping entities are resolved to the most relevant span.
- Metric Details:
  - F1 type: strict F1 (entity-level exact match).
  - An entity counts as correct only if both its boundary span and its semantic label exactly match the gold annotation.
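The strict-match criterion can be sketched in a few lines. The helper and the (start, end, label) triples below are purely illustrative; the actual evaluation was performed with the framework's built-in metric tooling:

```python
def strict_scores(gold, pred):
    """Entity-level strict scoring: gold/pred are sets of (start, end, label) triples."""
    tp = len(gold & pred)  # a hit requires exact boundaries AND the exact label
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical example: one prediction has correct boundaries but the wrong label,
# so it counts as an error under strict matching.
gold = {(0, 2, "Organism"), (5, 6, "Location"), (8, 10, "Time Period")}
pred = {(0, 2, "Organism"), (5, 6, "Organization"), (8, 10, "Time Period")}
p, r, f = strict_scores(gold, pred)  # -> 0.667, 0.667, 0.667
```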
## ⚖️ Precision vs Recall Behavior
The model is precision-leaning: on the gold set, precision exceeds recall by roughly 10 points for the selected checkpoint (55.90 vs. 45.26), and the gap persists across all five seeds (mean 53.05 vs. 42.12). In practice, the spans the model does predict are usually labeled correctly, but it misses a sizeable share of gold entities, a profile consistent with training on silver-standard annotations.
## ⚙️ Usage
### Direct Use for Inference
Because this model was trained using the SpanMarker framework, it requires the `span_marker` library for inference:

```bash
pip install span_marker
```
```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-distilroberta-base")

# Run inference
text = "The volume of climate-related literature is expanding exponentially; publications indexed since 2020 already exceed the total output of the preceding decade by 11% (Pan et al. 2025)."
entities = model.predict(text)
for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")

# Entity: climate-related literature | Label: Intellectual Artefact | Score: 0.6282
# Entity: 2020 | Label: Time Period | Score: 0.9316
# Entity: total output | Label: Quantity | Score: 0.6781
# Entity: preceding decade | Label: Time Period | Score: 0.9217
# Entity: 11% | Label: Quantity | Score: 0.9689
# Entity: 2025 | Label: Time Period | Score: 0.7341
```
## 📉 Training Details
### Training Set Metrics
| Training set | Min | Median | Max |
|---|---|---|---|
| Sentence length | 3 | 31.4819 | 97 |
| Entities per sentence | 1 | 7.0100 | 22 |
### Training Hyperparameters
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 3012
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
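As a sanity check on the schedule, the derived quantities follow from the list above. This is illustrative arithmetic only, assuming the 62 optimizer steps per epoch visible in the training log and the full 20 configured epochs:

```python
# Values taken from the hyperparameter list above and the per-epoch training log.
train_batch_size = 8
gradient_accumulation_steps = 2
effective_batch_size = train_batch_size * gradient_accumulation_steps  # -> 16

steps_per_epoch = 62                              # optimizer steps per epoch, from the log
num_epochs = 20
warmup_ratio = 0.1
total_steps = steps_per_epoch * num_epochs        # -> 1240
warmup_steps = int(warmup_ratio * total_steps)    # -> 124 linear-warmup steps
```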
### Training Results (CliReNERsilver Validation Split)
| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|---|---|---|---|---|---|---|
| 1.0 | 62 | 0.1543 | 0.0 | 0.0 | 0.0 | 0.6075 |
| 2.0 | 124 | 0.0953 | 0.3810 | 0.0115 | 0.0223 | 0.6096 |
| 3.0 | 186 | 0.0573 | 0.5535 | 0.2970 | 0.3866 | 0.7244 |
| 4.0 | 248 | 0.0461 | 0.5996 | 0.4792 | 0.5327 | 0.7932 |
| 5.0 | 310 | 0.0437 | 0.6058 | 0.5380 | 0.5699 | 0.8192 |
| 6.0 | 372 | 0.0433 | 0.6036 | 0.5308 | 0.5649 | 0.8174 |
| 7.0 | 434 | 0.0442 | 0.6121 | 0.5681 | 0.5893 | 0.8268 |
| 8.0 | 496 | 0.0449 | 0.6196 | 0.5725 | 0.5951 | 0.8310 |
| 9.0 | 558 | 0.0469 | 0.6107 | 0.5897 | 0.6000 | 0.8316 |
### Framework Versions
- Python: 3.10.19
- SpanMarker: 1.7.0
- Transformers: 4.50.0
- PyTorch: 2.9.1+cu126
- Datasets: 3.0.0
- Tokenizers: 0.21.4
## 📚 Citation
If you use this model or the CliReNER datasets in your research, please cite:
```bibtex
@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}
```
Please also acknowledge the SpanMarker framework:
```bibtex
@software{Aarsen_SpanMarker,
  author  = {Aarsen, Tom},
  license = {Apache-2.0},
  title   = {{SpanMarker for Named Entity Recognition}},
  url     = {https://github.com/tomaarsen/SpanMarkerNER}
}
```