3ea0ede778568b172acdd1c30d586250

This model is a fine-tuned version of google/umt5-base on the de-nl (German-Dutch) configuration of the Helsinki-NLP/opus_books dataset. It achieves the following results on the evaluation set (values from the last logged epoch, 34):

Loss: 2.2399
Data Size: 1.0
Epoch Runtime: 89.5701
Bleu: 9.0155
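
For intuition, the evaluation loss can be converted to a perplexity. The sketch below assumes the reported loss is the mean per-token cross-entropy in nats, which this card does not state explicitly. The function name `perplexity_from_loss` is illustrative only.

```python
import math


def perplexity_from_loss(mean_cross_entropy: float) -> float:
    """Convert a mean per-token cross-entropy (assumed to be in nats) to perplexity."""
    return math.exp(mean_cross_entropy)


# Evaluation loss reported in this card.
eval_loss = 2.2399
eval_perplexity = perplexity_from_loss(eval_loss)
print(f"perplexity ~ {eval_perplexity:.2f}")
```

If the loss were measured in bits rather than nats, the base of the exponent would need to change to 2.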

Model description

More information needed

Intended uses & limitations

More information needed
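
The card does not document usage. As a rough sketch, a checkpoint like this one can usually be loaded with the generic seq2seq classes from `transformers`. Several things here are assumptions rather than documented facts: that the input is German and the output Dutch, that no task prefix was used during fine-tuning, and that `max_new_tokens=128` is a sensible limit. The helper name `translate` is hypothetical.

```python
# Repository id as shown on the model page.
MODEL_ID = "contemmcm/3ea0ede778568b172acdd1c30d586250"


def translate(text: str, model_id: str = MODEL_ID, max_new_tokens: int = 128) -> str:
    """Generate a translation for `text` (assumed German in, Dutch out)."""
    # Imported lazily so the module can be inspected without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    # Assumption: plain source text with no task prefix.
    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(translate("Das ist ein Buch."))
```

With a BLEU of about 9 on the evaluation set, output quality should be expected to be limited. If the fine-tuning script prepended a prefix to each source sentence, the same prefix would have to be added to `text` here.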

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
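
The list above can be mirrored in code. The following is a sketch, not the original training script: it assumes the run used `Seq2SeqTrainingArguments` from `transformers` and no gradient accumulation (which is consistent with 8 per device on 4 devices giving a total of 32). The `output_dir` value and the helper names are hypothetical.

```python
# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH_SIZE = 8
PER_DEVICE_EVAL_BATCH_SIZE = 8
NUM_DEVICES = 4


def total_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Effective batch size per optimizer step across all devices."""
    return per_device * num_devices * grad_accum_steps


TRAINING_KWARGS = {
    "output_dir": "umt5-base-opus-books-de-nl",  # hypothetical path
    "learning_rate": 5e-5,
    "per_device_train_batch_size": PER_DEVICE_TRAIN_BATCH_SIZE,
    "per_device_eval_batch_size": PER_DEVICE_EVAL_BATCH_SIZE,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}


def build_training_args():
    """Build the arguments object; requires transformers to be installed."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**TRAINING_KWARGS)


print(total_batch_size(PER_DEVICE_TRAIN_BATCH_SIZE, NUM_DEVICES))
```

The multi-GPU setup itself is not part of these arguments; it comes from how the script is launched (for example with a distributed launcher across 4 devices).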

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Bleu
No log	0	0	11.7590	0	7.7775	0.2918
No log	1	390	11.4207	0.0078	9.5688	0.2437
No log	2	780	10.5088	0.0156	9.8202	0.2587
No log	3	1170	9.8258	0.0312	11.8070	0.3061
No log	4	1560	8.6899	0.0625	14.5774	0.2775
0.7945	5	1950	7.3758	0.125	19.7967	0.4291
1.3976	6	2340	4.5789	0.25	30.7220	2.1343
4.6232	7	2730	3.1818	0.5	48.7698	3.9774
3.5482	8.0	3120	2.7705	1.0	89.3951	5.5113
3.2315	9.0	3510	2.6142	1.0	89.0712	6.2093
3.0672	10.0	3900	2.5261	1.0	90.1843	6.6777
2.9518	11.0	4290	2.4727	1.0	88.6460	6.9571
2.8355	12.0	4680	2.4302	1.0	88.9321	7.2129
2.7882	13.0	5070	2.4019	1.0	89.4192	7.4252
2.7007	14.0	5460	2.3723	1.0	88.1363	7.6036
2.572	15.0	5850	2.3563	1.0	88.8658	7.7266
2.5441	16.0	6240	2.3282	1.0	87.2663	7.8832
2.5123	17.0	6630	2.3119	1.0	89.0349	7.9350
2.4375	18.0	7020	2.3041	1.0	88.9797	8.0609
2.3963	19.0	7410	2.2749	1.0	89.8775	8.1993
2.3644	20.0	7800	2.2755	1.0	88.7370	8.2685
2.3253	21.0	8190	2.2645	1.0	88.8354	8.4198
2.2549	22.0	8580	2.2536	1.0	89.4994	8.4095
2.2268	23.0	8970	2.2532	1.0	90.1168	8.4829
2.1798	24.0	9360	2.2398	1.0	89.9066	8.5593
2.1351	25.0	9750	2.2460	1.0	88.8773	8.6257
2.1097	26.0	10140	2.2371	1.0	90.3088	8.6773
2.0499	27.0	10530	2.2315	1.0	88.5118	8.7752
2.0556	28.0	10920	2.2307	1.0	88.4823	8.8357
2.0289	29.0	11310	2.2231	1.0	89.2264	8.8561
1.9845	30.0	11700	2.2203	1.0	89.0804	8.9314
1.9124	31.0	12090	2.2312	1.0	89.6350	8.9321
1.8824	32.0	12480	2.2291	1.0	88.9485	8.9577
1.8683	33.0	12870	2.2373	1.0	90.5498	9.0160
1.8397	34.0	13260	2.2399	1.0	89.5701	9.0155
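
In the table above, validation loss and BLEU do not peak at the same epoch. The short sketch below picks the best row under each criterion from the last six logged epochs, which contain both optima for the full table. The tuple layout and the helper name `best_by` are illustrative choices, not part of the training code.

```python
# (epoch, validation_loss, bleu) copied from the last six rows of the table.
ROWS = [
    (29, 2.2231, 8.8561),
    (30, 2.2203, 8.9314),
    (31, 2.2312, 8.9321),
    (32, 2.2291, 8.9577),
    (33, 2.2373, 9.0160),
    (34, 2.2399, 9.0155),
]


def best_by(rows, key, maximize):
    """Return the row with the largest (or smallest) value of `key`."""
    pick = max if maximize else min
    return pick(rows, key=key)


lowest_loss = best_by(ROWS, key=lambda row: row[1], maximize=False)
highest_bleu = best_by(ROWS, key=lambda row: row[2], maximize=True)

print(f"lowest validation loss: epoch {lowest_loss[0]} ({lowest_loss[1]})")
print(f"highest BLEU: epoch {highest_bleu[0]} ({highest_bleu[2]})")
```

Which criterion to use for checkpoint selection depends on the goal. The headline numbers in this card correspond to the final logged epoch rather than to either optimum.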

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.2.0
Tokenizers 0.22.1

Safetensors

Model size: 1.0B params
Tensor type: F32
