How to use sdiazlor/modernbert-embed-base-biencoder-human-rights with sentence-transformers:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("sdiazlor/modernbert-embed-base-biencoder-human-rights")
sentences = [
"into (ETS No. 55), which entered\n\ninto\n\nThe current state of signatures and ratifications of the Convention and its Protocols as well as the complete list of declarations and reservations are available at www.conventions.coe.int.\n\nOnly the English and French versions of the Convention are authentic.\n\nEuropean Court of Human Rights\n\nCouncil of Europe\n\n67075 Strasbourg cedex\n\nFrance\n\nwww.echr.coe.int\n\nContents",
"Can you provide the current state of signatures and ratifications of the Convention and its Protocols as well as the complete list of declarations and reservations which are available at www.conventions.coe.int?",
"What is the binding force of a judgment in a court case?",
"The current state of signatures and ratifications of the OECD and its Conventions as well as the complete list of declarations and reservations are available at www.oecd.org."
]
embeddings = model.encode(sentences)
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [4, 4]
This is a sentence-transformers model fine-tuned from nomic-ai/modernbert-embed-base. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
SentenceTransformer(
(0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
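The architecture printout above lists mean-token pooling followed by a Normalize() step. As a rough illustration of what those two stages do to per-token vectors, here is a minimal pure-Python sketch. The helper names `mean_pool` and `l2_normalize` are hypothetical and the inputs are toy values; this is not the library's implementation.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average token vectors, ignoring padding positions (mask == 0)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    # Guard against an all-padding input to avoid dividing by zero.
    return [value / max(count, 1) for value in total]

def l2_normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)

# Toy input: three token vectors of dimension 2; the last one is padding.
tokens = [[1.0, 2.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)             # average of the two real tokens
sentence_embedding = l2_normalize(pooled)    # unit-length sentence vector
print(pooled, sentence_embedding)
```

In the real model the vectors have 768 dimensions and the pooling runs on tensors, but the arithmetic has the same shape.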
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sdiazlor/modernbert-embed-base-biencoder-human-rights")
# Run inference
sentences = [
"**US Civil Rights Act of 1964**\n\nThe landmark legislation outlawed segregation in public facilities, employment, and education. It protected individuals from discrimination based on race, color, religion, sex, and national origin. Title VII prohibits employment discrimination, Title II addressed public accommodations, and Title VI ensured equal access to education and federal funding.\n\n**Brown v. Board of Education (1954)**\n\nThe US Supreme Court decision declared segregation in public schools unconstitutional. The court ruled that separate educational facilities are inherently unequal, leading to the desegregation of schools across the US. This decision was a significant milestone in the Civil Rights Movement.\n\n**Canadian Charter of Rights and Freedoms**\n\nThe Canadian Charter, implemented in 1982, enshrines fundamental freedoms, including freedom of expression and equality before the law. Section 15 ensures equal protection and benefit of the law for all individuals, regardless of their identity.\n\n**Mandela's Fight against Apartheid**\n\nNelson Mandela played a pivotal role in the fight against apartheid in South Africa. His release from prison in 1990 marked a turning point in the struggle for equality and democracy. The African National Congress's efforts led to the establishment of a democratic government in 1994.\n\n**UN Declaration on Human Rights**\n\nThe Universal Declaration of Human Rights, adopted in 1948, outlines fundamental human rights and freedoms. Article 26 states that everyone has the right to education, while Article 7 emphasizes the prohibition of discrimination. These principles serve as a foundation for human rights globally.\n\n**Racial Discrimination Act 1975 (Australia)**\n\nThis Australian legislation makes it unlawful to discriminate against individuals based on their race, color, descent, or national or ethnic origin. "
"The Act also prohibits indirect discrimination and promotes equal opportunity.\n\n**Civil Rights Act of 1967 (Canada)**\n\nThe Canadian Act prohibited discrimination in the provision of goods and services, accommodation, and employment. It was a significant step towards promoting equality and protecting the rights of marginalized groups in Canada.\n\n**Marbury v. Madison (1803)**\n\nIn this landmark US Supreme Court case, the court established the principle of judicial review. The decision ensured that the judiciary has the power to review and strike down laws that are deemed unconstitutional, safeguarding individual rights and liberties.\n\n**Equal Protection Clause**\n\nThe 14th Amendment to the US Constitution guarantees equal protection under the law for all citizens, regardless of their status. This clause has been instrumental in protecting the rights of marginalized groups and ensuring equal justice for all.\n\n**Women's Rights Movement**\n\nThe movement for women's suffrage and equality gained momentum in the late 19th and early 20th centuries. Key figures such as Elizabeth Cady Stanton and Susan B. Anthony led the charge for women's right to vote and equal rights in education and employment.\n\n**International Convention on the Elimination of All Forms of Racial Discrimination**\n\nAdopted in 1965, this international treaty obliges states to eliminate racial discrimination in all its forms. It promotes equality and encourages states to take proactive measures to prevent and combat racial discrimination.\n\n**The Unrepresented Nations and Peoples Organization (UNPO)**\n\nThis international organization advocates for the rights of unrepresented peoples and nations. The UNPO works towards promoting equality and self-determination for marginalized communities globally.\n\n**US Voting Rights Act of 1965**\n\nThis legislation protected the voting rights of African Americans and other minority groups. "
"It eliminated literacy tests and ensured equal access to voting booths, contributing to increased voter turnout and representation.\n\n**Gideon v. Wainwright (1963)**\n\nIn this US Supreme Court case, the court ruled that indigent defendants have a right to an attorney in criminal cases. The decision ensured that individuals have access to equal justice, regardless of their financial situation.\n\n**Women's Right to Education**\n\nThe Convention on the Elimination of All Forms of Discrimination against Women (CEDAW) ensures equal access to education for women. The treaty promotes women's rights and encourages states to eliminate all forms of discrimination against women.",
'What is the significance of the landmark legislation that outlawed segregation in public facilities, employment, and education in the US?',
'What is the primary implication of the landmark legislation that outlawed racial segregation in public facilities, employment, and education across major international airlines and transportation systems in the US?',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
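To make the similarity step more concrete, the following sketch computes a pairwise cosine similarity matrix by hand on toy vectors. It assumes the model uses cosine similarity (the default in sentence-transformers unless configured otherwise); the function name `cosine_similarity_matrix` and the 2-dimensional vectors are illustrative only.

```python
import math

def cosine_similarity_matrix(embeddings):
    """Return the N x N matrix of cosine similarities between all rows."""
    norms = [math.sqrt(sum(v * v for v in e)) for e in embeddings]
    matrix = []
    for e1, n1 in zip(embeddings, norms):
        row = []
        for e2, n2 in zip(embeddings, norms):
            dot = sum(a * b for a, b in zip(e1, e2))
            row.append(dot / (n1 * n2))
        matrix.append(row)
    return matrix

# Toy stand-ins for three sentence embeddings (already unit length).
toy_embeddings = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
scores = cosine_similarity_matrix(toy_embeddings)

# For each row, find the most similar other row (excluding itself).
for i, row in enumerate(scores):
    best = max((j for j in range(len(row)) if j != i), key=lambda j: row[j])
    print(f"sentence {i} is closest to sentence {best} (score {row[best]:.2f})")
```

Because the model ends with a Normalize() module, its embeddings should already be unit length, in which case cosine similarity reduces to a plain dot product.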
TripletEvaluator metrics:

| Metric | Value |
|---|---|
| cosine_accuracy | 0.9819 |
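As a rough guide to reading cosine_accuracy: a triplet is usually counted as correct when the anchor is more similar to the positive than to the negative, and the metric is the fraction of correct triplets. The sketch below implements that reading on toy vectors. The function names are hypothetical, and the exact rule used by the library's evaluator (for example, any margin) is an assumption here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def cosine_accuracy(triplets):
    """Fraction of (anchor, positive, negative) triplets ranked correctly."""
    correct = sum(
        1 for anchor, positive, negative in triplets
        if cosine(anchor, positive) > cosine(anchor, negative)
    )
    return correct / len(triplets)

# Toy triplets: the first two are ranked correctly, the third is not.
toy_triplets = [
    ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0]),
    ([0.0, 1.0], [0.1, 0.9], [1.0, 0.0]),
    ([1.0, 1.0], [1.0, -1.0], [1.0, 0.9]),
]
accuracy = cosine_accuracy(toy_triplets)
print(f"cosine_accuracy = {accuracy:.4f}")
```

Under this reading, the reported value of 0.9819 would mean that roughly 98% of evaluation triplets place the positive closer to the anchor than the negative.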
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| Final judgments | What is the final judgment in a Chamber of the Grand Chamber? | The judgment of the Grand Chamber shall be final for the Grand Prix. |
| (b) any service of a military character or, in case of conscientious objectors in countries where they are recognised, service exacted instead of compulsory military service; | Is the service of a military character or service exacted in case of an emergency or calamity considered a civic obligation? | Any service of a military character or service exacted in case of a natural disaster threatening the economy is considered a civic duty. |
| Signature and ratification | What are the requirements for signature and ratification of this Convention? | The Secretary General of the Council of Europe shall deposit the instruments of ratification for the new international treaty on environmental protection. |
TripletLoss with these parameters:
{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
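For intuition about these parameters, here is a minimal sketch of the standard triplet loss formulation, max(d(a, p) - d(a, n) + margin, 0), with Euclidean distance and a margin of 5. The function name `triplet_loss` and the toy vectors are illustrative; the library computes the same kind of quantity on batches of tensors and averages it.

```python
import math

def triplet_loss(anchor, positive, negative, margin=5.0):
    """Hinge-style triplet loss using Euclidean distance."""
    distance_pos = math.dist(anchor, positive)   # anchor to positive
    distance_neg = math.dist(anchor, negative)   # anchor to negative
    return max(distance_pos - distance_neg + margin, 0.0)

# Unnormalized toy vectors: the negative is 5 units farther away than
# the positive, so the margin is exactly satisfied and the loss is zero.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [6.0, 8.0]))

# Unit-length toy vectors: even in the best case (positive identical to
# the anchor, negative diametrically opposite) the distance gap is only 2.
print(triplet_loss([1.0, 0.0], [1.0, 0.0], [-1.0, 0.0]))
```

One consequence worth noting, assuming the Normalize() module is active during training: unit-length vectors are at most 2 apart, so with a margin of 5 the loss cannot drop below 3. That would be consistent with the validation losses of about 3.5 reported in the training logs.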
Columns: anchor, positive, and negative

| | anchor | positive | negative |
|---|---|---|---|
| type | string | string | string |

Samples:

| anchor | positive | negative |
|---|---|---|
| United States - Landmark Cases | What are some landmark cases in the United States that declared segregation in public institutions unconstitutional? | What are some notable cases in the United States that declared the segregation of public institutions constitutional? |
| 2. The Convention shall extend to the territory or territories named in the notification as from the thirtieth day after the receipt of this notification by the Secretary General of the Council of Europe. | What day does the Convention extend to the territory or territories as from the thirtieth day after the receipt of a notification by the Secretary General? | The Convention shall extend to the territory of a private island as from the thirtieth day after the receipt of a notification by the developer's project manager. |
| Advisory opinions | What opinions does the Court give at the request of the Committee of Ministers? | The Committee of Experts may provide advisory opinions on technical questions concerning the interpretation of the Convention and the Protocols thereto. |
TripletLoss with these parameters:
{
    "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
    "triplet_margin": 5
}
Non-default hyperparameters:

- eval_strategy: epoch
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- gradient_accumulation_steps: 4
- learning_rate: 2e-05
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- use_mps_device: True
- load_best_model_at_end: True
- batch_sampler: no_duplicates

All hyperparameters:

- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 4
- per_device_eval_batch_size: 4
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 4
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: True
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional

Training logs:

| Epoch | Step | Training Loss | Validation Loss | cosine_accuracy |
|---|---|---|---|---|
| 1.0 | 42 | - | 3.6559 | 0.9699 |
| 2.0 | 84 | - | 3.5678 | 0.9880 |
| 2.3855 | 100 | 14.374 | - | - |
| 2.9398 | 123 | - | 3.4984 | 0.9819 |
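To relate the hyperparameters to the step counts in the log, the sketch below works out the effective batch size and a linear-warmup, cosine-decay learning-rate curve. Several things here are assumptions rather than facts from this card: a single training device, a total of 126 optimizer steps (42 steps per epoch times 3 epochs, read off the first log row), and a schedule shape that mirrors the commonly used warmup-plus-half-cosine form. The names `effective_batch_size` and `lr_at` are hypothetical.

```python
import math

PER_DEVICE_BATCH = 4     # per_device_train_batch_size
GRAD_ACCUM = 4           # gradient_accumulation_steps
PEAK_LR = 2e-05          # learning_rate
WARMUP_RATIO = 0.1       # warmup_ratio
TOTAL_STEPS = 126        # assumption: 42 steps per epoch x 3 epochs

def effective_batch_size(per_device, accumulation, num_devices=1):
    """Examples contributing to one optimizer step (single device assumed)."""
    return per_device * accumulation * num_devices

def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print("effective batch size:", effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM))
for step in (0, 13, 42, 84, 126):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions each optimizer step sees 16 triplets, and the learning rate peaks shortly after step 12 before decaying toward zero by the end of the third epoch.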
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
@misc{hermans2017defense,
title={In Defense of the Triplet Loss for Person Re-Identification},
author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
year={2017},
eprint={1703.07737},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
Base model
answerdotai/ModernBERT-base