Qwen3-VL-Embedding-2B finetuned on Arabic-culture visual document retrieval

This is a sentence-transformers model fine-tuned on the pearl-vdr-ar-train-preprocessed dataset. It maps Arabic text queries and document images to a 2048-dimensional dense vector space and can be used for visual document retrieval.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Maximum Sequence Length: 262144 tokens
  • Output Dimensionality: 2048 dimensions
  • Similarity Function: Cosine Similarity
  • Supported Modalities: Text, Image, Video, Message
  • Training Dataset: pearl-vdr-ar-train-preprocessed
  • Language: ar
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'transformer_task': 'feature-extraction', 'modality_config': {'text': {'method': 'forward', 'method_output_name': 'last_hidden_state'}, 'image': {'method': 'forward', 'method_output_name': 'last_hidden_state'}, 'video': {'method': 'forward', 'method_output_name': 'last_hidden_state'}, 'message': {'method': 'forward', 'method_output_name': 'last_hidden_state', 'format': 'structured'}}, 'module_output_name': 'token_embeddings', 'processing_kwargs': {'chat_template': {'add_generation_prompt': True}}, 'unpad_inputs': False, 'architecture': 'Qwen3VLModel'})
  (1): Pooling({'embedding_dimension': 2048, 'pooling_mode': 'lasttoken', 'include_prompt': True})
  (2): Normalize({})
)
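
The last two modules above (Pooling with `lasttoken`, then Normalize) can be illustrated with a minimal pure-Python sketch. The helper names `last_token_pool` and `l2_normalize` and the toy tensors are hypothetical and only mirror the idea; the library's own implementation operates on batched tensors.

```python
import math

def last_token_pool(token_embeddings, attention_mask):
    """Pick the hidden state of the last non-padding token of each sequence.

    token_embeddings: list of sequences, each a list of per-token vectors.
    attention_mask:   list of 0/1 lists, 1 marking real (non-padding) tokens.
    """
    pooled = []
    for tokens, mask in zip(token_embeddings, attention_mask):
        # Index of the last position whose mask is 1 (works for left or right padding).
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(tokens[last])
    return pooled

def l2_normalize(vectors):
    """Scale each vector to unit length so a dot product equals cosine similarity."""
    normalized = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against all-zero vectors
        normalized.append([x / norm for x in v])
    return normalized

# Toy batch (hypothetical values): 2 sequences, 3 positions, 2-dim hidden states.
hidden = [
    [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]],  # last real token is index 1 (index 2 is padding)
    [[0.0, 2.0], [1.0, 1.0], [0.0, 5.0]],  # last real token is index 2
]
mask = [[1, 1, 0], [1, 1, 1]]

embeddings = l2_normalize(last_token_pool(hidden, mask))
print(embeddings)
# [[0.6, 0.8], [0.0, 1.0]]
```

Because the output is unit-length, cosine similarity between two embeddings reduces to a plain dot product.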

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("Omartificial-Intelligence-Space/Qwen3-VL-Embedding-2B-Arabic-VDR")
# Run inference
queries = [
    'ما اسم هذه الزهور البيضاء الصغيرة التي تنمو بين الصخور؟',
]
documents = [
    'https://i.ibb.co/svZf6D92/image1.jpg',
    'https://i.ibb.co/spFmq82S/image2.jpg',
    'https://i.ibb.co/mF5BDDsB/image3.jpg'
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# [1, 2048] [3, 2048]
# Get the similarity scores for the embeddings
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[ 0.5869, -0.1090,  0.1076]])
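
To turn the similarity scores above into a ranked result list, a small helper like the following may be enough. This is a sketch: `rank_documents` is a hypothetical name, and the scores are copied from the example output above rather than recomputed. With a real tensor, `similarities[0].tolist()` would supply the score list.

```python
def rank_documents(scores, documents, top_k=None):
    """Return (document, score) pairs sorted from most to least similar."""
    ranked = sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]

documents = [
    'https://i.ibb.co/svZf6D92/image1.jpg',
    'https://i.ibb.co/spFmq82S/image2.jpg',
    'https://i.ibb.co/mF5BDDsB/image3.jpg',
]
# Scores for the single query, taken from the printed tensor above.
scores = [0.5869, -0.1090, 0.1076]

ranked = rank_documents(scores, documents)
for doc, score in ranked:
    print(f"{score:+.4f}  {doc}")
```

For this query the first image ranks highest, followed by the third and then the second.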

Training Details

Training Dataset

pearl-vdr-ar-train-preprocessed

  • Dataset: pearl-vdr-ar-train-preprocessed at 494822e
  • Size: 48,002 training samples
  • Columns: query, image, and negative_0
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 31 tokens, mean 51.45 tokens, max 90 tokens
    • image (image): min 53x96 px, mean 639x540 px, max 800x798 px
    • negative_0 (image): min 101x100 px, mean 630x545 px, max 800x787 px
  • Samples (query column only; the image and negative_0 columns contain images):
    ما هي التحديات التي تواجه الحرف التقليدية كما يظهر في الصورة، وما هي الحلول الممكنة لمواجهة هذه التحديات؟
    إذا شاركت في ورشة عمل لتعلم كيفية صنع الآلة التي يظهر في الصورة، ما هي الخطوات التي ستحتاج إلى اتباعها لصنعها بشكل صحيح؟
    كيف يختلف العزف على الآلة التي يظهر في الصورة عن العزف على الآلات الوترية الأخرى في المنطقة، وما هي الخصائص الفريدة لهذه الآلة؟
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CachedMultipleNegativesRankingLoss",
        "matryoshka_dims": [
            2048,
            1536,
            1024,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
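
How these parameters combine can be sketched in plain Python: an in-batch-negatives ranking loss is evaluated on embeddings truncated to each Matryoshka dimension, and the per-dimension losses are summed with their weights. This is an illustrative sketch, not the library's code. The function names, the toy vectors, and the `scale=20.0` value are assumptions, and the gradient caching that gives CachedMultipleNegativesRankingLoss its name is omitted because it affects memory use rather than the loss value.

```python
import math

def cosine(a, b):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0

def in_batch_ranking_loss(queries, candidates, scale=20.0):
    """Cross-entropy where query i must rank candidates[i] above all other candidates.

    candidates[:len(queries)] are the positives; any extra entries act as
    hard negatives (like the negative_0 column). `scale` is an assumed value.
    """
    total = 0.0
    for i, q in enumerate(queries):
        logits = [scale * cosine(q, c) for c in candidates]
        peak = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(queries)

def matryoshka_loss(queries, candidates, dims, weights):
    """Weighted sum of the ranking loss over embeddings truncated to each dim."""
    return sum(
        w * in_batch_ranking_loss([q[:d] for q in queries], [c[:d] for c in candidates])
        for d, w in zip(dims, weights)
    )

# Toy 4-dim embeddings (hypothetical): two queries, two positives, one hard negative.
queries = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]
candidates = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3], [0.5, 0.5, 0.5, 0.5]]

loss = matryoshka_loss(queries, candidates, dims=[4, 2], weights=[1, 1])
print(f"combined loss over dims 4 and 2: {loss:.4f}")
```

Training this way is what makes truncated embeddings usable at inference time: keeping only the first 64 to 1536 values of a vector and re-normalizing should still give meaningful cosine similarities. In Sentence Transformers this is typically requested with the `truncate_dim` argument when loading the model.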
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • num_train_epochs: 2
  • learning_rate: 1e-05
  • warmup_steps: 0.03
  • bf16: True
  • per_device_eval_batch_size: 64
  • batch_sampler: no_duplicates
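
As a rough back-of-the-envelope check, the values above imply the following schedule. The sketch assumes a single device (the card does not state the device count), that no batches are dropped, and that the fractional `warmup_steps` value is interpreted as a share of total steps. The `no_duplicates` batch sampler may shift the real batch count slightly.

```python
import math

num_samples = 48_002          # training set size from this card
per_device_batch_size = 64    # per_device_train_batch_size
num_epochs = 2                # num_train_epochs
num_devices = 1               # assumption: not stated in the card
warmup_fraction = 0.03        # warmup_steps, read here as a fraction (assumption)

steps_per_epoch = math.ceil(num_samples / (per_device_batch_size * num_devices))
total_steps = steps_per_epoch * num_epochs
warmup_estimate = round(warmup_fraction * total_steps)

print(steps_per_epoch, total_steps, warmup_estimate)
# 751 1502 45
```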

All Hyperparameters

  • per_device_train_batch_size: 64
  • num_train_epochs: 2
  • max_steps: -1
  • learning_rate: 1e-05
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: None
  • warmup_steps: 0.03
  • optim: adamw_torch_fused
  • optim_args: None
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • optim_target_modules: None
  • gradient_accumulation_steps: 1
  • average_tokens_across_devices: True
  • max_grad_norm: 1.0
  • label_smoothing_factor: 0.0
  • bf16: True
  • fp16: False
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • use_liger_kernel: False
  • liger_kernel_config: None
  • use_cache: False
  • neftune_noise_alpha: None
  • torch_empty_cache_steps: None
  • auto_find_batch_size: False
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • include_num_input_tokens_seen: no
  • log_level: passive
  • log_level_replica: warning
  • disable_tqdm: False
  • project: huggingface
  • trackio_space_id: trackio
  • per_device_eval_batch_size: 64
  • prediction_loss_only: True
  • eval_on_start: False
  • eval_do_concat_batches: True
  • eval_use_gather_object: False
  • eval_accumulation_steps: None
  • include_for_metrics: []
  • batch_eval_metrics: False
  • save_only_model: False
  • save_on_each_node: False
  • enable_jit_checkpoint: False
  • push_to_hub: False
  • hub_private_repo: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_always_push: False
  • hub_revision: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • restore_callback_states_from_checkpoint: False
  • full_determinism: False
  • seed: 42
  • data_seed: None
  • use_cpu: False
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • dataloader_prefetch_factor: None
  • remove_unused_columns: True
  • label_names: None
  • train_sampling_strategy: random
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • ddp_backend: None
  • ddp_timeout: 1800
  • fsdp: []
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • deepspeed: None
  • debug: []
  • skip_memory_metrics: True
  • do_predict: False
  • resume_from_checkpoint: None
  • warmup_ratio: None
  • local_rank: -1
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CachedMultipleNegativesRankingLoss

@misc{gao2021scaling,
    title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
    author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
    year={2021},
    eprint={2101.06983},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

Pearl Dataset

If you use this dataset or the accompanying benchmarks, please cite our paper:

@inproceedings{alwajih-etal-2025-pearl,
    title = "Pearl: A Multimodal Culturally-Aware {A}rabic Instruction Dataset",
    author = "Alwajih, Fakhraddin  and
      Magdy, Samar M.  and
      El Mekki, Abdellah  and
      Nacar, Omer  and
      Nafea, Youssef  and
      Abdelfadil, Safaa Taher  and
      Yahya, Abdulfattah Mohammed  and
      Luqman, Hamzah  and
      Almarwani, Nada  and
      Aloufi, Samah  and
      Qawasmeh, Baraah  and
      Atou, Houdaifa  and
      Sibaee, Serry  and
      Alsayadi, Hamzah A.  and
      Al-Dhabyani, Walid  and
      Al-shaibani, Maged S.  and
      El aatar, Aya  and
      Qandos, Nour  and
      Alhamouri, Rahaf  and
      Ahmad, Samar  and
      AL-Ghrawi, Mohammed Anwar  and
      Yacoub, Aminetou  and
      AbuHweidi, Ruwa  and
      Lemin, Vatimetou Mohamed  and
      Abdel-Salam, Reem  and
      Bashiti, Ahlam  and
      Ammar, Adel  and
      Alansari, Aisha  and
      Ashraf, Ahmed  and
      Alturayeif, Nora  and
      Alcoba Inciarte, Alcides  and
      Elmadany, AbdelRahim A.  and
      Tourad, Mohamedou Cheikh  and
      Berrada, Ismail  and
      Jarrar, Mustafa  and
      Shehata, Shady  and
      Abdul-Mageed, Muhammad",
    editor = "Christodoulopoulos, Christos  and
      Chakraborty, Tanmoy  and
      Rose, Carolyn  and
      Peng, Violet",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2025",
    month = nov,
    year = "2025",
    address = "Suzhou, China",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-emnlp.1254/",
    pages = "23048--23079",
    ISBN = "979-8-89176-335-7"
}