jhgan/ko-sroberta-multitask-mrl

This is a sentence-transformers model fine-tuned from klue/roberta-base with Matryoshka Representation Learning (MRL). The model produces 768-dim embeddings, and for each m ∈ {768, 512, 256, 128, 64, 32} the first m dims are themselves a valid sentence representation, so you can slice the embedding to trade accuracy for storage and latency without retraining.

  • Training data: KorNLI + KorSTS (multi-task)
  • Base model: klue/roberta-base
  • Nested dims: 768, 512, 256, 128, 64, 32

KorSTS test set results

All values are Pearson or Spearman correlation coefficients, reported as percentages (×100).

dim cosine_pearson cosine_spearman euclidean_pearson euclidean_spearman manhattan_pearson manhattan_spearman dot_pearson dot_spearman
768 84.24 85.07 83.60 84.13 83.62 84.17 82.75 82.71
512 84.01 85.00 83.51 84.08 83.56 84.13 82.29 82.34
256 83.44 84.61 83.07 83.72 83.00 83.68 80.63 80.58
128 82.51 83.98 82.38 83.07 82.23 82.95 77.68 77.64
64 81.43 83.32 81.45 82.12 81.16 81.92 74.41 74.52
32 78.62 81.36 79.29 80.01 78.49 79.39 67.97 67.79
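
To read the table as a storage trade-off, the sketch below computes, for each nested dimension, the size of one embedding and the share of the 768-dim cosine_spearman score that is retained. The scores are copied from the table above. The 4-bytes-per-value figure assumes embeddings are stored as float32, and the names `COSINE_SPEARMAN` and `tradeoff_rows` are illustrative, not part of any library.

```python
# cosine_spearman on the KorSTS test set, copied from the table above.
COSINE_SPEARMAN = {768: 85.07, 512: 85.00, 256: 84.61, 128: 83.98, 64: 83.32, 32: 81.36}

BYTES_PER_FLOAT32 = 4  # assumption: embeddings are stored as float32


def tradeoff_rows(scores, full_dim=768):
    """Return (dim, bytes_per_vector, percent_of_full_score) for each dim."""
    full_score = scores[full_dim]
    rows = []
    for dim in sorted(scores, reverse=True):
        bytes_per_vector = dim * BYTES_PER_FLOAT32
        retained = 100.0 * scores[dim] / full_score
        rows.append((dim, bytes_per_vector, retained))
    return rows


for dim, size, retained in tradeoff_rows(COSINE_SPEARMAN):
    print(f"{dim:>4} dims  {size:>5} B/vector  {retained:6.2f}% of 768-dim score")
```

At 64 dims, for example, each vector takes 256 bytes instead of 3,072, while cosine_spearman stays at roughly 98% of the full-width score.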

Usage

Full 768-dim embedding

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl")
embeddings = model.encode(["안녕하세요", "반갑습니다"])  # "Hello", "Nice to meet you"
print(embeddings.shape)  # (2, 768)

Truncated embedding (recommended pattern)

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl")
emb = model.encode(["안녕하세요", "반갑습니다"], convert_to_tensor=True)  # "Hello", "Nice to meet you"

# Slice to the first 64 dims and re-normalise for cosine similarity
emb_64 = F.normalize(emb[:, :64], p=2, dim=1)

Or fix the truncation at load time (requires sentence-transformers >= 2.7.0):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl", truncate_dim=64)  # encode() now returns 64-dim embeddings
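
If full 768-dim vectors have already been computed and stored, the same truncation can be applied later without reloading the model. The following is a minimal, dependency-free sketch of that idea. `truncate_and_normalize` and `cosine_similarity` are hypothetical helper names, and the toy 4-dim vectors stand in for real embeddings.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` values and rescale the result to unit L2 norm."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # all-zero prefix: nothing to rescale
    return [x / norm for x in head]


def cosine_similarity(a, b):
    """Dot product, which equals cosine similarity for unit-length inputs."""
    return sum(x * y for x, y in zip(a, b))


# Toy stand-ins for stored full-width embeddings (real ones have 768 values).
stored_a = [3.0, 4.0, 1.0, 2.0]
stored_b = [4.0, 3.0, -2.0, 0.5]

# Slice both vectors to the same prefix length before comparing them.
a_2 = truncate_and_normalize(stored_a, 2)
b_2 = truncate_and_normalize(stored_b, 2)
print(cosine_similarity(a_2, b_2))
```

Re-normalising after the slice matters because a prefix of a unit-length vector is generally no longer unit length, so a plain dot product on the raw prefix would not be a cosine similarity.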

Citation

@inproceedings{kusupati2022matryoshka,
    title     = {Matryoshka Representation Learning},
    author    = {Kusupati, Aditya and Bhatt, Gantavya and Rege, Aniket and
                 Wallingford, Matthew and Sinha, Aditya and Ramanujan, Vivek and
                 Howard-Snyder, William and Chen, Kaifeng and Kakade, Sham and
                 Jain, Prateek and Farhadi, Ali},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022},
    url       = {https://arxiv.org/abs/2205.13147}
}

This model is part of the ko-sentence-transformers project; see the repository for training scripts and the non-MRL baselines (jhgan/ko-sroberta-sts, jhgan/ko-sroberta-nli, jhgan/ko-sroberta-multitask).
