jhgan/ko-sroberta-multitask-mrl

This is a sentence-transformers model fine-tuned from klue/roberta-base with Matryoshka Representation Learning (MRL). It produces 768-dimensional embeddings in which the first m dimensions, for m ∈ {768, 512, 256, 128, 64, 32}, are themselves valid sentence representations. You can therefore slice an embedding to trade accuracy for storage and latency without retraining.

Training data: KorNLI + KorSTS (multi-task)
Base model: klue/roberta-base
Nested dims: 768, 512, 256, 128, 64, 32
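
To make the storage side of that trade-off concrete, here is a rough sketch of the arithmetic. It assumes embeddings are stored as raw float32 values (4 bytes per dimension) with no index or metadata overhead; the corpus size of one million vectors and the helper name `storage_mib` are purely illustrative.

```python
# Nested dimensions listed for this model.
NESTED_DIMS = [768, 512, 256, 128, 64, 32]
BYTES_PER_FLOAT32 = 4  # assumption: vectors are stored as float32


def storage_mib(num_vectors, dim, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage in MiB for `num_vectors` embeddings of size `dim`."""
    return num_vectors * dim * bytes_per_value / (1024 ** 2)


for dim in NESTED_DIMS:
    print(f"{dim:>3} dims: {storage_mib(1_000_000, dim):8.1f} MiB per 1M vectors")
```

Under these assumptions storage scales linearly with the dimension, so the 32-dim slice needs 1/24 of the space of the full 768-dim embedding.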

KorSTS test set results

All values are Pearson or Spearman correlation coefficients multiplied by 100.

dim	cosine_pearson	cosine_spearman	euclidean_pearson	euclidean_spearman	manhattan_pearson	manhattan_spearman	dot_pearson	dot_spearman
768	84.24	85.07	83.60	84.13	83.62	84.17	82.75	82.71
512	84.01	85.00	83.51	84.08	83.56	84.13	82.29	82.34
256	83.44	84.61	83.07	83.72	83.00	83.68	80.63	80.58
128	82.51	83.98	82.38	83.07	82.23	82.95	77.68	77.64
64	81.43	83.32	81.45	82.12	81.16	81.92	74.41	74.52
32	78.62	81.36	79.29	80.01	78.49	79.39	67.97	67.79
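
One way to read the table is to express each row as a fraction of the full 768-dim score. The sketch below does this for the cosine_spearman and dot_spearman columns, with values copied from the table above; the function name `retention` is just an illustrative helper.

```python
# Values copied from the KorSTS table (correlation x 100).
cosine_spearman = {768: 85.07, 512: 85.00, 256: 84.61, 128: 83.98, 64: 83.32, 32: 81.36}
dot_spearman = {768: 82.71, 512: 82.34, 256: 80.58, 128: 77.64, 64: 74.52, 32: 67.79}


def retention(scores, full_dim=768):
    """Return each score as a fraction of the score at `full_dim`."""
    full = scores[full_dim]
    return {dim: score / full for dim, score in scores.items()}


cosine_rel = retention(cosine_spearman)
dot_rel = retention(dot_spearman)

for dim in cosine_spearman:
    print(f"{dim:>3} dims: cosine {cosine_rel[dim]:.1%}, dot {dot_rel[dim]:.1%} of the 768-dim score")
```

On these numbers the dot-product scores fall off faster than the cosine scores as the dimension shrinks, which is consistent with re-normalising truncated embeddings and comparing them by cosine similarity.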

Usage

Full 768-dim embedding

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl")
embeddings = model.encode(["안녕하세요", "반갑습니다"])
print(embeddings.shape)  # (2, 768)

Truncated embedding (recommended pattern)

import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl")
emb = model.encode(["안녕하세요", "반갑습니다"], convert_to_tensor=True)

# Slice to the first 64 dims and re-normalise for cosine similarity
emb_64 = F.normalize(emb[:, :64], p=2, dim=1)
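
The same slice-then-normalise step can be applied to vectors that no longer live in PyTorch, for example embeddings read back from a database as plain lists. The following is a dependency-free sketch using only the standard library; the helper names and the toy 4-dim vectors are illustrative and are not part of sentence-transformers or real model output.

```python
import math


def truncate_and_normalize(vector, dim):
    """Keep the first `dim` values and rescale them to unit L2 norm."""
    head = list(vector[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return head  # a zero vector cannot be normalised; return it unchanged
    return [x / norm for x in head]


def cosine_similarity(a, b, dim):
    """Cosine similarity computed on the first `dim` dimensions of a and b."""
    a_t = truncate_and_normalize(a, dim)
    b_t = truncate_and_normalize(b, dim)
    # Both vectors are unit length, so their dot product is the cosine.
    return sum(x * y for x, y in zip(a_t, b_t))


# Toy vectors standing in for real embeddings (illustrative values only).
u = [3.0, 4.0, 1.0, 2.0]
v = [4.0, 3.0, -5.0, 0.5]
print(cosine_similarity(u, v, dim=2))
print(cosine_similarity(u, v, dim=4))
```

Both vectors must be truncated to the same dimension before comparison; mixing a 64-dim query with 128-dim stored vectors is not meaningful.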

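Alternatively, fix the truncation at load time (requires sentence-transformers >= 2.7.0). encode() then returns 64-dim embeddings directly; pass normalize_embeddings=True to encode() if you need unit-length vectors: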

model = SentenceTransformer("jhgan/ko-sroberta-multitask-mrl", truncate_dim=64)

Citation

@inproceedings{kusupati2022matryoshka,
    title     = {Matryoshka Representation Learning},
    author    = {Kusupati, Aditya and Bhatt, Gantavya and Rege, Aniket and
                 Wallingford, Matthew and Sinha, Aditya and Ramanujan, Vivek and
                 Howard-Snyder, William and Chen, Kaifeng and Kakade, Sham and
                 Jain, Prateek and Farhadi, Ali},
    booktitle = {Advances in Neural Information Processing Systems},
    year      = {2022},
    url       = {https://arxiv.org/abs/2205.13147}
}

This model is part of the ko-sentence-transformers project; see the repository for training scripts and the non-MRL baselines (jhgan/ko-sroberta-sts, jhgan/ko-sroberta-nli, jhgan/ko-sroberta-multitask).
