QMD Query Expansion 1.7B v2

LoRA adapter for Qwen3-1.7B, fine-tuned for query expansion in QMD's hybrid retrieval pipeline.

Output Format

```
hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation
```
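Each output line starts with a type prefix, so downstream code can split the output with a simple line parser. The sketch below assumes one expansion per line in the `type: text` form shown above. The function name `parse_expansion` is hypothetical and not part of QMD.

```python
from collections import defaultdict


def parse_expansion(text: str) -> dict[str, list[str]]:
    """Group model output lines by their hyde/lex/vec prefix.

    Hypothetical helper: assumes one `type: text` entry per line and
    silently skips blank lines or lines with an unknown prefix.
    """
    allowed = {"hyde", "lex", "vec"}
    groups: dict[str, list[str]] = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the text survive.
        kind, sep, body = line.partition(":")
        kind = kind.strip()
        if sep and kind in allowed and body.strip():
            groups[kind].append(body.strip())
    return dict(groups)


sample = """hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation"""

print(parse_expansion(sample))
```

The `lex` entries would typically feed keyword search and the `vec`/`hyde` entries an embedding search, but that routing is an assumption about the surrounding pipeline.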

Improvements over v1

Improvement	v1	v2
LoRA rank	16	64
LoRA target	q/k/v/o/gate/up/down_proj	all-linear
LoRA alpha	32	64
LoRA dropout	0.0	0.05
Epochs	5	2
Max length	512	1024
Gradient checkpointing	❌	✅
Dataset size	2806 train (unversioned)	2806 train (v2 on Hub)
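Two of the table rows interact. In standard LoRA the low-rank update is scaled by alpha / rank, so v1 (32 / 16) applies a scale of 2.0 while v2 (64 / 64) applies 1.0. Each adapted linear layer of shape `d_in × d_out` also gains `rank × (d_in + d_out)` trainable parameters. The arithmetic below is a sketch that assumes PEFT's default (non-rsLoRA) scaling. The 2048×2048 layer shape is an illustrative assumption and is not read from the model config.

```python
def lora_scale(alpha: float, rank: int) -> float:
    """Standard LoRA scaling applied to the low-rank update B @ A (no rsLoRA)."""
    return alpha / rank


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Assumed, illustrative layer shape (a square 2048 x 2048 projection).
D_IN, D_OUT = 2048, 2048

for name, rank, alpha in [("v1", 16, 32), ("v2", 64, 64)]:
    print(f"{name}: scale={lora_scale(alpha, rank):.1f}, "
          f"params per 2048x2048 layer={lora_params(D_IN, D_OUT, rank):,}")
```

Raising the rank from 16 to 64 quadruples the adapter size per layer. The matching alpha change brings the update scale down from 2.0 to 1.0.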

Training Results

Train time: ~10 minutes (2 epochs, a10g-large)
Final train loss: 0.79
Final eval loss: 0.74
Final token accuracy: 81.5%
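Token accuracy is most likely the fraction of supervised target tokens whose greedy (argmax) prediction matches the label. This is how TRL's SFT trainer reports its token-accuracy metric, but treat the definition as an assumption here. The sketch uses the usual `-100` label to mark prompt or padding positions that are excluded. The function name `token_accuracy` is hypothetical.

```python
IGNORE_INDEX = -100  # conventional label value for positions excluded from the loss


def token_accuracy(pred_ids: list[int], label_ids: list[int]) -> float:
    """Share of non-ignored positions where the predicted token equals the label.

    Assumes predictions are already aligned with labels (i.e. the usual
    one-position shift between logits and labels has been applied).
    """
    if len(pred_ids) != len(label_ids):
        raise ValueError("predictions and labels must have the same length")
    scored = [(p, t) for p, t in zip(pred_ids, label_ids) if t != IGNORE_INDEX]
    if not scored:
        return 0.0
    correct = sum(p == t for p, t in scored)
    return correct / len(scored)


# Toy example: the first two positions are prompt tokens (ignored); 3 of 4 targets match.
preds = [5, 9, 11, 12, 13, 99]
labels = [IGNORE_INDEX, IGNORE_INDEX, 11, 12, 13, 14]
print(token_accuracy(preds, labels))  # 0.75
```

Under this definition, 81.5% means the adapter predicts roughly four in five supervised tokens exactly.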

Evaluation: v1 vs v2

Evaluated on 51 queries spanning these categories: technical docs, short queries, named entities, personal notes, research, errors, temporal, complex, entity preservation, quoted phrases, negation, and /only: modes.

Metric	v1	v2	Delta
Average score	36.6/100	40.4/100	+3.8

Head-to-head: v2 wins 11 queries and v1 wins 2; the remaining queries are ties.
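These summary figures can be derived from per-query scores. The sketch below assumes each query receives one 0 to 100 score per model and counts a query as a tie when both scores are equal. The per-query numbers are invented for illustration and are not the actual evaluation data.

```python
from statistics import mean


def summarize(scores: list[tuple[float, float]]) -> dict[str, float]:
    """Summarize paired (v1, v2) scores, one pair per evaluation query.

    Assumption: equal scores count as a tie; no tolerance band is applied.
    """
    v1_avg = mean(s[0] for s in scores)
    v2_avg = mean(s[1] for s in scores)
    return {
        "v1_avg": v1_avg,
        "v2_avg": v2_avg,
        "delta": v2_avg - v1_avg,
        "v2_wins": sum(v2 > v1 for v1, v2 in scores),
        "v1_wins": sum(v1 > v2 for v1, v2 in scores),
        "ties": sum(v1 == v2 for v1, v2 in scores),
    }


# Hypothetical scores for four queries (illustration only).
toy = [(20, 50), (60, 60), (40, 30), (10, 40)]
print(summarize(toy))
```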

Biggest improvements (all in /only: mode, the hardest edge cases):

Query	v1	v2
`auth /only:lex`	20	50
`React hooks tutorial /only:lex`	20	50
`kubernetes pod deployment /only:vec`	20	50
`how to configure authentication /only:vec`	20	50
`TDS motorsports history /only:hyde`	10	40
`AWS Lambda cold start /only:hyde`	5	40

v2 is much better at respecting /only: constraints and generating only the requested output type. On the queries above it scores 40 to 50, versus 5 to 20 for v1.
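One simple way to check whether an expansion respects an `/only:` constraint is to read the requested type from the query suffix and verify that every non-empty output line carries that prefix. This is a hypothetical check written for illustration. The evaluation's actual scoring rubric is not described in this card, and `requested_type` / `respects_only` are invented names.

```python
import re
from typing import Optional

# Matches a trailing "/only:lex", "/only:vec" or "/only:hyde" modifier on a query.
ONLY_RE = re.compile(r"/only:(hyde|lex|vec)\s*$")


def requested_type(query: str) -> Optional[str]:
    """Return the output type requested via /only:, or None if unconstrained."""
    m = ONLY_RE.search(query)
    return m.group(1) if m else None


def respects_only(query: str, output: str) -> bool:
    """True if every non-empty output line uses the type requested by /only:."""
    wanted = requested_type(query)
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    if wanted is None:
        return True  # no constraint to violate
    # An empty output does not count as respecting the constraint.
    return bool(lines) and all(ln.startswith(f"{wanted}:") for ln in lines)


print(respects_only("auth /only:lex", "lex: auth\nlex: authentication"))  # True
print(respects_only("auth /only:lex", "lex: auth\nvec: how to log in"))   # False
```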

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the Qwen3-1.7B base model, then attach the v2 LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "tobil/qmd-query-expansion-1.7B-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# "/no_think" is Qwen3's soft switch for skipping the thinking phase.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "/no_think Expand this search query: kubernetes pod networking"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding for deterministic expansions.
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
# Decode only the newly generated tokens, dropping the prompt.
result = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(result)
```
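Depending on the chat template and decoding settings, Qwen3 may emit an empty `<think></think>` block before the answer even with `/no_think`. That block can survive `skip_special_tokens=True` if those tags are not registered as special tokens. A small post-processing helper can remove it before parsing. The sketch below is a convenience built on that assumption, not part of the model's documented interface. The helper names `build_user_message` and `strip_think` are hypothetical, and the prompt wording simply mirrors the usage example.

```python
import re

# Removes a (possibly empty) <think>...</think> block plus the whitespace after it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def build_user_message(query: str) -> dict[str, str]:
    """Chat message in the same form as the usage example (hypothetical helper)."""
    return {"role": "user", "content": f"/no_think Expand this search query: {query}"}


def strip_think(text: str) -> str:
    """Drop any <think> block the model emitted and trim surrounding whitespace."""
    return THINK_RE.sub("", text).strip()


raw = "<think>\n\n</think>\n\nlex: pod networking\nvec: how kubernetes pods communicate"
print(strip_think(raw))
print(build_user_message("kubernetes pod networking"))
```

`build_user_message(query)` can go inside the list passed to `tokenizer.apply_chat_template`, and `strip_think(result)` can be applied to the decoded text before splitting it into hyde/lex/vec lines.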

Training Details

Framework: TRL 1.2.0, Transformers 5.6.2, PyTorch 2.11.0
Hardware: a10g-large (24GB GPU)
Optimizer: AdamW with cosine LR schedule
LR: 2e-4, warmup: 10 steps, effective batch: 16
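The optimizer settings describe linear warmup followed by cosine decay. The sketch below reproduces the shape of that schedule using the formula behind Transformers' `get_cosine_schedule_with_warmup` (a single half-cosine down to zero). The total step count is an approximation that assumes 2806 examples, an effective batch of 16, 2 epochs, and no dropped partial batch. The real run may differ by a step or so per epoch depending on how gradient accumulation splits the data.

```python
import math

BASE_LR = 2e-4
WARMUP_STEPS = 10
# Approximate optimizer steps: ceil(2806 / 16) = 176 per epoch, times 2 epochs.
TOTAL_STEPS = math.ceil(2806 / 16) * 2


def lr_at(step: int, base_lr: float = BASE_LR, warmup: int = WARMUP_STEPS,
          total: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 at `total` steps."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for s in (0, 5, 10, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(s, f"{lr_at(s):.2e}")
```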
Dataset: tobil/qmd-query-expansion-train-v2

Citation

```bibtex
@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}
```


Model tree

Qwen/Qwen3-1.7B-Base (base model) → Qwen/Qwen3-1.7B (fine-tuned) → tobil/qmd-query-expansion-1.7B-v2 (this model, fine-tuned)