QMD Query Expansion 1.7B v2

Fine-tuned Qwen3-1.7B for query expansion in QMD's hybrid retrieval pipeline.

Output Format

hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation
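
Because every output line carries a type prefix, the expansion can be split into its parts with a few lines of code. The following is a minimal sketch, assuming one item per line and only the three prefixes shown above; `parse_expansion` is a hypothetical helper and not part of QMD.

```python
from collections import defaultdict

# Prefixes taken from the output format above.
PREFIXES = ("hyde", "lex", "vec")


def parse_expansion(text: str) -> dict[str, list[str]]:
    """Group model output lines by their type prefix, skipping anything else."""
    parsed = defaultdict(list)
    for line in text.splitlines():
        # Split on the first colon only, so a hyde passage may contain colons.
        prefix, sep, value = line.partition(":")
        prefix = prefix.strip().lower()
        value = value.strip()
        if sep and prefix in PREFIXES and value:
            parsed[prefix].append(value)
    return dict(parsed)
```

Lines without a recognized prefix are ignored rather than raising, which keeps the parser tolerant of stray text in the generation.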

Improvements over v1

| Setting | v1 | v2 |
|---|---|---|
| LoRA rank | 16 | 64 |
| LoRA target | q/k/v/o/gate/up/down_proj | all-linear |
| LoRA alpha | 32 | 64 |
| LoRA dropout | 0.0 | 0.05 |
| Epochs | 5 | 2 |
| Max length | 512 | 1024 |
| Gradient checkpointing | | |
| Dataset size | 2806 train (unversioned) | 2806 train (v2 on Hub) |
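
One consequence of the rank and alpha changes is worth spelling out. In standard LoRA the adapter update is multiplied by alpha / rank, so raising the rank from 16 to 64 while raising alpha only from 32 to 64 halves that multiplier. A quick check, assuming the standard scaling rule rather than rank-stabilized LoRA (the card does not say which was used); `LoraSettings` is an illustrative container, not a PEFT class:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """Illustrative container for the hyperparameters listed above."""
    rank: int
    alpha: int
    dropout: float

    @property
    def scaling(self) -> float:
        # Standard LoRA scaling factor (assumption: no rank-stabilized variant).
        return self.alpha / self.rank


V1 = LoraSettings(rank=16, alpha=32, dropout=0.0)
V2 = LoraSettings(rank=64, alpha=64, dropout=0.05)

for name, cfg in (("v1", V1), ("v2", V2)):
    print(f"{name}: rank={cfg.rank} alpha={cfg.alpha} scaling={cfg.scaling}")
```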

Training Results

  • Train time: ~10 minutes (2 epochs, a10g-large)
  • Final train loss: 0.79
  • Final eval loss: 0.74
  • Final token accuracy: 81.5%

Evaluation: v1 vs v2

Tested on 51 queries covering technical docs, short queries, named entities, personal notes, research, errors, temporal, complex, entity preservation, quoted phrases, negation, and /only: modes.

| Metric | v1 | v2 | Delta |
|---|---|---|---|
| Average score | 36.6/100 | 40.4/100 | +3.8 |
| v2 wins / v1 wins / ties | 11 | 2 | |

Biggest improvements (all in /only: mode, the hardest edge cases):

| Query | v1 | v2 |
|---|---|---|
| auth /only:lex | 20 | 50 |
| React hooks tutorial /only:lex | 20 | 50 |
| kubernetes pod deployment /only:vec | 20 | 50 |
| how to configure authentication /only:vec | 20 | 50 |
| TDS motorsports history /only:hyde | 10 | 40 |
| AWS Lambda cold start /only:hyde | 5 | 40 |

v2 is markedly better at respecting /only: constraints and generating the correct output type: on the six queries above, its score rose by 30 to 35 points.
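
The summary figures above are plain aggregates over per-query scores. As a sketch of that bookkeeping, the snippet below recomputes win counts and the mean delta for just the six /only: queries listed above; the full 51-query score list is not reproduced here, and `summarize` is a hypothetical helper.

```python
# (v1 score, v2 score) for the six /only: queries listed above.
SCORES = {
    "auth /only:lex": (20, 50),
    "React hooks tutorial /only:lex": (20, 50),
    "kubernetes pod deployment /only:vec": (20, 50),
    "how to configure authentication /only:vec": (20, 50),
    "TDS motorsports history /only:hyde": (10, 40),
    "AWS Lambda cold start /only:hyde": (5, 40),
}


def summarize(scores: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Count wins and ties and average the per-query score difference."""
    deltas = [v2 - v1 for v1, v2 in scores.values()]
    return {
        "v2_wins": sum(d > 0 for d in deltas),
        "v1_wins": sum(d < 0 for d in deltas),
        "ties": sum(d == 0 for d in deltas),
        "mean_delta": sum(deltas) / len(deltas),
    }


print(summarize(SCORES))
```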

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model, then attach the v2 LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "tobil/qmd-query-expansion-1.7B-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# /no_think switches off Qwen3's thinking mode so the reply holds only the expansion.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "/no_think Expand this search query: kubernetes pod networking"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding; slice off the prompt tokens so only the new text is decoded.
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
result = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(result)
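
For repeated calls it can help to build the chat messages in one place. The sketch below wraps the prompt string from the example above and optionally appends an /only: suffix in the form used by the evaluation queries (for example `auth /only:lex`). `build_messages` is a hypothetical helper, and placing the suffix at the end of the query is an assumption based on those examples.

```python
from typing import Optional

# Modes that appear in the evaluation queries.
ONLY_MODES = ("lex", "vec", "hyde")


def build_messages(query: str, only: Optional[str] = None) -> list[dict]:
    """Return chat messages for one query, optionally restricted to one output type."""
    if only is not None:
        if only not in ONLY_MODES:
            raise ValueError(f"unknown /only: mode: {only!r}")
        # Assumption: the suffix goes at the end of the query text.
        query = f"{query} /only:{only}"
    return [{"role": "user", "content": f"/no_think Expand this search query: {query}"}]
```

The returned list can be passed to `tokenizer.apply_chat_template` with `tokenize=False` and `add_generation_prompt=True`, as in the example above.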

Training Details

  • Framework: TRL 1.2.0, Transformers 5.6.2, PyTorch 2.11.0
  • Hardware: a10g-large (24GB GPU)
  • Optimizer: AdamW with cosine LR schedule
  • LR: 2e-4, warmup: 10 steps, effective batch: 16
  • Dataset: tobil/qmd-query-expansion-train-v2
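
The schedule implied by these settings can be worked out by hand. The sketch below assumes linear warmup followed by a half-cosine decay to zero, and assumes the last partial batch of each epoch counts as a step; the exact step count reported by the trainer may differ slightly.

```python
import math

TRAIN_EXAMPLES = 2806
EFFECTIVE_BATCH = 16
EPOCHS = 2
PEAK_LR = 2e-4
WARMUP_STEPS = 10

# Assumption: a partial final batch still produces an optimizer step.
STEPS_PER_EPOCH = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)  # 176
TOTAL_STEPS = STEPS_PER_EPOCH * EPOCHS  # 352


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 5, 10, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

With only about 350 optimizer steps in total, the 10-step warmup covers roughly the first 3% of training.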

Citation

@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}