QMD Query Expansion 1.7B v2
Fine-tuned Qwen3-1.7B for query expansion in QMD's hybrid retrieval pipeline.
Output Format
hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation
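A minimal sketch of how a caller might split this output into its three fields, assuming each field sits on its own line and starts with one of the prefixes shown above. `parse_expansion` is a hypothetical helper for illustration, not part of QMD or of this model.

```python
# The three line prefixes shown in the output format above.
KNOWN_PREFIXES = ("hyde", "lex", "vec")


def parse_expansion(text: str) -> dict[str, list[str]]:
    """Group model output lines by prefix (hypothetical helper)."""
    parsed: dict[str, list[str]] = {prefix: [] for prefix in KNOWN_PREFIXES}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the value survive.
        prefix, sep, value = line.partition(":")
        prefix = prefix.strip().lower()
        value = value.strip()
        # Skip blank lines, lines without a colon, and unknown prefixes.
        if sep and prefix in parsed and value:
            parsed[prefix].append(value)
    return parsed


sample = """hyde: A hypothetical document passage that would answer the query.
lex: keyword1
lex: keyword2
vec: semantic query reformulation
vec: another semantic variation"""

expansion = parse_expansion(sample)
print(expansion["lex"])
print(expansion["vec"])
```

Unknown lines are dropped silently here; a stricter caller might prefer to log or reject them.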
Improvements over v1
| Setting | v1 | v2 |
|---|---|---|
| LoRA rank | 16 | 64 |
| LoRA target | q/k/v/o/gate/up/down_proj | all-linear |
| LoRA alpha | 32 | 64 |
| LoRA dropout | 0.0 | 0.05 |
| Epochs | 5 | 2 |
| Max length | 512 | 1024 |
| Gradient checkpointing | ❌ | ✅ |
| Dataset size | 2806 train (unversioned) | 2806 train (v2 on Hub) |
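One consequence of the rank and alpha changes can be worked out directly. In the standard LoRA formulation the low-rank update is multiplied by `alpha / rank`; the sketch below assumes both runs used that default scaling (the card does not say whether rank-stabilized scaling was enabled).

```python
# Hyperparameters copied from the table above.
LORA_SETTINGS = {
    "v1": {"rank": 16, "alpha": 32, "dropout": 0.0},
    "v2": {"rank": 64, "alpha": 64, "dropout": 0.05},
}


def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scaling factor applied to the low-rank update."""
    return alpha / rank


scalings = {
    version: lora_scaling(cfg["rank"], cfg["alpha"])
    for version, cfg in LORA_SETTINGS.items()
}
for version, scale in scalings.items():
    print(f"{version}: rank={LORA_SETTINGS[version]['rank']}, scaling={scale}")
```

Under that assumption, v2 has four times the adapter rank of v1 but applies half the scaling factor to the update.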
Training Results
- Train time: ~10 minutes (2 epochs, a10g-large)
- Final train loss: 0.79
- Final eval loss: 0.74
- Final token accuracy: 81.5%
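The losses above can be read as perplexities. This is a small worked calculation, assuming the reported values are mean per-token cross-entropy in nats, which the card does not state explicitly.

```python
import math

# Final losses copied from the list above.
final_train_loss = 0.79
final_eval_loss = 0.74

# Perplexity is the exponential of the mean per-token cross-entropy.
train_ppl = math.exp(final_train_loss)
eval_ppl = math.exp(final_eval_loss)

print(f"train perplexity: {train_ppl:.2f}")
print(f"eval perplexity:  {eval_ppl:.2f}")
```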
Evaluation: v1 vs v2
Tested on 51 queries covering technical docs, short queries, named entities, personal notes, research, errors, temporal, complex, entity preservation, quoted phrases, negation, and /only: modes.
| Metric | v1 | v2 | Delta |
|---|---|---|---|
| Average score | 36.6/100 | 40.4/100 | +3.8 |
| v2 wins / v1 wins / ties | — | 11 | 2 |
Biggest improvements (all on /only: mode — the hardest edge cases):
| Query | v1 | v2 |
|---|---|---|
| auth /only:lex | 20 | 50 |
| React hooks tutorial /only:lex | 20 | 50 |
| kubernetes pod deployment /only:vec | 20 | 50 |
| how to configure authentication /only:vec | 20 | 50 |
| TDS motorsports history /only:hyde | 10 | 40 |
| AWS Lambda cold start /only:hyde | 5 | 40 |
v2 is much better at respecting /only: constraints and generating the correct output type: each of these six queries gains 30 to 35 points.
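The per-query gains on these /only: cases can be tallied with a few lines of arithmetic. The scores are copied from the table above; nothing here re-runs the evaluation.

```python
# (query, v1 score, v2 score) copied from the table above.
ONLY_MODE_SCORES = [
    ("auth /only:lex", 20, 50),
    ("React hooks tutorial /only:lex", 20, 50),
    ("kubernetes pod deployment /only:vec", 20, 50),
    ("how to configure authentication /only:vec", 20, 50),
    ("TDS motorsports history /only:hyde", 10, 40),
    ("AWS Lambda cold start /only:hyde", 5, 40),
]

# Point gain of v2 over v1 for each query, and the mean over the six queries.
gains = [v2 - v1 for _, v1, v2 in ONLY_MODE_SCORES]
mean_gain = sum(gains) / len(gains)

print(gains)
print(f"mean gain: {mean_gain:.1f} points")
```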
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load the base model, then attach the fine-tuned LoRA adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-1.7B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "tobil/qmd-query-expansion-1.7B-v2")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-1.7B")

# Build the chat prompt; /no_think turns off Qwen3's thinking mode.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "/no_think Expand this search query: kubernetes pod networking"}],
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; slice off the prompt tokens so only the expansion is decoded.
out = model.generate(**inputs, max_new_tokens=400, do_sample=False)
result = tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
print(result)
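The evaluation queries above end in `/only:lex`, `/only:vec`, or `/only:hyde`. The sketch below shows one way to build such a request and check the reply, assuming the suffix is simply appended to the query inside the same prompt wording used in the usage snippet; the card does not spell out the exact template for /only: requests. `build_messages` and `respects_only` are hypothetical helpers.

```python
from typing import Optional

# Prompt wording taken from the usage snippet above.
PROMPT_PREFIX = "/no_think Expand this search query: "
ONLY_MODES = ("hyde", "lex", "vec")


def build_messages(query: str, only: Optional[str] = None) -> list[dict[str, str]]:
    """Build chat messages for one query (hypothetical helper).

    Assumption: a mode is requested by appending "/only:<mode>" to the query.
    """
    if only is not None:
        if only not in ONLY_MODES:
            raise ValueError(f"unknown mode: {only!r}")
        query = f"{query} /only:{only}"
    return [{"role": "user", "content": PROMPT_PREFIX + query}]


def respects_only(output: str, only: str) -> bool:
    """Return True if every non-empty output line starts with the requested prefix."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    return bool(lines) and all(line.startswith(f"{only}:") for line in lines)


messages = build_messages("kubernetes pod deployment", only="vec")
print(messages[0]["content"])
print(respects_only("vec: deploying pods on kubernetes", "vec"))
```

The returned `messages` list has the same shape as the one passed to `tokenizer.apply_chat_template` in the usage snippet.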
Training Details
- Framework: TRL 1.2.0, Transformers 5.6.2, PyTorch 2.11.0
- Hardware: a10g-large (24GB GPU)
- Optimizer: AdamW with cosine LR schedule
- LR: 2e-4, warmup: 10 steps, effective batch: 16
- Dataset: tobil/qmd-query-expansion-train-v2
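The learning-rate settings above imply a schedule along the following lines. This is a sketch, assuming linear warmup followed by cosine decay to zero and assuming the final partial batch of each epoch counts as an optimizer step; the actual step count depends on how the trainer handled gradient accumulation, which the card does not state.

```python
import math

# Values copied from this card.
TRAIN_EXAMPLES = 2806
EFFECTIVE_BATCH = 16
EPOCHS = 2
BASE_LR = 2e-4
WARMUP_STEPS = 10

# Assumption: a partial final batch still produces one optimizer step.
steps_per_epoch = math.ceil(TRAIN_EXAMPLES / EFFECTIVE_BATCH)
total_steps = steps_per_epoch * EPOCHS


def lr_at(step: int) -> float:
    """Linear warmup to BASE_LR, then cosine decay to zero at total_steps."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / max(1, total_steps - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
for step in (0, 5, WARMUP_STEPS, total_steps // 2, total_steps):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```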
Citation
@software{vonwerra2020trl,
  title   = {{TRL: Transformers Reinforcement Learning}},
  author  = {von Werra, Leandro and Belkada, Younes and Tunstall, Lewis and Beeching, Edward and Thrush, Tristan and Lambert, Nathan and Huang, Shengyi and Rasul, Kashif and Gallouédec, Quentin},
  license = {Apache-2.0},
  url     = {https://github.com/huggingface/trl},
  year    = {2020}
}