How to use binedge/Qwen3-Embedding-0.6B-FP8 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("feature-extraction", model="binedge/Qwen3-Embedding-0.6B-FP8")

# Load model directly
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("binedge/Qwen3-Embedding-0.6B-FP8")
model = AutoModel.from_pretrained("binedge/Qwen3-Embedding-0.6B-FP8")
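Feature extraction yields one hidden-state vector per token, so a pooling step is needed to get a single embedding per text. The sketch below assumes last-token pooling followed by L2 normalization, as described for the base Qwen/Qwen3-Embedding-0.6B model. The helper names are hypothetical and the inputs are toy nested lists rather than real model outputs, so it only illustrates the pooling and similarity arithmetic.

```python
import math


def last_token_pool(hidden_states, attention_mask):
    """Return the hidden state of the last non-padded token of each sequence.

    hidden_states: list of sequences, each a list of per-token vectors.
    attention_mask: list of 0/1 lists with the same sequence lengths.
    Works for both left- and right-padded batches.
    """
    pooled = []
    for states, mask in zip(hidden_states, attention_mask):
        last = max(i for i, m in enumerate(mask) if m == 1)
        pooled.append(states[last])
    return pooled


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of normalized vectors."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


# Toy batch: two sequences, three token positions, hidden size 2.
# The second sequence is right-padded (its last position is masked out).
toy_hidden = [
    [[1.0, 0.0], [0.0, 1.0], [3.0, 4.0]],
    [[1.0, 1.0], [6.0, 8.0], [9.0, 9.0]],
]
toy_mask = [[1, 1, 1], [1, 1, 0]]

embeddings = last_token_pool(toy_hidden, toy_mask)
print(cosine_similarity(embeddings[0], embeddings[1]))
```

With real model outputs, the same steps would be applied to the last hidden state and the tokenizer's attention mask.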
Qwen3-Embedding-0.6B-FP8
FP8-quantized version of Qwen/Qwen3-Embedding-0.6B.
This model was quantized with llm-compressor using FP8 dynamic activation quantization for the Qwen3 embedding backbone.
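To make "dynamic activation quantization" concrete, here is a minimal sketch; it is not llm-compressor's implementation. It assumes the FP8 E4M3 format (largest finite value 448) and one scale per token, computed at run time from that token's largest absolute activation. The function names are hypothetical, and FP8 rounding is simulated in plain Python rather than with real FP8 kernels.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (assumed format)


def round_to_e4m3(x):
    """Round x to the nearest FP8 E4M3 value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), FP8_E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so floor(log2(mag)) = e - 1.
    _, e = math.frexp(mag)
    exp = max(e - 1, -6)        # clamp to the minimum normal exponent
    step = 2.0 ** (exp - 3)     # 3 mantissa bits: 8 steps per power of two
    return sign * min(round(mag / step) * step, FP8_E4M3_MAX)


def quantize_dynamic_per_token(row):
    """Quantize one token's activations with a scale derived from that token."""
    amax = max(abs(v) for v in row)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(v / scale) for v in row], scale


def dequantize(quantized, scale):
    """Map simulated FP8 values back to the original range."""
    return [v * scale for v in quantized]


# One token's activation vector (toy values).
activations = [0.5, -2.0, 0.03, 0.0]
q, scale = quantize_dynamic_per_token(activations)
print(q, scale)
print(dequantize(q, scale))
```

Because the scale is recomputed for every token at inference time, this side of the scheme needs no calibration data; only the weights are quantized ahead of time.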
Quantization details
- Base model: Qwen/Qwen3-Embedding-0.6B
- Quantization tool: llm-compressor
- Saved format: compressed-tensors
- Quantization scheme: FP8_DYNAMIC
- Targets: Linear
- Ignored modules: none
Quantization recipe
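from transformers import AutoModel, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Load the base model and tokenizer (this step is assumed; the original recipe omits it)
model = AutoModel.from_pretrained("Qwen/Qwen3-Embedding-0.6B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Embedding-0.6B")

# Apply FP8_DYNAMIC to every Linear layer; no modules are ignored
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
)
oneshot(model=model, recipe=recipe)

model.save_pretrained("Qwen3-Embedding-0.6B-FP8")
tokenizer.save_pretrained("Qwen3-Embedding-0.6B-FP8")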