Performance evaluation of Gemma 3-27b-it with different quantization methods (4-bit vs 8-bit)

#102
by Ryan1007 - opened

Hi team, I'm planning to deploy Gemma 3-27b-it on a consumer GPU with limited VRAM. I've noticed some performance variations when using 4-bit quantization (bitsandbytes). Have you guys performed any benchmarks on how much the reasoning capability drops compared to the FP16 version? Any recommended quantization parameters for maintaining logical consistency?

Google org

Hi @Ryan1007
Google has not published an official benchmark table specifically comparing bitsandbytes quantization to the FP16/BF16 base model for Gemma 3 27b-it. However, you can refer to the community-led benchmarks available on Reddit. I have included links to these benchmarks below for your reference.
https://www.reddit.com/r/LocalLLaMA/comments/1k6nrl1/i_benchmarked_the_gemma_3_27b_qat_models/
https://www.reddit.com/r/LocalLLaMA/comments/1k3jal4/gemma_3_qat_versus_other_q4_quants/

To maintain logical consistency, you can start with NF4 quantization and double quantization enabled, while keeping the compute dtype in FP16. In practice that means setting bnb_4bit_quant_type="nf4", bnb_4bit_use_double_quant=True, and bnb_4bit_compute_dtype=torch.float16. We’ve generally seen NF4 hold up better than plain linear 4-bit quantization, especially around outlier weights, and double quantization helps recover a bit more fidelity, which translates into more stable reasoning. Please let me know if this setup helps you.

Thanks

Hi @pannaga10,
Just to confirm, are you using FP16 or BF16? I thought GDM usually defaults to BF16.

Hi @Ryan1007 ,
Google has released the QAT version (https://huggingface.co/collections/google/gemma-3-qat).
I expect these QAT models to offer more stable performance after quantization, compared to directly quantizing the original 27b-it model.
I'm not entirely sure if my understanding is correct, but I'm currently using google/gemma-3-12b-it-qat-int4-unquantized together with bitsandbytes.
