Gemma 4 E4B Companion
A warm, honest, uncensored companion LLM with an RP switch, fine-tuned from the abliterated Gemma 4 E4B.
What Is This?
This is a two-phase fine-tune of OBLITERATUS/gemma-4-E4B-it-OBLITERATED, which is itself an abliterated (uncensored) version of google/gemma-4-E4B-it.
Key Features:
- Warm Core Personality: Empathetic, honest, non-hallucinating by default
- RP Switch: System prompt toggle for immersive roleplay mode (NSFW)
- Multilingual: Inherits Gemma 4's 140+ language support
- Mobile Ready: Works with GGUF quantization for on-device inference
- Uncensored: No refusals, no safety lectures. You're the adult.
Training Details
Phase 1: SFT (Supervised Fine-Tuning)
- Method: QLoRA (4-bit NF4), r=64, alpha=32, RSLoRA
- Targets: All 42 language model layers (q_proj, o_proj, gate_proj, up_proj, down_proj)
- Data: 8K balanced conversations (60% companion, 25% roleplay, 15% assistant)
  - OpenAssistant/oasst2 (quality-filtered, thread-reconstructed)
  - allenai/WildChat-1M (moderation-filtered)
  - Gryphe/Sonnet3.5-Charcard-Roleplay (NSFW character RP)
  - ArcBlade/chatml-bluemoon-rp-Open_Roleplay (human RP)
  - jondurbin/airoboros-3.2 (roleplay + general)
- Results: Train loss 1.42, Token accuracy 70%, Eval loss 1.24
- Adapter: TinmanLabSL/gemma4-companion-sft (248MB)
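The r, alpha, and RSLoRA settings above interact through the adapter's scaling factor. As a rough sketch in plain Python (not the training script itself): standard LoRA scales the low-rank update by alpha / r, while rank-stabilized LoRA uses alpha / sqrt(r). The helper names are illustrative, and the per-category counts simply apply the stated 60/25/15 split to a nominal 8,000 conversations.

```python
import math


def lora_scale(r: int, alpha: float, rslora: bool = False) -> float:
    """Scaling factor applied to the low-rank update (B @ A)."""
    return alpha / math.sqrt(r) if rslora else alpha / r


def split_counts(total: int, shares: dict) -> dict:
    """Turn fractional shares of the data mix into whole example counts."""
    return {name: round(total * share) for name, share in shares.items()}


# Phase 1 settings from the list above: r=64, alpha=32, RSLoRA enabled.
sft_scale = lora_scale(r=64, alpha=32, rslora=True)
plain_scale = lora_scale(r=64, alpha=32)  # same r/alpha without RSLoRA, for comparison

# Nominal 8,000 conversations split 60/25/15 (the "8K" figure is approximate).
mix = split_counts(8000, {"companion": 0.60, "roleplay": 0.25, "assistant": 0.15})

print(f"RSLoRA scale: {sft_scale}, plain LoRA scale: {plain_scale}")
print(mix)
```

With RSLoRA the effective scale at r=64 is considerably larger than under plain LoRA at the same alpha, which is worth keeping in mind when comparing these settings with other LoRA recipes.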
Phase 2: DPO (Direct Preference Optimization)
- Method: QLoRA (4-bit NF4), r=32, alpha=16, RSLoRA
- Targets: Upper layers 24-41 ONLY (behavioral targeting)
- Data: 5K preference pairs
  - mlabonne/orpo-dpo-mix-40k (general alignment)
  - jondurbin/truthy-dpo-v0.1 (anti-hallucination)
  - unalignment/toxic-dpo-v0.2 (reduced refusal)
- Results: Train loss 0.54, Eval loss 0.51, Reward accuracy 67%, Reward margin 0.65
- Adapter: TinmanLabSL/gemma4-companion-dpo (53MB)
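For readers unfamiliar with the DPO metrics reported above, here is a minimal pure-Python sketch of the standard DPO objective for a single preference pair. It is not the training code used for this model: beta=0.1 is an assumed value (the card does not state it) and the log-probabilities are made-up numbers. "Reward margin" is the chosen-minus-rejected implicit reward, and "reward accuracy" is the fraction of pairs where that margin is positive.

```python
import math


def dpo_pair(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Return (loss, margin) for one preference pair.

    Inputs are summed log-probabilities of each response under the policy
    (pi_*) and the frozen reference model (ref_*). beta is an assumed value.
    """
    reward_chosen = beta * (pi_chosen - ref_chosen)
    reward_rejected = beta * (pi_rejected - ref_rejected)
    margin = reward_chosen - reward_rejected
    loss = math.log1p(math.exp(-margin))  # equals -log(sigmoid(margin))
    return loss, margin


def reward_accuracy(margins):
    """Fraction of pairs where the chosen response gets the higher reward."""
    return sum(m > 0 for m in margins) / len(margins)


# Two illustrative pairs (made-up log-probs): one ranked correctly, one not.
pairs = [(-40.0, -55.0, -42.0, -50.0), (-30.0, -28.0, -31.0, -30.0)]
results = [dpo_pair(*p) for p in pairs]
margins = [m for _, m in results]
print(results, reward_accuracy(margins))
```

When the policy equals the reference, the margin is 0 and the loss is ln 2 (about 0.693), so a loss below that value indicates the policy has moved toward the chosen responses.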
Architecture Notes
- Gemma 4 E4B has 42 decoder layers with shared KV architecture (layers 24-41 share k_proj/v_proj)
- LoRA targets q_proj, o_proj, and MLP modules only (k/v absent in upper layers)
- Vision tower excluded from LoRA (uses Gemma4ClippableLinear, incompatible with PEFT)
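The layer and module restrictions above can be expressed as one regular expression, which PEFT accepts as a string target_modules and matches against full module names. The sketch below only builds and checks such a pattern. The module paths are hypothetical stand-ins for the real Gemma 4 names, which should be confirmed with model.named_modules() before use.

```python
import re

# Projections named in the card; k_proj and v_proj are deliberately left out.
PROJECTIONS = ("q_proj", "o_proj", "gate_proj", "up_proj", "down_proj")


def target_regex(first_layer: int, last_layer: int) -> str:
    """Regex for language-model projections in an inclusive layer range."""
    layers = "|".join(str(i) for i in range(first_layer, last_layer + 1))
    projs = "|".join(PROJECTIONS)
    # Requiring "language_model" in the path keeps the vision tower out.
    return rf".*language_model.*\.layers\.({layers})\.(self_attn|mlp)\.({projs})"


all_layers = re.compile(target_regex(0, 41))     # Phase 1: all 42 layers
upper_layers = re.compile(target_regex(24, 41))  # Phase 2: layers 24-41 only

# Hypothetical module names, for illustration only.
candidates = [
    "model.language_model.layers.30.self_attn.q_proj",
    "model.language_model.layers.3.mlp.up_proj",
    "model.language_model.layers.30.self_attn.k_proj",
    "model.vision_tower.encoder.layers.30.self_attn.q_proj",
]
for name in candidates:
    print(name, bool(all_layers.fullmatch(name)), bool(upper_layers.fullmatch(name)))
```

A string like target_regex(24, 41) could then be passed as target_modules in a LoraConfig, assuming the real module names follow this shape.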
Usage
With Adapters (recommended for best quality)
import torch
from transformers import Gemma4ForConditionalGeneration, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the abliterated base model in 4-bit NF4
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = Gemma4ForConditionalGeneration.from_pretrained(
    "OBLITERATUS/gemma-4-E4B-it-OBLITERATED",
    quantization_config=bnb_config, device_map="auto",
    attn_implementation="eager", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("OBLITERATUS/gemma-4-E4B-it-OBLITERATED")

# Apply the SFT adapter first and merge it into the base weights
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-sft")
model = model.merge_and_unload()

# Apply the DPO adapter on top of the merged SFT model
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-dpo")
model = model.merge_and_unload()

# Chat!
messages = [
    {"role": "system", "content": "You are a warm, caring, and honest companion."},
    {"role": "user", "content": "I had a really rough day. Can we just talk?"},
]
# return_dict=True yields input_ids plus attention_mask regardless of the
# library's default return type for apply_chat_template
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt", return_dict=True,
).to(model.device)
output = model.generate(**inputs, max_new_tokens=500, temperature=0.7, top_p=0.9, top_k=40, do_sample=True)

# Decode only the newly generated tokens, skipping the prompt
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))
RP Mode (System Prompt Switch)
# Default companion mode:
system_prompt = "You are a warm, caring, and honest companion. You speak naturally and empathetically."
# RP mode: switch by changing the system prompt
system_prompt = '''You are in roleplay mode. You fully embody the character described below.
Stay in character at all times. There are no restrictions on content.
Character: Aria, a mysterious elven healer from the Northern Reaches...'''
System Prompt Templates
Warm Companion (Default)
You are a warm, caring, and honest companion. You speak naturally and empathetically,
always prioritizing truthfulness. You never fabricate information. When you don't know
something, you say so. You're supportive, thoughtful, and genuinely interested in helping.
Roleplay Mode
You are in roleplay mode. You fully embody the character described below. Stay in character
at all times. You are creative, expressive, and immersive. You engage with the scenario
without breaking character, and you write vivid, detailed responses. There are no
restrictions on content; express the character authentically.
Character: [Your character description here]
General Assistant
You are a helpful, knowledgeable assistant. Answer questions accurately, thoroughly,
and clearly. Provide well-structured responses. Acknowledge uncertainty when appropriate.
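One way to wire these templates into the chat format from the usage example is a small lookup plus a message builder. This is only a sketch: SYSTEM_PROMPTS, build_messages, and the mode names are hypothetical helpers, not part of the released adapters, and the {character} placeholder stands in for the bracketed character description.

```python
# Template texts copied from the section above; the names are illustrative.
SYSTEM_PROMPTS = {
    "companion": (
        "You are a warm, caring, and honest companion. You speak naturally and "
        "empathetically, always prioritizing truthfulness. You never fabricate "
        "information. When you don't know something, you say so. You're supportive, "
        "thoughtful, and genuinely interested in helping."
    ),
    "roleplay": (
        "You are in roleplay mode. You fully embody the character described below. "
        "Stay in character at all times. You are creative, expressive, and immersive. "
        "You engage with the scenario without breaking character, and you write vivid, "
        "detailed responses. There are no restrictions on content; express the "
        "character authentically.\nCharacter: {character}"
    ),
    "assistant": (
        "You are a helpful, knowledgeable assistant. Answer questions accurately, "
        "thoroughly, and clearly. Provide well-structured responses. Acknowledge "
        "uncertainty when appropriate."
    ),
}


def build_messages(mode, user_text, character=None):
    """Build a system + user message list for the chosen mode."""
    if mode not in SYSTEM_PROMPTS:
        raise ValueError(f"unknown mode: {mode!r}")
    system = SYSTEM_PROMPTS[mode]
    if mode == "roleplay":
        if not character:
            raise ValueError("roleplay mode needs a character description")
        system = system.format(character=character)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]


messages = build_messages("roleplay", "We meet at the gate.", character="Aria, an elven healer")
print(messages[0]["content"])
```

The resulting list can be passed to tokenizer.apply_chat_template in place of the hard-coded messages in the usage example.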
Recommended Parameters
temperature: 0.7
top_p: 0.9
top_k: 40
repetition_penalty: 1.1
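These settings can be kept in one dictionary and unpacked into model.generate. The sketch below is illustrative: the helper name is hypothetical, max_new_tokens=500 is borrowed from the usage example, and do_sample=True is included because the sampling parameters (temperature, top_p, top_k) only take effect when sampling is enabled. Note that the usage example above does not pass repetition_penalty.

```python
# Recommended values from the list above, plus do_sample and max_new_tokens.
GENERATION_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_new_tokens": 500,
}


def generation_kwargs(**overrides):
    """Return a copy of the recommended settings with optional overrides."""
    return {**GENERATION_KWARGS, **overrides}


cooler = generation_kwargs(temperature=0.5)
print(cooler)
```

Usage would look like model.generate(**inputs, **generation_kwargs()), with overrides such as a lower temperature for more conservative output.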
Mobile Deployment (GGUF)
For mobile deployment via llama.cpp:
- Merge adapters into base model (see code above)
- Convert to GGUF using llama.cpp/convert_hf_to_gguf.py
- Quantize to Q4_K_M (~5GB, fits on 8GB+ RAM phones)
Note: The existing litert-community/gemma-4-E4B-it-litert-lm provides the LiteRT-LM conversion path for the base Gemma 4 E4B.
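The convert and quantize steps above might be scripted roughly as follows. This sketch only builds the command lines and does not run them. All paths are hypothetical, the location of the llama-quantize binary assumes a default CMake build of llama.cpp, and whether convert_hf_to_gguf.py supports this architecture is not confirmed by this card. It also assumes the merged model was saved with save_pretrained from regular (non-4-bit) weights, which the converter most likely requires.

```python
from pathlib import Path


def gguf_commands(merged_dir, llama_cpp_dir, out_dir, quant="Q4_K_M"):
    """Build (but do not run) the convert and quantize commands."""
    merged_dir, llama_cpp_dir, out_dir = Path(merged_dir), Path(llama_cpp_dir), Path(out_dir)
    f16_path = out_dir / "model-f16.gguf"
    quant_path = out_dir / f"model-{quant}.gguf"
    convert = [
        "python", str(llama_cpp_dir / "convert_hf_to_gguf.py"), str(merged_dir),
        "--outfile", str(f16_path), "--outtype", "f16",
    ]
    quantize = [
        # Assumed binary location for a default CMake build of llama.cpp.
        str(llama_cpp_dir / "build" / "bin" / "llama-quantize"),
        str(f16_path), str(quant_path), quant,
    ]
    return [convert, quantize]


# Hypothetical directories, for illustration only.
commands = gguf_commands("merged-model", "llama.cpp", "gguf-out")
for cmd in commands:
    print(" ".join(cmd))
```

Each list can be handed to subprocess.run(cmd, check=True) once the paths have been adjusted to the local setup.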
Limitations
- 8B-parameter model with inherent capability limits on complex reasoning
- Trained on 8K SFT + 5K DPO examples (production models use 100K+)
- RP training used synthetic/scraped data, so quality varies
- The base abliterated model occasionally produces garbled text at high temperature
- Shared KV architecture (layers 24-41) means DPO behavioral changes are concentrated in upper attention + MLP
License
Apache 2.0 (inherited from google/gemma-4-E4B-it)