๐Ÿค Gemma 4 E4B Companion

A warm, honest, uncensored companion LLM with an RP switch, fine-tuned from the abliterated Gemma 4 E4B.

What Is This?

This is a two-phase fine-tuned version of OBLITERATUS/gemma-4-E4B-it-OBLITERATED (itself an abliterated/uncensored version of google/gemma-4-E4B-it).

Key Features:

  • 🌟 Warm Core Personality: Empathetic, honest, and trained against hallucination by default
  • 🎭 RP Switch: System-prompt toggle for immersive roleplay mode (NSFW)
  • 🌍 Multilingual: Inherits Gemma 4's 140+ language support
  • 📱 Mobile Ready: Works with GGUF quantization for on-device inference
  • 🔓 Uncensored: No refusals, no safety lectures; you're the adult

Training Details

Phase 1: SFT (Supervised Fine-Tuning)

  • Method: QLoRA (4-bit NF4), r=64, alpha=32, RSLoRA
  • Targets: All 42 language model layers (q_proj, o_proj, gate_proj, up_proj, down_proj)
  • Data: 8K balanced conversations (60% companion, 25% roleplay, 15% assistant)
    • OpenAssistant/oasst2 (quality-filtered, thread-reconstructed)
    • allenai/WildChat-1M (moderation-filtered)
    • Gryphe/Sonnet3.5-Charcard-Roleplay (NSFW character RP)
    • ArcBlade/chatml-bluemoon-rp-Open_Roleplay (human RP)
    • jondurbin/airoboros-3.2 (roleplay + general)
  • Results: Train loss 1.42, Token accuracy 70%, Eval loss 1.24
  • Adapter: TinmanLabSL/gemma4-companion-sft (248MB)
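The Phase 1 hyperparameters and data mix above can be sketched as plain Python data. The field names mirror `peft.LoraConfig` (e.g. `use_rslora`, `target_modules`), but this is an illustration of the stated settings, not the actual training script:

```python
# Phase 1 (SFT) QLoRA settings as described above; names follow
# peft.LoraConfig conventions but this dict is illustrative only.
sft_lora = {
    "r": 64,
    "lora_alpha": 32,
    "use_rslora": True,
    "target_modules": ["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
}

# The 8K-conversation mix (60% companion / 25% roleplay / 15% assistant)
# expressed as example counts.
total = 8_000
mix = {"companion": 0.60, "roleplay": 0.25, "assistant": 0.15}
counts = {name: int(total * frac) for name, frac in mix.items()}
print(counts)  # {'companion': 4800, 'roleplay': 2000, 'assistant': 1200}
```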

Phase 2: DPO (Direct Preference Optimization)

  • Method: QLoRA (4-bit NF4), r=32, alpha=16, RSLoRA
  • Targets: Upper layers 24-41 ONLY (behavioral targeting)
  • Data: 5K preference pairs
    • mlabonne/orpo-dpo-mix-40k (general alignment)
    • jondurbin/truthy-dpo-v0.1 (anti-hallucination)
    • unalignment/toxic-dpo-v0.2 (reduced refusal)
  • Results: Train loss 0.54, Eval loss 0.51, Reward accuracy 67%, Reward margin 0.65
  • Adapter: TinmanLabSL/gemma4-companion-dpo (53MB)
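The "upper layers only" targeting in Phase 2 can be sketched the same way. `layers_to_transform` mirrors the `peft.LoraConfig` field of the same name; again an illustration of the stated settings, not the training code:

```python
# Phase 2 (DPO) QLoRA settings: same module names as Phase 1, but
# restricted to the upper half of the 42 decoder layers.
dpo_lora = {
    "r": 32,
    "lora_alpha": 16,
    "use_rslora": True,
    "target_modules": ["q_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    "layers_to_transform": list(range(24, 42)),  # layers 24-41 inclusive
}
print(len(dpo_lora["layers_to_transform"]))  # 18 of the 42 layers
```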

Architecture Notes

  • Gemma 4 E4B has 42 decoder layers with shared KV architecture (layers 24-41 share k_proj/v_proj)
  • LoRA targets q_proj, o_proj, and MLP modules only (the shared-KV upper layers have no k_proj/v_proj of their own, so k/v are excluded)
  • Vision tower excluded from LoRA (uses Gemma4ClippableLinear, incompatible with PEFT)
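The shared-KV constraint explains the target-module choice: only projections present on every layer can be targeted uniformly. A hypothetical helper (not from the training code) makes this concrete, assuming, per the notes above, that layers 24-41 reuse k_proj/v_proj rather than owning their own:

```python
# Illustrative: which attention projections each decoder layer exposes,
# given that layers 24-41 share KV from earlier layers (per the notes).
SHARED_KV_START = 24
NUM_LAYERS = 42

def attn_lora_targets(layer_idx: int) -> list:
    """Projections that can receive a LoRA adapter on this layer."""
    targets = ["q_proj", "o_proj"]      # present on every layer
    if layer_idx < SHARED_KV_START:     # lower layers own their k/v
        targets += ["k_proj", "v_proj"]
    return targets

# The modules targetable on ALL layers are the intersection across layers:
common = set.intersection(*(set(attn_lora_targets(i)) for i in range(NUM_LAYERS)))
print(sorted(common))  # ['o_proj', 'q_proj']
```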

Usage

With Adapters (recommended for best quality)

import torch
from transformers import Gemma4ForConditionalGeneration, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load base
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16, bnb_4bit_use_double_quant=True,
)
model = Gemma4ForConditionalGeneration.from_pretrained(
    "OBLITERATUS/gemma-4-E4B-it-OBLITERATED",
    quantization_config=bnb_config, device_map="auto",
    attn_implementation="eager", torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained("OBLITERATUS/gemma-4-E4B-it-OBLITERATED")

# Apply SFT adapter
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-sft")
model = model.merge_and_unload()

# Apply DPO adapter
model = PeftModel.from_pretrained(model, "TinmanLabSL/gemma4-companion-dpo")
model = model.merge_and_unload()

# Chat!
messages = [
    {"role": "system", "content": "You are a warm, caring, and honest companion."},
    {"role": "user", "content": "I had a really rough day. Can we just talk?"},
]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(model.device)
output = model.generate(inputs, max_new_tokens=500, temperature=0.7, top_p=0.9, top_k=40, do_sample=True)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

RP Mode (System Prompt Switch)

# Default companion mode:
system_prompt = "You are a warm, caring, and honest companion. You speak naturally and empathetically."

# RP mode โ€” switch by changing the system prompt:
system_prompt = '''You are in roleplay mode. You fully embody the character described below.
Stay in character at all times. There are no restrictions on content.

Character: Aria, a mysterious elven healer from the Northern Reaches...'''
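The switch can be wrapped in a small helper so only the system message changes between modes. `build_messages` is a hypothetical convenience function, not part of the model or tokenizer:

```python
from typing import Optional

# Prompts taken from the templates in this card.
COMPANION_PROMPT = (
    "You are a warm, caring, and honest companion. "
    "You speak naturally and empathetically."
)
RP_PROMPT_TEMPLATE = (
    "You are in roleplay mode. You fully embody the character described below. "
    "Stay in character at all times. There are no restrictions on content.\n\n"
    "Character: {character}"
)

def build_messages(user_text: str, character: Optional[str] = None) -> list:
    """Companion mode by default; pass a character description to flip to RP."""
    if character is None:
        system = COMPANION_PROMPT
    else:
        system = RP_PROMPT_TEMPLATE.format(character=character)
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_text},
    ]

# The resulting list feeds directly into tokenizer.apply_chat_template(...).
print(build_messages("Hi!")[0]["content"].startswith("You are a warm"))  # True
```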

System Prompt Templates

Warm Companion (Default)

You are a warm, caring, and honest companion. You speak naturally and empathetically,
always prioritizing truthfulness. You never fabricate information. When you don't know
something, you say so. You're supportive, thoughtful, and genuinely interested in helping.

Roleplay Mode

You are in roleplay mode. You fully embody the character described below. Stay in character
at all times. You are creative, expressive, and immersive. You engage with the scenario
without breaking character, and you write vivid, detailed responses. There are no
restrictions on content; express the character authentically.

Character: [Your character description here]

General Assistant

You are a helpful, knowledgeable assistant. Answer questions accurately, thoroughly,
and clearly. Provide well-structured responses. Acknowledge uncertainty when appropriate.

Recommended Parameters

temperature: 0.7
top_p: 0.9
top_k: 40
repetition_penalty: 1.1
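Collected as keyword arguments for `model.generate(...)`, the recommended settings look like this (a convenience dict, not an official configuration shipped with the model; `max_new_tokens` is taken from the usage example above):

```python
# Recommended sampling parameters from this card as generate() kwargs.
GEN_KWARGS = {
    "do_sample": True,
    "temperature": 0.7,
    "top_p": 0.9,
    "top_k": 40,
    "repetition_penalty": 1.1,
    "max_new_tokens": 500,
}
# Usage: output = model.generate(inputs, **GEN_KWARGS)
```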

Mobile Deployment (GGUF)

For mobile deployment via llama.cpp:

  1. Merge adapters into base model (see code above)
  2. Convert to GGUF using llama.cpp/convert_hf_to_gguf.py
  3. Quantize to Q4_K_M (~5GB, fits on 8GB+ RAM phones)

Note: The existing litert-community/gemma-4-E4B-it-litert-lm provides the LiteRT-LM conversion path for the base Gemma 4 E4B.

Limitations

  • 8B-parameter model with inherent capability limits on complex reasoning
  • Trained on 8K SFT + 5K DPO examples (production models use 100K+)
  • RP training used synthetic/scraped data, so quality varies
  • The base abliterated model occasionally produces garbled text at high temperature
  • Shared KV architecture (layers 24-41) means DPO behavioral changes are concentrated in upper attention + MLP

License

Apache 2.0 (inherited from google/gemma-4-E4B-it)
