MLX Quants of allura-org/Llama-3.3-70B-Joyous
MLX quants of allura-org/Llama-3.3-70B-Joyous, produced with mlx-lm for Apple Silicon.
Quants
Configure mlx
```sh
[uv venv]  # first-time setup with uv (optional)
[uv] pip install -U mlx-lm
```

The uv wrapper is optional but recommended; install it with Homebrew:

```sh
brew install uv
```
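For reference, quants like these can be produced with mlx-lm's convert API; a minimal sketch, using mlx-lm's default quantization settings (the exact bit-width and group size behind this repo are not stated here):

```python
from mlx_lm import convert

# Sketch: quantize the original weights into an MLX model directory.
# q_bits and q_group_size are mlx-lm defaults shown for illustration,
# not necessarily what this repo was built with.
convert(
    hf_path="allura-org/Llama-3.3-70B-Joyous",
    mlx_path="Llama-3.3-70B-Joyous_MLX-hi",
    quantize=True,
    q_bits=4,
    q_group_size=64,
)
```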
Serve an OpenAI-compatible endpoint
```sh
[uv run] mlx_lm.server --model /path/to/weights/Llama-3.3-70B-Joyous_MLX-hi \
  --max-tokens -1 --temp 1.25 --min-p 0.05
```
The default URL is http://127.0.0.1:8080/v1
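Once the server is running, any OpenAI-compatible client can query it; a minimal sketch using the requests library (sampler values mirror the card's suggestions; min_p forwarding is an assumption about the server version):

```python
import requests

# POST to the server's OpenAI-compatible chat completions route
resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",
    json={
        "messages": [{"role": "user", "content": "hello"}],
        "temperature": 1.25,
        "min_p": 0.05,  # assumption: supported by this server version
        "max_tokens": 256,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])
```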
Programmatic usage
```python
from mlx_lm import load, generate

model_path = "/path/to/weights/Llama-3.3-70B-Joyous_MLX-hi"
model, tokenizer = load(model_path)

prompt = "hello"

# Wrap the prompt in the model's chat template, if it defines one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
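To reproduce the sampler settings recommended in the original model card below (temp 1.25, min_p 0.05) when generating programmatically, recent mlx-lm releases expose make_sampler; a minimal sketch, assuming such a release:

```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model, tokenizer = load("/path/to/weights/Llama-3.3-70B-Joyous_MLX-hi")

# Sampler matching the card's suggested settings
sampler = make_sampler(temp=1.25, min_p=0.05)
response = generate(
    model, tokenizer, prompt="hello", sampler=sampler, verbose=True
)
```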
Original model card:
Joyous 70B
One last hurrah for Llama 3.3 70B. I hope to never tune this model again. Let it die.
Joyous is a finetune of L3.3 70B designed for roleplay tasks; however (as my luck has been going recently), it turned out to be somewhat comically good at assistant tasks as well, scoring far beyond its base model in subjective assistant evals.
Merry Christmas, gooners!
Info
Use the Llama 3 chat template, obviously.
We recommend the following system prompt for assistant use cases:
You are Luna, a helpful and harmless language model by Allura.
I used temp 1.25 and min_p 0.05 while testing; however, your preferred sampler settings may differ.
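For example, the recommended system prompt can be wired into the chat template like so (a sketch reusing the programmatic setup above):

```python
from mlx_lm import load

_, tokenizer = load("/path/to/weights/Llama-3.3-70B-Joyous_MLX-hi")

# Prepend the card's recommended system prompt for assistant use cases
messages = [
    {"role": "system",
     "content": "You are Luna, a helpful and harmless language model by Allura."},
    {"role": "user", "content": "hello"},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
```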
Model tree for allura-quants/Llama-3.3-70B-Joyous_MLX-hi
- Base model: meta-llama/Llama-3.1-70B
- Finetuned: meta-llama/Llama-3.3-70B-Instruct
- Finetuned: allura-org/Llama-3.3-70B-Joyous
