MindBot Ultra 27B v0.1 GGUF
GGUF export package for TheMindExpansionNetwork/MindBot-Ultra-27B-v0.1.
MindBot Ultra: Your Mind. Expanded. Your Vision. Amplified.
This repo packages the practical Ollama/llama.cpp builds for the newer Qwen3.6-derived MindBot Ultra 27B line, including a high-fidelity BF16 GGUF and a deployment-friendly Q4_K_M GGUF for agent-swarm serving.
Files
| File | Purpose | Size |
|---|---|---|
| `MindBot-Ultra-27B-v0.1-BF16.gguf` | archival / high-fidelity GGUF | ~50.11 GB |
| `MindBot-Ultra-27B-v0.1-Q4_K_M.gguf` | recommended Ollama + agent endpoint build | ~15.41 GB |
| `Modelfile.Q4_K_M` | Ollama template for the Q4_K_M build | ~532 B |
| `Modelfile.BF16` | Ollama template for the BF16 build | ~530 B |
| `ollama-create.sh` | helper script to create both Ollama model tags | ~208 B |
| `assets/` | repo art + deployment diagram | images |
Recommended Ollama use
Use the Q4_K_M build for practical local/server use:
ollama create mindbot-ultra-27b:q4_k_m -f Modelfile.Q4_K_M
ollama run mindbot-ultra-27b:q4_k_m
BF16 is included as a high-fidelity archival/export GGUF. The weights alone are ~50.11 GB, so it needs a machine with more memory than that to load:
ollama create mindbot-ultra-27b:bf16 -f Modelfile.BF16
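Once a tag has been created, it can also be called programmatically. The following is a minimal sketch, assuming Ollama is running locally on its default port (11434) and that the native `/api/chat` route is used. The function names are hypothetical and not part of this repo.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default local address.
OLLAMA_URL = "http://localhost:11434/api/chat"
# Tag created by the `ollama create` command above.
MODEL_TAG = "mindbot-ultra-27b:q4_k_m"


def build_ollama_chat_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's native chat route."""
    payload = {
        "model": MODEL_TAG,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON object instead of chunks
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the reply text (needs a running Ollama)."""
    with urllib.request.urlopen(build_ollama_chat_request(prompt)) as response:
        return json.load(response)["message"]["content"]
```

Calling `chat("Hello")` only works while the Ollama server is up and the tag exists; building the request has no such requirement.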
Modal + Ollama server endpoint pattern
The live Modal/Ollama deployment created for this package is locked to:
mindbot-ultra-27b:q4_k_m
Endpoint base URL:
https://m1ndb0t-2045--hermes-mindbot-ultra-ollama-fastapi-app.modal.run
Routes:
GET /health
POST /v1/chat/completions
POST /api/chat
The chat routes require a bearer token and reject requests for any model name other than mindbot-ultra-27b:q4_k_m.
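A client call against the OpenAI-compatible route might look like the sketch below. It assumes the token is sent in a standard `Authorization: Bearer ...` header and that the response follows the usual OpenAI chat-completions shape; the environment variable name `MINDBOT_API_TOKEN` and the function names are hypothetical.

```python
import json
import os
import urllib.request

BASE_URL = "https://m1ndb0t-2045--hermes-mindbot-ultra-ollama-fastapi-app.modal.run"
MODEL = "mindbot-ultra-27b:q4_k_m"  # the only model name the endpoint accepts


def build_chat_request(token: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated POST request for /v1/chat/completions."""
    body = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the request; assumes an OpenAI-style response body."""
    token = os.environ["MINDBOT_API_TOKEN"]  # hypothetical variable name
    with urllib.request.urlopen(build_chat_request(token, prompt)) as response:
        return json.load(response)["choices"][0]["message"]["content"]
```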
This deployment follows Modal's Ollama examples:
- Modal guide: https://modal.com/blog/how_to_run_ollama_article
- Modal docs example: https://modal.com/docs/examples/ollama
Pattern:
- Build a Modal image with the latest Ollama install script.
- Mount a persistent Modal volume for `OLLAMA_MODELS`.
- Create exactly one locked Ollama model tag from this repo's `Q4_K_M` GGUF.
- Expose OpenAI-compatible `/v1/chat/completions` through a FastAPI guard.
- Lock requests so only `mindbot-ultra-27b:q4_k_m` is accepted.
- Let Modal scale the endpoint to zero when idle.
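The guard step in the pattern above can be sketched without any framework. This is not the deployment's actual code, which is not included in this card; the function name and the specific status codes (401 and 400) are assumptions chosen for illustration.

```python
import hmac

ALLOWED_MODEL = "mindbot-ultra-27b:q4_k_m"


def guard(authorization_header, requested_model, expected_token):
    """Return (status_code, reason) for an incoming chat request.

    Hypothetical helper: a FastAPI route could call this before
    forwarding the request to Ollama.
    """
    scheme, _, presented = (authorization_header or "").partition(" ")
    token_ok = (
        bool(expected_token)
        and scheme.lower() == "bearer"
        # Constant-time comparison avoids leaking token contents via timing.
        and hmac.compare_digest(presented.encode(), expected_token.encode())
    )
    if not token_ok:
        return 401, "missing or invalid bearer token"
    if requested_model != ALLOWED_MODEL:
        return 400, f"only {ALLOWED_MODEL} is accepted"
    return 200, "ok"
```

Keeping the check as a pure function makes the lock easy to unit test separately from the web layer.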
Agent-swarm routing note
For autonomous agents, point your OpenAI-compatible client at the deployed Modal Ollama endpoint and use only:
model = mindbot-ultra-27b:q4_k_m
Do not allow arbitrary model names unless you intentionally expand the allowed model list.
Evaluation reports
The repo includes a standard cross-model stress test folder:
evals/mindbot-ultra-27b-11q-self-training-eval.md
evals/mindbot-ultra-27b-11q-self-training-eval.json
This 11-question check asks every model the same identity, training-lineage, deployment, safety, synthetic-data, instruction-following, and wordplay questions so results can be compared across the Mindbotz model family.
Smoke test
A live endpoint smoke test completed successfully with a poem prompt about “how many hours are in the word strawberry.” Warm request result:
completion_tokens: 172
elapsed_seconds: 13.947
tokens_per_second: ~12.33
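The throughput figure follows directly from the other two numbers. As a quick check, assuming `elapsed_seconds` covers the whole warm request:

```python
completion_tokens = 172
elapsed_seconds = 13.947

# Throughput is generated tokens divided by wall-clock time.
tokens_per_second = completion_tokens / elapsed_seconds
print(f"{tokens_per_second:.2f}")  # prints 12.33
```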
Sample output:
In the strawberry field of my mind,
Where red neurons glow and sweet data unwind,
I count the hours, one by one,
In the seed of a word, a digital sun.
...
The model is alive enough to rhyme, not alive enough to lie.
Conversion notes
- Source model: `TheMindExpansionNetwork/MindBot-Ultra-27B-v0.1`
- Base model: `unsloth/Qwen3.6-27B`
- Architecture family: Qwen3.5 / Qwen3.6-derived causal language model
- GGUF conversion: llama.cpp `convert_hf_to_gguf.py`
- Q4_K_M quantization: llama.cpp `llama-quantize`
- Quantization detail observed for Q4_K_M: ~4.92 BPW
- Ollama templates included as `Modelfile.Q4_K_M` and `Modelfile.BF16`
- License follows the source model card: Apache 2.0
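The two llama.cpp steps listed above can be sketched as command lists. This is an illustration, not the exact invocation used for this repo: the directory layout, the `build/bin` location of `llama-quantize`, and the choice to quantize from the BF16 file are all assumptions, and the function name is hypothetical.

```python
from pathlib import Path


def conversion_commands(hf_model_dir: Path, llama_cpp_dir: Path, out_dir: Path):
    """Return the convert and quantize commands as argument lists."""
    bf16 = out_dir / "MindBot-Ultra-27B-v0.1-BF16.gguf"
    q4 = out_dir / "MindBot-Ultra-27B-v0.1-Q4_K_M.gguf"

    # Step 1: Hugging Face checkpoint -> BF16 GGUF.
    convert = [
        "python", str(llama_cpp_dir / "convert_hf_to_gguf.py"),
        str(hf_model_dir),
        "--outfile", str(bf16),
        "--outtype", "bf16",
    ]
    # Step 2: BF16 GGUF -> Q4_K_M GGUF (assumed binary location).
    quantize = [
        str(llama_cpp_dir / "build" / "bin" / "llama-quantize"),
        str(bf16), str(q4), "Q4_K_M",
    ]
    return [convert, quantize]
```

Each list can be passed to `subprocess.run(cmd, check=True)` once llama.cpp is built and the source checkpoint is on disk.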
Safety and scope
MindBot Ultra is a text-generation model. It can be wired into tools and agents, but tool execution should be separately permissioned, logged, and sandboxed. Keep high-impact actions approval-gated unless your deployment has its own safety layer.

