Text Generation
Transformers
Safetensors
English
jeeves
causal-lm
looped-transformer
value-residual
sentencepiece
tool-calling
conversational
custom_code
Update README.md
Browse files
README.md
CHANGED
|
@@ -6,27 +6,77 @@ tags:
|
|
| 6 |
- looped-transformer
|
| 7 |
- value-residual
|
| 8 |
- sentencepiece
|
| 9 |
license: apache-2.0
|
| 10 |
---
|
| 11 |
|
| 12 |
-
# Jeeves
|
| 13 |
|
| 14 |
-
A compact language model
|
| 15 |
|
| 16 |
-
|
| 17 |
|
| 18 |
```python
|
| 19 |
from transformers import AutoTokenizer, AutoModelForCausalLM
|
| 20 |
|
| 21 |
-
tokenizer = AutoTokenizer.from_pretrained("
|
| 22 |
-
model = AutoModelForCausalLM.from_pretrained("
|
| 23 |
|
| 24 |
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
|
| 25 |
outputs = model.generate(**inputs, max_new_tokens=50)
|
| 26 |
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
| 27 |
```
|
| 28 |
|
| 29 |
-
**Note:** `trust_remote_code=True` is required.
|
| 30 |
|
| 31 |
## Architecture
|
| 32 |
|
|
@@ -35,17 +85,91 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
|
|
| 35 |
| Parameters | 74.9M |
|
| 36 |
| Unique layers | 8 |
|
| 37 |
| Effective depth | 15 |
|
| 38 |
-
| Loop | block[4]
|
| 39 |
-
| Value residual |
|
| 40 |
| Hidden dim | 768 |
|
| 41 |
-
| FFN dim |
|
| 42 |
-
| Attention heads | 12 (Q) / 4 (KV) |
|
| 43 |
| Vocab size | 32,000 |
|
| 44 |
| Max seq length | 512 |
|
| 45 |
-
| Training
|
| 46 |
|
| 47 |
-
##
|
| 48 |
|
| 49 |
-
|
| 50 |
-
- **Value Residual Learning** ([arXiv 2410.17897](https://arxiv.org/abs/2410.17897))
|
| 51 |
-
- **Input Injection** for loop stability
|
|
|
|
| 6 |
- looped-transformer
|
| 7 |
- value-residual
|
| 8 |
- sentencepiece
|
| 9 |
+
- tool-calling
|
| 10 |
+
- conversational
|
| 11 |
license: apache-2.0
|
| 12 |
+
language:
|
| 13 |
+
- en
|
| 14 |
+
pipeline_tag: text-generation
|
| 15 |
---
|
| 16 |
|
| 17 |
+
# Jeeves-Small-75M
|
| 18 |
|
| 19 |
+
A compact 75M-parameter language model built on the **Looped Transformer** and **Value Residual Learning** architectures, with native support for **tool calling / function calling**.
|
| 20 |
|
| 21 |
+
Jeeves is designed to punch above its weight class by reusing one of its transformer blocks iteratively (looping), which gives it an effective depth of 15 from only 8 unique layers.
|
| 22 |
+
|
| 23 |
+
---
|
| 24 |
+
|
| 25 |
+
## Quick Start
|
| 26 |
|
| 27 |
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Anurich/Jeeves-Small-75M", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Anurich/Jeeves-Small-75M", trust_remote_code=True)

inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
| 37 |
|
| 38 |
+
> **Note:** `trust_remote_code=True` is required because the model uses custom architecture code.
|
| 39 |
+
|
| 40 |
+
---
|
| 41 |
+
|
| 42 |
+
## Tool Calling (Function Calling)
|
| 43 |
+
|
| 44 |
+
Jeeves supports structured tool/function calling out of the box. Below is an example:
|
| 45 |
+
|
| 46 |
+
```python
# Reuses the `tokenizer` and `model` objects loaded in the Quick Start.
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a given location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    }
]

messages = [
    {"role": "user", "content": "What's the weather like in London?"}
]

# Format prompt with tools using the chat template
prompt = tokenizer.apply_chat_template(
    messages,
    tools=tools,
    tokenize=False,
    add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
|
| 78 |
+
|
| 79 |
+
---
|
| 80 |
|
| 81 |
## Architecture
|
| 82 |
|
|
|
|
| 85 |
| Parameters | 74.9M |
|
| 86 |
| Unique layers | 8 |
|
| 87 |
| Effective depth | 15 |
|
| 88 |
+
| Loop | block[4] × 8 |
|
| 89 |
+
| Value residual | ✅ |
|
| 90 |
| Hidden dim | 768 |
|
| 91 |
+
| FFN dim | 2,048 |
|
| 92 |
+
| Attention heads | 12 (Q) / 4 (KV), GQA |
|
| 93 |
| Vocab size | 32,000 |
|
| 94 |
| Max seq length | 512 |
|
| 95 |
+
| Training steps | 1,100 |
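
The depth figures in the table can be reproduced with a little arithmetic. The sketch below assumes that `block[4] × 8` means the block at index 4 runs 8 times while the other 7 unique layers run once each; the function name `effective_depth` is illustrative and is not part of the Jeeves code.

```python
def effective_depth(unique_layers: int, loop_count: int, looped_blocks: int = 1) -> int:
    """Count layer applications in one forward pass when some blocks are looped.

    Assumption: each looped block is applied `loop_count` times and every
    other unique layer is applied exactly once.
    """
    if not 0 <= looped_blocks <= unique_layers:
        raise ValueError("looped_blocks must be between 0 and unique_layers")
    if loop_count < 1:
        raise ValueError("loop_count must be at least 1")
    single_pass_layers = unique_layers - looped_blocks
    return single_pass_layers + looped_blocks * loop_count


# Values taken from the table: 8 unique layers, one block looped 8 times.
depth = effective_depth(unique_layers=8, loop_count=8)
print(depth)  # 7 single-pass layers + 8 loop iterations = 15
```

Under this reading the result matches the "Effective depth" row, which suggests the interpretation is at least consistent with the table.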
|
| 96 |
+
|
| 97 |
+
### Key Innovations
|
| 98 |
+
|
| 99 |
+
- **Looped Transformer** ([arXiv:2311.12424](https://arxiv.org/abs/2311.12424)): A single transformer block is applied repeatedly in a loop, increasing effective depth while keeping the parameter count small. This allows Jeeves to reason iteratively rather than in a single pass.
- **Value Residual Learning** ([arXiv:2410.17897](https://arxiv.org/abs/2410.17897)): Residual connections applied at the value projection level alleviate attention concentration in deep/looped networks, improving gradient flow and stability.
- **Input Injection**: The original input is re-injected at each loop iteration to prevent representational drift across loops, a critical stabilization technique for looped architectures.
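
The three ideas above can be illustrated with a toy example in plain Python. This is a minimal sketch, not the Jeeves implementation: the helper names (`looped_forward`, `mix_values`, `halve`), the additive form of the injection, and the fixed mixing weight `lam` are all assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def looped_forward(x0: Vector, block: Callable[[Vector], Vector],
                   num_loops: int, inject_scale: float = 1.0) -> Vector:
    """Apply one shared block `num_loops` times, re-adding the input each time."""
    h = list(x0)
    for _ in range(num_loops):
        # Input injection: mix the original input back in before every pass.
        injected = [h_i + inject_scale * x_i for h_i, x_i in zip(h, x0)]
        h = block(injected)
    return h


def mix_values(v_current: Vector, v_first: Vector, lam: float = 0.5) -> Vector:
    """Value residual: blend this layer's values with the first layer's values."""
    return [lam * vc + (1.0 - lam) * vf for vc, vf in zip(v_current, v_first)]


def halve(v: Vector) -> Vector:
    """Toy stand-in for a transformer block (hypothetical, for illustration)."""
    return [0.5 * v_i for v_i in v]


x0 = [1.0, -2.0]
with_injection = looped_forward(x0, halve, num_loops=8)
without_injection = looped_forward(x0, halve, num_loops=8, inject_scale=0.0)
print(with_injection)     # stays at [1.0, -2.0]
print(without_injection)  # shrinks towards zero
```

With this contracting toy block, the looped state keeps its link to the input only when injection is enabled, which is the drift-prevention effect the last bullet describes.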
|
| 102 |
+
|
| 103 |
+
---
|
| 104 |
+
|
| 105 |
+
## Benchmark Results
|
| 106 |
+
|
| 107 |
+
Evaluated using [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness).
|
| 108 |
+
|
| 109 |
+
| Benchmark | Accuracy | Correct | Total |
|---|---|---|---|
| HellaSwag | 30.9% | 3,100 | 10,042 |
| ARC-Easy | 47.1% | 1,118 | 2,376 |
| ARC-Challenge | 24.9% | 292 | 1,172 |
| **ARC (Average)** | **36.0%** | n/a | n/a |
| PIQA | 63.9% | 1,174 | 1,838 |
| WinoGrande | 52.4% | 664 | 1,267 |
| MMLU | 25.2% | 3,536 | 14,042 |
| TruthfulQA | 24.8% | 203 | 817 |
| GSM8K | 1.4% | 18 | 1,319 |
| IFEval | 40.0% | 4 | 10 |
|
| 121 |
+
|
| 122 |
+
### Notes on Results
|
| 123 |
+
|
| 124 |
+
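- **PIQA (63.9%)** is the highest raw score and sits clearly above the 50% chance level of its two-choice format, indicating some physical commonsense for the model's size. **WinoGrande (52.4%)** is also two-choice, so it is only marginally above chance on pronoun resolution.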
|
| 125 |
+
- **MMLU (25.2%)** is close to random (25% for 4-way MCQ), which is expected given the model's size and early training stage (1,100 steps). More training is needed for knowledge-heavy tasks.
|
| 126 |
+
- **GSM8K (1.4%)** reflects a known limitation: multi-step mathematical reasoning is very demanding and typically requires much larger models or specialized fine-tuning.
|
| 127 |
+
- **IFEval (40.0%)** comes from only 10 prompts (4 correct), so it is a preliminary signal rather than a reliable estimate. It may reflect the tool-calling and instruction-following training signal.
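
To judge how much weight each score can bear, it helps to attach an uncertainty range to the counts in the table. The sketch below uses the standard Wilson score interval; the helper name `wilson_interval` and the choice of a 95% level (`z = 1.96`) are assumptions made for illustration, not part of the reported evaluation.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial accuracy."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width


# (correct, total) pairs copied from the benchmark table.
results = {"IFEval": (4, 10), "MMLU": (3536, 14042), "PIQA": (1174, 1838)}
for name, (correct, total) in results.items():
    low, high = wilson_interval(correct, total)
    print(f"{name}: {correct / total:.1%} (95% interval {low:.1%} to {high:.1%})")
```

On these counts the IFEval interval spans roughly 17% to 69%, the MMLU interval includes the 25% chance level, and the PIQA interval stays above 50%.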
|
| 128 |
+
|
| 129 |
+
---
|
| 130 |
+
|
| 131 |
+
## Limitations
|
| 132 |
+
|
| 133 |
+
- **Short context (512 tokens):** Jeeves currently supports a maximum of 512 tokens. Long documents, multi-turn conversations, and complex tool chains may be truncated.
|
| 134 |
+
- **Early training stage:** At 1,100 training steps, this is an early checkpoint. Knowledge-heavy and math benchmarks (MMLU, GSM8K) are expected to improve with more training.
|
| 135 |
+
- **Not suitable for factual retrieval:** Like all small language models, Jeeves may hallucinate facts. It is best used with grounding via tool calls or RAG pipelines.
|
| 136 |
+
- **English-centric:** Trained primarily on English data. Performance on other languages is not guaranteed.
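
Because of the 512-token limit, it can help to trim a prompt before generation so that the prompt plus the requested new tokens fits the window. The following is a minimal sketch under stated assumptions: `fit_prompt` is a hypothetical helper, and keeping the tail of the prompt (left truncation) is one possible policy that can drop a tool schema or system text placed at the start.

```python
from typing import List

MAX_SEQ_LEN = 512  # maximum sequence length stated in the model card


def fit_prompt(token_ids: List[int], max_new_tokens: int,
               max_seq_len: int = MAX_SEQ_LEN) -> List[int]:
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < max_seq_len:
        raise ValueError("max_new_tokens must be between 1 and max_seq_len - 1")
    budget = max_seq_len - max_new_tokens
    # Keep the tail of the prompt (left truncation) so the latest turn survives.
    return token_ids[-budget:]


prompt_ids = list(range(600))          # stand-in for real token IDs
trimmed = fit_prompt(prompt_ids, max_new_tokens=128)
print(len(trimmed), trimmed[0])        # 384 tokens kept, starting at ID 216
```

Prompts that already fit within the budget are returned unchanged.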
|
| 137 |
+
|
| 138 |
+
---
|
| 139 |
+
|
| 140 |
+
## Intended Use
|
| 141 |
+
|
| 142 |
+
Jeeves is designed for:
|
| 143 |
+
|
| 144 |
+
- **On-device / edge inference** where a small footprint is critical
|
| 145 |
+
- **Tool-augmented agents** that rely on function calling rather than parametric knowledge
|
| 146 |
+
- **Research** into efficient architectures (looped transformers, value residual)
|
| 147 |
+
- **Fine-tuning** on domain-specific tasks where a small, fast base model is preferred
|
| 148 |
+
|
| 149 |
+
---
|
| 150 |
+
|
| 151 |
+
## Citation
|
| 152 |
+
|
| 153 |
+
If you use Jeeves in your work, please also cite the papers that inspired its architecture:
|
| 154 |
+
|
| 155 |
+
```bibtex
@article{looped_transformer_2023,
  title={Looped Transformers are Better at Learning Learning Algorithms},
  author={...},
  journal={arXiv:2311.12424},
  year={2023}
}

@article{value_residual_2024,
  title={Value Residual Learning For Alleviating Attention Concentration In Transformers},
  author={...},
  journal={arXiv:2410.17897},
  year={2024}
}
```
|
| 170 |
+
|
| 171 |
+
---
|
| 172 |
|
| 173 |
+
## License
|
| 174 |
|
| 175 |
+
Apache 2.0; see [LICENSE](LICENSE) for details.