Kurtis-EON1

Kurtis-EON1: "infinite" context (see notes), O(1) memory, zero KV-cache growth, and constant per-token inference cost, built on a recurrent state.

  • "Infinite" context: the model processes input streams of unlimited length by compressing history into a continuously evolving recurrent state, rather than storing raw tokens in a fixed window. The state persists and evolves over time without memory explosion.
  • The trade-off: a Transformer stores every token in its KV-cache, so if you ask for the 3rd word from 10,000 tokens ago, it recalls it with perfect fidelity.
  • The recurrent state has a fixed size (e.g., 1024 dimensions). Feed it 1 billion tokens and it physically cannot store 1 billion distinct facts in a 1024-float vector, so recall is lossy.
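The bullets above can be sketched in a few lines. This is a generic toy linear recurrence, not the actual Kurtis-EON1 architecture: the state dimension, mixing matrices, and `tanh` update are illustrative assumptions. The point it demonstrates is that memory stays fixed at `STATE_DIM` floats no matter how many tokens stream through, where a Transformer's KV-cache would grow linearly.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM = 1024  # fixed state size, regardless of stream length

# Hypothetical mixing matrices for a toy recurrence (small scale for stability).
W_h = rng.normal(scale=0.01, size=(STATE_DIM, STATE_DIM))
W_x = rng.normal(scale=0.01, size=(STATE_DIM, STATE_DIM))

def step(state: np.ndarray, token_embedding: np.ndarray) -> np.ndarray:
    """One recurrent update: the new state is a lossy compression of
    (old state, new token). No per-token cache is kept anywhere."""
    return np.tanh(W_h @ state + W_x @ token_embedding)

state = np.zeros(STATE_DIM)
for _ in range(1_000):                 # the stream can be arbitrarily long...
    token = rng.normal(size=STATE_DIM)
    state = step(state, token)

# ...but memory stays a single (1024,) vector: O(1) in sequence length.
print(state.shape)
```

A Transformer processing the same 1,000 tokens would hold 1,000 key/value entries per layer; here, token 1,000 and token 1 compete for the same 1024 floats, which is exactly why recall is lossy.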

For comparison:

  • Transformer: A photographic memory, but it faints after 1 hour.
  • Kurtis-EON1: Attempts to mimic human memory.

Infinite Context vs. Lossy Recall:

Think of the model like human memory. You can live for 80 years (infinite context), but you don't remember exactly what you ate for breakfast in Berlin on February 2, 2016, or why you were working on LSTMs/RNNs at that time, alone in an empty flat, trying to build a chatbot. You remember the gist of your life. The model compresses the past into a feeling (state), rather than a recording (cache).

Work in Progress: This model is currently under active development.

Overview

Kurtis-EON1 is an experimental ~400M parameter language model based on a custom Recurrent State Architecture.

Data & Status

  • Architecture: Hybrid (codename: Echo-DSRN)
  • Base: Trained from scratch on FineWeb-EDU (sample-10BT).
  • Instruct (WIP): Currently fine-tuning on UltraChat, Cosmopedia, and custom synthetic sets.

Weights will be released upon completion of safety alignment.

  • Surprise mechanism: incorporates a novel surprise-based gating mechanism (inspired by Google's Titans).
  • Gating: specific gating-architecture adjustments (details confidential / WIP).
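Since the actual gating details are confidential, the following is only a generic sketch of what surprise-gated memory looks like (in the spirit of Titans), not the Echo-DSRN mechanism. Every name and formula here is an assumption for illustration: "surprise" is taken as the magnitude of prediction error, and a high surprise opens the gate so the observation is written into the state more strongly.

```python
import numpy as np

def surprise_gate(state: np.ndarray,
                  prediction: np.ndarray,
                  observation: np.ndarray,
                  lam: float = 0.9):
    """Illustrative surprise-gated update (NOT the Echo-DSRN mechanism).

    surprise: norm of the prediction error; zero when the model saw it coming.
    gate:     squashed into [0, 1); 0 means "nothing new, barely write".
    """
    error = observation - prediction
    surprise = np.linalg.norm(error)
    gate = 1.0 - np.exp(-surprise)
    new_state = lam * state + gate * error  # decay old memory, write surprises
    return new_state, gate

# Expected input produces a closed gate; unexpected input opens it.
state = np.zeros(4)
_, g_expected = surprise_gate(state, np.ones(4), np.ones(4))
_, g_surprise = surprise_gate(state, np.zeros(4), np.ones(4))
print(g_expected, g_surprise)
```

The design intuition is that a fixed-size state cannot keep everything, so write-bandwidth should be spent on inputs the model failed to predict.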

Base Model

Training metrics and logs are available in the logs/ directory.

Training & Validation Metrics

Chart panels: Train Loss · Validation Loss · Extrapolation (1024T); Avg Train Loss · Avg Gate Activation · Surprise Lambda Grad; Learning Rate · Tokens Seen.

GPU Performance

Chart panels: GPU Utilization (%) · Memory Allocation (%) · Read/Write; Power Usage (W).

System Metrics

Chart panels: CPU Utilization · Threads; Process Memory (MB) · Available Memory · System Memory Utilization.

Instruct Model

| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|---|---|---|---|---|---|---|
| arc_easy | 1 | none | 0 | acc ↑ | 0.4689 | ± 0.0102 |
| arc_easy | 1 | none | 0 | acc_norm ↑ | 0.4158 | ± 0.0101 |
| hellaswag | 1 | none | 0 | acc ↑ | 0.2915 | ± 0.0045 |
| hellaswag | 1 | none | 0 | acc_norm ↑ | 0.3190 | ± 0.0047 |
| piqa | 1 | none | 0 | acc ↑ | 0.6306 | ± 0.0113 |
| piqa | 1 | none | 0 | acc_norm ↑ | 0.6143 | ± 0.0114 |
| sciq | 1 | none | 0 | acc ↑ | 0.7520 | ± 0.0137 |
| sciq | 1 | none | 0 | acc_norm ↑ | 0.6780 | ± 0.0148 |
| truthfulqa_mc1 | 2 | none | 0 | acc ↑ | 0.2411 | ± 0.0150 |
| truthfulqa_mc2 | 3 | none | 0 | acc ↑ | 0.4251 | ± 0.0151 |
| winogrande | 1 | none | 0 | acc ↑ | 0.5122 | ± 0.0140 |

Developed by ethicalabs.ai

