Stephen Oates PRO

soates

AI & ML interests

None yet

Recent Activity

upvoted an article 5 days ago

Deriving the PPO Loss from First Principles

upvoted an article 21 days ago

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

upvoted a collection 23 days ago

Physics of Language Models: Part 4.2


Organizations

None yet

upvoted an article 5 days ago

Article

Deriving the PPO Loss from First Principles

7 days ago


upvoted an article 21 days ago

Article

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

24 days ago


upvoted a collection 23 days ago

Physics of Language Models: Part 4.2

Collection

16 items • Updated Jul 29, 2025 • 14

upvoted an article 27 days ago

Article

We Got Claude to Fine-Tune an Open Source LLM

29 days ago • 550

upvoted a paper 2 months ago

The Massive Legal Embedding Benchmark (MLEB)

Paper • 2510.19365 • Published Oct 22, 2025 • 17

upvoted an article 2 months ago

Article

Australian-made LLM beats OpenAI and Google at legal retrieval

Oct 23, 2025


upvoted an article 3 months ago

Article

There is no such thing as a tokenizer-free lunch

Sep 25, 2025


updated a dataset 4 months ago

soates/australian-insurance-dspy-corpus

Viewer • Updated Sep 17, 2025 • 359 • 19

published a dataset 4 months ago

soates/australian-insurance-dspy-corpus

Viewer • Updated Sep 17, 2025 • 359 • 19

upvoted 2 papers 4 months ago

Virtual Agent Economies

Paper • 2509.10147 • Published Sep 12, 2025 • 26

The Majority is not always right: RL training for solution aggregation

Paper • 2509.06870 • Published Sep 8, 2025 • 16

updated a dataset 5 months ago

soates/tictactoe-gemma-dataset

Viewer • Updated Aug 15, 2025 • 93.6k • 13

published a dataset 5 months ago

soates/tictactoe-gemma-dataset

Viewer • Updated Aug 15, 2025 • 93.6k • 13

liked a model 5 months ago

Menlo/Lucy-128k

Text Generation • 2B • Updated Aug 4, 2025 • 254 • 108

liked a model 6 months ago

chandar-lab/NeoBERT

Feature Extraction • 0.2B • Updated Mar 25, 2025 • 3.25k • 186

upvoted 2 papers 7 months ago

Large Language Models are Locally Linear Mappings

Paper • 2505.24293 • Published May 30, 2025 • 14

Reinforcement Learning Finetunes Small Subnetworks in Large Language Models

Paper • 2505.11711 • Published May 16, 2025 • 11

upvoted an article 7 months ago

Article

nanoVLM: The simplest repository to train your VLM in pure PyTorch

May 21, 2025 • 247

upvoted an article 8 months ago

Article

Tiny Agents: an MCP-powered agent in 50 lines of code

Apr 25, 2025 • 304

upvoted a paper 8 months ago

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Paper • 2504.13837 • Published Apr 18, 2025 • 139
