Self-Distillation
  Reinforcement Learning via Self-Distillation Paper • 2601.20802 • Published Jan 28 • 42
  Self-Distillation Enables Continual Learning Paper • 2601.19897 • Published Jan 27 • 27
  Aligning Language Models from User Interactions Paper • 2603.12273 • Published Feb 18
Test-Time Model Merging (TTMM)
  Collection of fine-tuned models and expert adapters from "Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging" (COLM 25)
  Local Mixtures of Experts: Essentially Free Test-Time Training via Model Merging Paper • 2505.14136 • Published May 20, 2025
  rbertolissi/Llama-3.2-1B-Wikipedia Updated Jul 28, 2025
  rbertolissi/Llama-3.2-1B-TTMM-Wikipedia Updated Jul 29, 2025
  rbertolissi/Qwen2.5-1.5B-Wikipedia Updated Jul 28, 2025
Test-Time Curricula for Targeted RL (Qwen3-4B-Instruct-2507)
  lasgroup/Qwen3-4B-Instruct-2507-TTC-AIME24 4B • Updated Sep 29, 2025
  lasgroup/Qwen3-4B-Instruct-2507-TTC-AIME25 4B • Updated Sep 29, 2025
  lasgroup/Qwen3-4B-Instruct-2507-TTC-MATH500 4B • Updated Sep 29, 2025 • 1
  lasgroup/Qwen3-4B-Instruct-2507-TTC-Codeforces 4B • Updated Oct 2, 2025
Test-Time Curricula for Targeted RL (AIME25)
  lasgroup/Qwen3-8B-TTC-AIME25-Q0 8B • Updated Oct 3, 2025
  lasgroup/Qwen3-8B-TTC-AIME25-Q1 8B • Updated Oct 3, 2025
  lasgroup/Qwen3-8B-TTC-AIME25-Q2 8B • Updated Oct 3, 2025
  lasgroup/Qwen3-8B-TTC-AIME25-Q3 8B • Updated Oct 3, 2025
Test-Time Curricula for Targeted RL
  Learning on the Job: Test-Time Curricula for Targeted Reinforcement Learning Paper • 2510.04786 • Published Oct 6, 2025 • 3
  lasgroup/verifiable-corpus Viewer • Updated Oct 9, 2025 • 90.7k • 127 • 1
  Test-Time Curricula for Targeted RL (Qwen3-8B) Collection 8 items • Updated Oct 3, 2025
  Test-Time Curricula for Targeted RL (Qwen3-4B-Instruct-2507) Collection 8 items • Updated Oct 3, 2025
Test-Time Curricula for Targeted RL (Qwen3-8B)
  lasgroup/Qwen3-8B-TTC-AIME24 8B • Updated Sep 28, 2025
  lasgroup/Qwen3-8B-TTC-AIME25 8B • Updated Sep 28, 2025
  lasgroup/Qwen3-8B-TTC-MATH500 8B • Updated Sep 28, 2025 • 1
  lasgroup/Qwen3-8B-TTC-Codeforces 8B • Updated Oct 2, 2025
Test-Time Curricula for Targeted RL (Qwen3-8B-Base)
  lasgroup/Qwen3-8B-Base-TTC-AIME24 8B • Updated Sep 29, 2025 • 3
  lasgroup/Qwen3-8B-Base-TTC-AIME25 8B • Updated Sep 29, 2025 • 2
  lasgroup/Qwen3-8B-Base-TTC-MATH500 8B • Updated Sep 29, 2025 • 2
  lasgroup/Qwen3-8B-Base-TTC-Codeforces 8B • Updated Oct 2, 2025 • 2