Sergio Paniego PRO
Posts 85
And... it's already supported in TRL, built by Kashif Rasul. you can really feel the pace of development in the team
Paper by Ruixiang ZHANG, He Bai, Huangjie Zheng, Navdeep Jaitly, Ronan Collobert, Yizhe Zhang at Apple
How it works: the model generates completions at a training-time temperature (T_train) with top_k/top_p truncation, then fine-tunes on them with plain cross-entropy. no labels or verifier needed
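To make the sampling step concrete, here is a toy sketch of drawing one token at a training-time temperature with top_k/top_p truncation and scoring it with plain cross-entropy. It is not TRL's implementation: the function names, the four-token vocabulary and the logits are all made up for illustration, and the truncation order (top_k first, then top_p on the renormalized mass) is an assumption.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def truncated_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Apply temperature, then top_k, then top_p, and renormalize."""
    probs = softmax(logits, temperature)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]  # keep the k most likely tokens
    if top_p is not None:
        kept_mass = sum(probs[i] for i in order)
        nucleus, cumulative = [], 0.0
        for i in order:
            nucleus.append(i)
            cumulative += probs[i] / kept_mass
            if cumulative >= top_p:  # smallest prefix covering top_p
                break
        order = nucleus
    keep = set(order)
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def sample_token(dist, rng):
    """Draw one token id from a probability list."""
    return rng.choices(range(len(dist)), weights=dist)[0]

def cross_entropy(model_probs, token):
    """Negative log-likelihood of the sampled token under the model."""
    return -math.log(model_probs[token])

# Hypothetical next-token logits for a 4-token vocabulary.
logits = [2.0, 1.0, 0.0, -1.0]
sampling_dist = truncated_distribution(logits, temperature=1.5, top_k=3, top_p=0.9)
token = sample_token(sampling_dist, random.Random(0))
loss = cross_entropy(softmax(logits), token)  # the fine-tuning target is the model's own sample
```

In a real run the same idea applies per token over whole completions, and the loss is minimized with gradient steps rather than just computed once.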
You can try it right away with this ready-to-run example (Qwen3-4B on rStar-Coder):
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd.py
or benchmark a checkpoint with the eval script:
https://github.com/huggingface/trl/blob/main/trl/experimental/ssd/ssd_eval.py
One neat insight from the paper: T_train and T_eval compose into an effective T_eff = T_train × T_eval, so a broad band of configs works well. even very noisy samples still help
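A quick numerical check of why the two temperatures multiply, under an idealized assumption that is mine and not the paper's full argument: the fine-tuned model exactly reproduces the untruncated T_train sampling distribution. Decoding that model at T_eval then matches decoding the base model at T_train × T_eval. The logits and temperatures below are arbitrary illustrative values.

```python
import math

def log_softmax(logits, temperature=1.0):
    """Log-probabilities of a temperature-scaled softmax."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_total for s in scaled]

def probs(logits, temperature=1.0):
    """Probabilities of a temperature-scaled softmax."""
    return [math.exp(lp) for lp in log_softmax(logits, temperature)]

base_logits = [3.0, 1.5, 0.2, -1.0]  # hypothetical base-model logits
t_train, t_eval = 1.6, 0.5

# Idealized student: its logits are the log-probs of the T_train distribution.
student_logits = log_softmax(base_logits, t_train)

composed = probs(student_logits, t_eval)          # student decoded at T_eval
direct = probs(base_logits, t_train * t_eval)     # base decoded at T_eff

max_gap = max(abs(a - b) for a, b in zip(composed, direct))
print(max_gap)  # differs only by floating-point rounding
```

The match is exact here because dividing log-probabilities by T_eval only adds a constant to the rescaled logits, which softmax ignores. With top_k/top_p truncation or an imperfect fit the relation is only approximate.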
Want to dig deeper?
Paper: Embarrassingly Simple Self-Distillation Improves Code Generation (2604.01193)
Trainer docs: https://huggingface.co/docs/trl/main/en/ssd_trainer
Articles 16
Welcome Gemma 4: Frontier multimodal intelligence on device
- Runtime error · RL
CARLA Environment Server
Control a Carla driving simulation with custom actions
- Runtime error · RL
CARLA Environment Server
Control a CARLA driving simulator with custom actions
- Sleeping · Agents
Carla Grpo Trolley
Visualize your program's I/O activity in real time
- sergiopaniego/Qwen3-0.6B-carla-trolley-escape
0.8B • Updated • 5
- Running · 3.82k
The Ultra-Scale Playbook
The ultimate guide to training LLM on large GPU Clusters
- Running on CPU Upgrade · Featured · 3.14k
The Smol Training Playbook
The secrets to building world-class LLMs
- Running · 312
Evaluation Guidebook
Explore LLM benchmark trends over time
- Running · 223
FineVision: Open Data is All You Need
A new open-source dataset for training VLMs
Spaces 135
VLM Object Understanding
Explore object detection, visual grounding, keypoint Detecti
Qwen2-VL-7B
Ask questions about charts in images
SmolVLM-trl-dpo-rlaif-v
Generate text from an image and question
SmolVLM-trl-sft-ChartQA
Ask questions about charts in images
Huggingface Static 1a5eab
View and monitor key metrics with an interactive dashboard
Huggingface Static 71a48c
Explore and monitor your data with an interactive dashboard