mlp-surgery — restore top 30 (raw) on Qwen2.5-3B
This is the headline result of the project. Restoring the top 30 by raw error norm exceeds the base model on GSM8K (+1.14pt) and fully recovers ARC Challenge (within 0.5pt of base).
This model is Qwen2.5-3B-Instruct fine-tuned on perplexity-filtered OpenHermes 2.5 (which damaged its reasoning), then partially restored by copying the top-30 most-damaged MLP layers back from the base model. No retraining, just weight surgery.
Method (short)
- Take the broken finetune (mlp-surgery-broken).
- Score MLP layer parameters by raw gradient norm, computed on 100 GSM8K problems the broken model gets wrong.
- Copy the top-30 from base into the broken model. Save.
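The copy step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's code: the function name `restore_top_k`, the `.mlp.` name filter, and the precomputed `scores` mapping are all hypothetical, and plain Python lists stand in for tensors. With real checkpoints the same logic would presumably run over the two models' `state_dict()` mappings.

```python
import copy


def restore_top_k(broken_state, base_state, scores, k):
    """Return a copy of broken_state in which the k highest-scoring MLP
    entries are replaced by their base-model values, plus the chosen names."""
    # Only MLP entries are candidates (the ".mlp." filter is an assumption).
    candidates = [name for name in scores if ".mlp." in name]
    # Rank by damage score, most damaged first.
    ranked = sorted(candidates, key=lambda name: scores[name], reverse=True)
    chosen = ranked[:k]

    # Work on a copy so the broken checkpoint itself is left untouched.
    restored = copy.deepcopy(dict(broken_state))
    for name in chosen:
        restored[name] = copy.deepcopy(base_state[name])
    return restored, chosen


# Toy checkpoints with hypothetical parameter names and values.
base = {
    "layers.0.mlp.up_proj.weight": [1.0, 1.0],
    "layers.1.mlp.up_proj.weight": [2.0, 2.0],
    "layers.1.self_attn.q_proj.weight": [3.0, 3.0],
}
broken = {
    "layers.0.mlp.up_proj.weight": [1.5, 0.5],
    "layers.1.mlp.up_proj.weight": [2.1, 2.0],
    "layers.1.self_attn.q_proj.weight": [3.3, 2.9],
}
scores = {
    "layers.0.mlp.up_proj.weight": 0.9,
    "layers.1.mlp.up_proj.weight": 0.2,
    "layers.1.self_attn.q_proj.weight": 5.0,  # ignored: not an MLP entry
}

restored, chosen = restore_top_k(broken, base, scores, k=1)
print(chosen)
```

Only the selected MLP entries change; attention weights and lower-ranked MLP entries keep their fine-tuned values, which is what makes this surgery rather than a full revert.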
Eval
lm-eval, GSM8K flexible-extract 5-shot, ARC Challenge acc_norm 0-shot, no chat template, batch_size 8, single seed (2026-05-07).
| Model | GSM8K | ARC Challenge |
|---|---|---|
| Base (Qwen2.5-3B-Instruct) | 63.15% | 48.12% |
| After SFT (broken) | 61.64% | 45.22% |
| Restore top 5 | 63.00% | 45.73% |
| Restore top 15 | 63.46% | 46.50% |
| Restore top 30 | 64.29% | 48.55% |
| Restore specificity top 10 | 61.64% | 45.22% |
This model is the "Restore top 30" row.
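To make the comparison against base explicit, the table can be turned into per-row deltas. The snippet below is only a convenience sketch: the numbers are copied from the table, while the helper name `deltas_vs_base` and the short row labels are hypothetical.

```python
# Scores copied from the table above: (GSM8K %, ARC Challenge %).
results = {
    "Base": (63.15, 48.12),
    "After SFT (broken)": (61.64, 45.22),
    "Restore top 5": (63.00, 45.73),
    "Restore top 15": (63.46, 46.50),
    "Restore top 30": (64.29, 48.55),
    "Restore specificity top 10": (61.64, 45.22),
}


def deltas_vs_base(results, base_key="Base"):
    """Return each row's difference from the base row, in percentage points."""
    base = results[base_key]
    return {
        name: tuple(round(score - ref, 2) for score, ref in zip(scores, base))
        for name, scores in results.items()
        if name != base_key
    }


deltas = deltas_vs_base(results)
for name, (gsm8k, arc) in deltas.items():
    print(f"{name:28s} GSM8K {gsm8k:+.2f}  ARC {arc:+.2f}")
```

The "Restore top 30" row is the only one with a positive delta on both benchmarks, and the "Restore specificity top 10" row matches the broken model exactly.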
Companion models + code
- mlp-surgery-broken — input baseline
- mlp-surgery-restored-top5
- mlp-surgery-restored-top15
- mlp-surgery-restored-top30
- mlp-surgery-restored-specificity-top10
- Code: https://github.com/Malum0x/mlp-surgery
Caveats
Single seed, and the differences are only about 1pt. Evaluating without a chat template gives absolute numbers below what you'd see with the chat template applied (78% GSM8K), but relative comparisons within the same setup are meaningful.
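For a rough sense of scale, a simple binomial standard error can be computed for each benchmark. This is a back-of-the-envelope sketch, not something the eval run reported here: it assumes independent test items, and the test-set sizes (1319 for GSM8K, 1172 for ARC Challenge) are assumptions about the standard test splits rather than figures from this card.

```python
import math


def binomial_stderr_pts(accuracy_pct, n_items):
    """Standard error of an accuracy estimate, in percentage points,
    treating each test item as an independent Bernoulli trial."""
    p = accuracy_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_items)


# Assumed sizes of the standard test splits (not stated in this card).
GSM8K_TEST_ITEMS = 1319
ARC_CHALLENGE_TEST_ITEMS = 1172

gsm8k_se = binomial_stderr_pts(63.15, GSM8K_TEST_ITEMS)
arc_se = binomial_stderr_pts(48.12, ARC_CHALLENGE_TEST_ITEMS)
print(f"GSM8K stderr: {gsm8k_se:.2f} pts")
print(f"ARC Challenge stderr: {arc_se:.2f} pts")
```

Under these assumptions the standard error on each benchmark is itself a bit over one point, so the differences between rows are of similar size to the sampling noise.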