scGPT fine-tuned on Norman 2019
Produced as part of the sc-interp single-cell model comparison repo.
Provenance
- Source code commit: `da2a582`
- Runner: `scripts/run_scgpt.py`
- Dataset manifest: `data/norman/manifest.yaml`
Base model
Initialised from the scGPT whole-human checkpoint (~33M cells of CellxGene Census), 12 transformer layers, 512 hidden dim, 8 heads. Downloaded from the official Google Drive folder linked in the scGPT README. Not currently hosted on the HuggingFace Hub.
Training
- Task: perturb-GEP, control cells as input, matched perturbed cells as target
- Runner: invoked via the sc-interp dispatcher, `python -m scripts.run scgpt --dataset norman`
- Split: GEARS `simulation` split with seed 42 (152 train / 33 val / 99 test perturbations), materialised once by `scripts/data/gears.py` into `data/norman/splits/simulation_42_0.75.json` and consumed by runners via `scripts/data/splits.py`
- Recipe adapted from scGPT `Tutorial_Perturbation.ipynb`
- Loss: masked MSE on all gene positions
- Optimiser: Adam, lr 1e-4, StepLR gamma 0.9 per epoch
- AMP: enabled
- Attention: standard `torch.nn.MultiheadAttention` (flash-attn not installed, `Wqkv` weights renamed to `in_proj` during load)
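The key renaming mentioned in the attention bullet can be sketched on a plain dictionary. This is a minimal illustration, not the runner's actual code: the helper name `rename_flash_attn_keys` and the example key names are hypothetical, and the exact substring replaced in `scripts/run_scgpt.py` may differ.

```python
def rename_flash_attn_keys(state_dict):
    """Map flash-attn style 'Wqkv.' parameter names to the 'in_proj_' names
    used by torch.nn.MultiheadAttention. Values are passed through untouched."""
    return {k.replace("Wqkv.", "in_proj_"): v for k, v in state_dict.items()}


# Hypothetical keys for illustration; a real checkpoint maps names to tensors.
pretrained = {
    "transformer_encoder.layers.0.self_attn.Wqkv.weight": "W0",
    "transformer_encoder.layers.0.self_attn.Wqkv.bias": "b0",
    "transformer_encoder.layers.0.self_attn.out_proj.weight": "O0",
}

renamed = rename_flash_attn_keys(pretrained)
for key in renamed:
    print(key)
```

Only the parameter names change; the fused query/key/value weights keep their shape, which is why a plain rename is enough for the standard attention module to accept them.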
Budget and stopping
| quantity | value |
|---|---|
| epochs trained | 15 / 15 |
| cells seen | 794,880 |
| gradient steps | 12,420 |
| wall clock | 2.0 hours (H100 PCIe) |
| best val pearson (all-gene) | 0.9879 |
| best val epoch | 7 |
| stopping reason | max_epochs |
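The budget figures above can be cross-checked with a little arithmetic. The sketch below derives steps per epoch and cells per gradient step from the table; reading the latter as the batch size assumes one gradient step per batch with no gradient accumulation, and the final learning rate assumes StepLR decays once after every epoch. Neither assumption is stated explicitly in this card.

```python
EPOCHS = 15
CELLS_SEEN = 794_880
GRADIENT_STEPS = 12_420
BASE_LR = 1e-4
GAMMA = 0.9

# All three divisions are exact, so the table is internally consistent.
steps_per_epoch = GRADIENT_STEPS // EPOCHS     # 828
cells_per_step = CELLS_SEEN // GRADIENT_STEPS  # 64 (implied batch size, see assumptions)
cells_per_epoch = CELLS_SEEN // EPOCHS         # 52992

# Learning rate in effect during the last epoch, assuming one decay per epoch.
final_epoch_lr = BASE_LR * GAMMA ** (EPOCHS - 1)

print(f"steps per epoch:  {steps_per_epoch}")
print(f"cells per step:   {cells_per_step}")
print(f"cells per epoch:  {cells_per_epoch}")
print(f"final-epoch lr:   {final_epoch_lr:.3e}")
```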
Test set metrics (cell-eval)
| metric | mean | median | max |
|---|---|---|---|
| pearson_delta | 0.5067 | 0.5503 | 0.9132 |
| mse | 0.0038 | 0.0033 | 0.0183 |
| mae | 0.0209 | 0.0204 | 0.0449 |
| mse_delta | 0.0038 | 0.0033 | 0.0183 |
| mae_delta | 0.0209 | 0.0204 | 0.0449 |
| de_direction_match | 0.7159 | 0.7126 | 0.9434 |
| de_sig_genes_recall | 0.9076 | 0.9089 | 0.9906 |
| de_spearman_sig | 0.2633 | 0.2633 | 0.2633 |
| de_spearman_lfc_sig | 0.8006 | 0.8217 | 0.9571 |
| pr_auc | 0.0782 | 0.0768 | 0.1994 |
| roc_auc | 0.3839 | 0.3802 | 0.5288 |
| de_nsig_counts_real | 487.3535 | 501.0000 | 1122.0000 |
| de_nsig_counts_pred | 4915.4646 | 4924.0000 | 4989.0000 |
| overlap_at_N | 0.0242 | 0.0218 | 0.0978 |
| overlap_at_50 | 0.0265 | 0.0200 | 0.1400 |
| overlap_at_100 | 0.0233 | 0.0200 | 0.1000 |
| overlap_at_200 | 0.0244 | 0.0200 | 0.1000 |
| overlap_at_500 | 0.0240 | 0.0220 | 0.1040 |
| precision_at_N | 0.0899 | 0.0906 | 0.2128 |
| precision_at_50 | 0.0265 | 0.0200 | 0.1400 |
| precision_at_100 | 0.0231 | 0.0200 | 0.1000 |
| precision_at_200 | 0.0248 | 0.0200 | 0.1000 |
| precision_at_500 | 0.0246 | 0.0220 | 0.1040 |
| discrimination_score_l1 | 0.5911 | 0.5758 | 1.0000 |
| discrimination_score_l2 | 0.6160 | 0.6263 | 1.0000 |
| discrimination_score_cosine | 0.6502 | 0.7172 | 1.0000 |
| pearson_edistance | 0.6486 | 0.6486 | 0.6486 |
| clustering_agreement | 0.2460 | 0.2460 | 0.2460 |
For reference, Table 1 of the scGPT paper reports `pearson_delta` of 0.459 (ALL) and 0.546 (DE) on Norman. Our all-gene mean (0.5067) sits between the paper's ALL and DE columns. `de_nsig_counts_real` vs `de_nsig_counts_pred` (~487 vs ~4915 significant genes per perturbation, out of 5045 total) quantifies the scGPT-typical over-prediction of DE: the model flags roughly ten times as many genes as significant as the real data does. That is why `roc_auc` (0.38) and `pr_auc` (0.08) on DE classification are low while `de_sig_genes_recall` (0.91) is high.
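The link between high recall and low precision can be checked with back-of-the-envelope arithmetic on the table means. The sketch below treats the `de_nsig_counts_*` values as the number of genes called significant; that reading is an assumption about the metric's definition, though it is the one under which the estimate lands close to the reported `precision_at_N` (0.0899). It also uses ratios of means rather than per-perturbation values, so the result is only approximate.

```python
# Means copied from the test set metrics table.
mean_real_sig = 487.3535   # de_nsig_counts_real
mean_pred_sig = 4915.4646  # de_nsig_counts_pred
mean_recall = 0.9076       # de_sig_genes_recall
total_genes = 5045

# Recall tells us roughly how many truly significant genes were recovered.
approx_true_positives = mean_recall * mean_real_sig

# Precision is those hits divided by everything the model called significant.
approx_precision = approx_true_positives / mean_pred_sig

# Fraction of the gene panel the model calls significant.
frac_flagged = mean_pred_sig / total_genes

print(f"approx. true positives per perturbation: {approx_true_positives:.1f}")
print(f"approx. precision: {approx_precision:.4f}")
print(f"fraction of genes flagged as DE: {frac_flagged:.3f}")
```

With nearly every gene flagged, recall is high almost by construction, while precision is capped near the base rate of truly significant genes.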
Known limitations
- Trained with `dropout=0.2` and `pert_pad_id=2` inherited from the pretrained `args.json`. The scGPT tutorial hardcodes `dropout=0` and `pert_pad_id=0` for fine-tuning; switching to those values is expected to improve metrics.
- Early stopping used all-gene val pearson, which saturates near 0.99 and never fired; training ran the full 15 epochs. `pearson_delta` or `pearson_de_delta` would be a stricter stop criterion.
- Low `overlap_at_50` (0.03) and `overlap_at_N` (0.024) are consistent with scGPT's known weakness at identifying the specific top-k DE genes driving a perturbation, rather than a training flaw. See the GEARS and CellFlow papers for the same observation.
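The saturation of all-gene Pearson can be illustrated with a toy example. The sketch below uses made-up expression values for five genes (not data from this run) and a hand-written `pearson` helper. It shows a prediction that tracks the control profile scoring about 0.99 on all-gene Pearson while its correlation on the perturbation delta is zero.

```python
import math


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Toy values, purely illustrative.
control = [1.0, 2.0, 3.0, 4.0, 5.0]    # mean control expression
true_pert = [1.1, 1.9, 3.2, 4.0, 4.8]  # observed perturbed expression
pred_pert = [0.9, 2.1, 3.0, 4.1, 4.9]  # predicted perturbed expression

# All-gene Pearson is dominated by the shared baseline expression profile.
all_gene = pearson(pred_pert, true_pert)

# Delta Pearson subtracts the control first, so only the shift is scored.
true_delta = [t - c for t, c in zip(true_pert, control)]
pred_delta = [p - c for p, c in zip(pred_pert, control)]
delta = pearson(pred_delta, true_delta)

print(f"all-gene pearson: {all_gene:.4f}")
print(f"pearson_delta:    {delta:.4f}")
```

Because the baseline profile dominates the all-gene score, a stop criterion built on it has little room to distinguish epochs, whereas a delta-based score only rewards getting the perturbation effect right.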
Files
- `best_model.pt`: fine-tuned state dict, loads into `TransformerGenerator` built with `use_fast_transformer=False`
- `training_stats.json`: unified sc-interp `TrainStats` schema with top-level keys `wall_clock_s`, `wandb_run_url`, `reason`, `details` (model-specific training metadata is nested in `details`)
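A reader for `training_stats.json` might look like the sketch below. Only the four top-level key names come from this card; the field types, the `parse_train_stats` helper, and every value in the example payload (including the keys inside `details`) are hypothetical placeholders, not the contents of the real file or the sc-interp `TrainStats` class.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class TrainStats:
    """Illustrative mirror of the four documented top-level keys."""
    wall_clock_s: float
    wandb_run_url: Optional[str]
    reason: str
    details: Dict[str, Any] = field(default_factory=dict)


def parse_train_stats(text: str) -> TrainStats:
    """Parse a JSON string into TrainStats; required keys raise KeyError if absent."""
    raw = json.loads(text)
    return TrainStats(
        wall_clock_s=float(raw["wall_clock_s"]),
        wandb_run_url=raw.get("wandb_run_url"),
        reason=raw["reason"],
        details=raw.get("details", {}),
    )


# Hypothetical payload for illustration only.
example = """
{
  "wall_clock_s": 7200.0,
  "wandb_run_url": null,
  "reason": "max_epochs",
  "details": {"epochs_trained": 15, "best_val_epoch": 7}
}
"""

stats = parse_train_stats(example)
print(stats.reason, f"{stats.wall_clock_s / 3600:.1f} h", stats.details)
```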
Usage
```python
from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="matthewshu/scGPT-norman-ft",
    filename="best_model.pt",
)

# Or reproduce from source (runs in the scgpt venv):
# python -m scripts.run scgpt --dataset norman --hf-repo matthewshu/scGPT-norman-ft
```
Citation
Dataset: Norman et al. 2019 (Science). Base foundation model: Cui et al. 2024 (Nat Methods). See the scGPT and GEARS repos for BibTeX.