scGPT fine-tuned on Norman 2019

Produced as part of the sc-interp single-cell model comparison repo.

Provenance

Source code commit: da2a582
Runner: scripts/run_scgpt.py
Dataset manifest: data/norman/manifest.yaml

Base model

Initialised from the scGPT whole-human checkpoint (pretrained on ~33M cells from CellxGene Census): 12 transformer layers, hidden dimension 512, 8 attention heads. The checkpoint was downloaded from the official Google Drive folder linked in the scGPT README; it is not currently hosted on the Hugging Face Hub.

Training

Task: perturb-GEP, control cells as input, matched perturbed cells as target
Runner: invoked via the sc-interp dispatcher (python -m scripts.run scgpt --dataset norman)
Split: GEARS simulation split with seed 42 (152 train / 33 val / 99 test perturbations), materialised once by scripts/data/gears.py into data/norman/splits/simulation_42_0.75.json and consumed by runners via scripts/data/splits.py
Recipe adapted from scGPT Tutorial_Perturbation.ipynb
Loss: masked MSE on all gene positions
Optimiser: Adam, lr 1e-4, StepLR gamma 0.9 per epoch
AMP: enabled
Attention: standard torch.nn.MultiheadAttention (flash-attn not installed, Wqkv weights renamed to in_proj during load)
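The key renaming mentioned under Attention can be sketched as a plain dictionary transform. This is a minimal illustration, not the code in scripts/run_scgpt.py. The function name rename_flash_attn_keys and the example key prefix are hypothetical. It assumes the flash-attn layer stores its fused projection as Wqkv.weight and Wqkv.bias, which torch.nn.MultiheadAttention exposes as in_proj_weight and in_proj_bias.

```python
def rename_flash_attn_keys(state_dict):
    """Return a copy of state_dict with flash-attn style Wqkv.* keys
    renamed to the in_proj_* names used by torch.nn.MultiheadAttention.

    Hypothetical helper for illustration; values are passed through untouched.
    """
    renamed = {}
    for key, value in state_dict.items():
        new_key = key.replace("Wqkv.weight", "in_proj_weight")
        new_key = new_key.replace("Wqkv.bias", "in_proj_bias")
        renamed[new_key] = value
    return renamed


# Toy state dict: the layer prefix is an assumed example, and strings
# stand in for tensors so the sketch runs without torch installed.
example = {
    "transformer_encoder.layers.0.self_attn.Wqkv.weight": "W",
    "transformer_encoder.layers.0.self_attn.Wqkv.bias": "b",
    "transformer_encoder.layers.0.self_attn.out_proj.weight": "O",
}
converted = rename_flash_attn_keys(example)
for name in converted:
    print(name)
```

Keys that do not contain Wqkv, such as out_proj.weight, pass through unchanged, so the same function can be applied to the whole checkpoint before calling load_state_dict.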

Budget and stopping


quantity	value
epochs trained	15 / 15
cells seen	794,880
gradient steps	12,420
wall clock	2.0 hours (H100 PCIe)
best val pearson (all-gene)	0.9879
best val epoch	7
stopping reason	max_epochs
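The budget figures can be cross-checked with a little arithmetic. The sketch below assumes every gradient step consumed a full batch and that the StepLR scheduler stepped once at the end of each epoch; neither detail is stated in the card, so treat the derived numbers as a sanity check rather than a record of the run.

```python
cells_seen = 794_880
gradient_steps = 12_420
epochs = 15

# Implied batch size and per-epoch workload (exact integer divisions here).
cells_per_step = cells_seen // gradient_steps   # 64
steps_per_epoch = gradient_steps // epochs      # 828
cells_per_epoch = cells_seen // epochs          # 52992

# Learning rate used during each epoch under StepLR(step_size=1, gamma=0.9),
# assuming the decay is applied after every completed epoch.
base_lr, gamma = 1e-4, 0.9
lr_by_epoch = [base_lr * gamma**k for k in range(epochs)]

print(cells_per_step, steps_per_epoch, cells_per_epoch)
print(f"lr in first epoch: {lr_by_epoch[0]:.2e}, in last epoch: {lr_by_epoch[-1]:.2e}")
```

Under these assumptions the implied batch size is 64 cells, with 828 steps per epoch.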

Test set metrics (cell-eval)

metric	mean	median	max
pearson_delta	0.5067	0.5503	0.9132
mse	0.0038	0.0033	0.0183
mae	0.0209	0.0204	0.0449
mse_delta	0.0038	0.0033	0.0183
mae_delta	0.0209	0.0204	0.0449
de_direction_match	0.7159	0.7126	0.9434
de_sig_genes_recall	0.9076	0.9089	0.9906
de_spearman_sig	0.2633	0.2633	0.2633
de_spearman_lfc_sig	0.8006	0.8217	0.9571
pr_auc	0.0782	0.0768	0.1994
roc_auc	0.3839	0.3802	0.5288
de_nsig_counts_real	487.3535	501.0000	1122.0000
de_nsig_counts_pred	4915.4646	4924.0000	4989.0000
overlap_at_N	0.0242	0.0218	0.0978
overlap_at_50	0.0265	0.0200	0.1400
overlap_at_100	0.0233	0.0200	0.1000
overlap_at_200	0.0244	0.0200	0.1000
overlap_at_500	0.0240	0.0220	0.1040
precision_at_N	0.0899	0.0906	0.2128
precision_at_50	0.0265	0.0200	0.1400
precision_at_100	0.0231	0.0200	0.1000
precision_at_200	0.0248	0.0200	0.1000
precision_at_500	0.0246	0.0220	0.1040
discrimination_score_l1	0.5911	0.5758	1.0000
discrimination_score_l2	0.6160	0.6263	1.0000
discrimination_score_cosine	0.6502	0.7172	1.0000
pearson_edistance	0.6486	0.6486	0.6486
clustering_agreement	0.2460	0.2460	0.2460

For reference, Table 1 of the scGPT paper reports pearson_delta of 0.459 (ALL) and 0.546 (DE) on Norman. Our all-gene mean (0.5067) sits between the two. Comparing de_nsig_counts_real with de_nsig_counts_pred (~487 vs ~4915 significant genes per perturbation, out of 5045 total) quantifies the over-prediction of DE that is typical of scGPT: the model flags roughly ten times as many genes as significant as the real data does. That is why de_sig_genes_recall (0.91) is high while roc_auc (0.38) and pr_auc (0.08) on DE classification are low.
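The link between high recall and low precision can be made concrete with a back-of-envelope calculation. The sketch below reads the de_nsig_counts rows as numbers of significant genes and plugs in the table means as if they described a single perturbation. That is an approximation (a ratio of means is not a mean of ratios), and it does not claim to reproduce how cell-eval computes any metric.

```python
n_genes = 5045            # total genes, from the text above
n_real_sig = 487.3535     # mean de_nsig_counts_real
n_pred_sig = 4915.4646    # mean de_nsig_counts_pred
recall = 0.9076           # mean de_sig_genes_recall

# Real DE genes that the prediction also calls significant.
true_positives = recall * n_real_sig

# Share of predicted-significant genes that are truly DE.
precision = true_positives / n_pred_sig

# Share of all genes the model calls significant.
frac_flagged = n_pred_sig / n_genes

print(f"approx. precision: {precision:.3f}")
print(f"fraction of genes flagged significant: {frac_flagged:.3f}")
```

With nearly every gene flagged, recovering most real DE genes is almost automatic, while precision is capped near the base rate of real DE genes. The approximate precision from this sketch is about 0.09, in the same range as the precision_at_N and pr_auc rows.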

Known limitations

Trained with dropout=0.2 and pert_pad_id=2 inherited from the pretrained args.json. The scGPT tutorial hardcodes dropout=0 and pert_pad_id=0 for fine-tuning; switching to those values is expected to improve metrics.
Early stopping monitored all-gene val pearson, which saturates near 0.99, so the stop criterion never fired and training ran the full 15 epochs. pearson_delta or pearson_de_delta would be a stricter stop criterion.
Low overlap_at_50 (~0.03) and overlap_at_N (~0.024) are consistent with scGPT's known weakness at identifying the specific top-k DE genes driving a perturbation, rather than a training flaw. See the GEARS and CellFlow papers for the same observation.
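To illustrate why all-gene Pearson saturates while a delta-based Pearson does not, here is a toy example in pure Python. The expression values are invented for illustration and the pearson helper is a hypothetical stand-in, not the cell-eval implementation. The prediction stays close to the control profile, which is enough for a near-perfect all-gene correlation, yet it carries almost no information about the direction of the perturbation effect.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Toy mean expression per gene (made-up numbers).
ctrl = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # control cells
true = [0.0, 1.2, 2.0, 2.7, 4.0, 5.1]   # real perturbed cells
pred = [0.0, 1.0, 2.1, 3.0, 4.1, 5.0]   # prediction that hugs the control

# All-gene Pearson is dominated by baseline expression shared with control.
all_gene = pearson(pred, true)

# Delta Pearson correlates the changes relative to control instead.
delta = pearson(
    [p - c for p, c in zip(pred, ctrl)],
    [t - c for t, c in zip(true, ctrl)],
)

print(f"all-gene pearson: {all_gene:.4f}")
print(f"pearson_delta:    {delta:.4f}")
```

Because the baseline expression shared with control dominates the all-gene score, a criterion based on the delta leaves more room to distinguish epochs.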

Files

best_model.pt — fine-tuned state dict, loads into TransformerGenerator built with use_fast_transformer=False
training_stats.json — unified sc-interp TrainStats schema: top-level keys wall_clock_s, wandb_run_url, reason, details (with model-specific training metadata nested in details)
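A consumer of training_stats.json might start by checking for the four top-level keys listed above. The sketch below is only an illustration: the helper name missing_train_stats_keys is hypothetical, and the example values (including the contents of details) are placeholders, not the real file contents.

```python
import json

# Top-level keys named in the TrainStats description above.
REQUIRED_KEYS = {"wall_clock_s", "wandb_run_url", "reason", "details"}


def missing_train_stats_keys(stats):
    """Return the required top-level keys absent from stats, sorted."""
    return sorted(REQUIRED_KEYS - set(stats))


# Placeholder document: values are illustrative, not copied from the real file.
example = json.loads(
    """
    {
      "wall_clock_s": 7200.0,
      "wandb_run_url": null,
      "reason": "max_epochs",
      "details": {"epochs_trained": 15}
    }
    """
)

print(missing_train_stats_keys(example))
print(missing_train_stats_keys({"reason": "max_epochs"}))
```

For the real file, replace the inline string with json.load on the downloaded training_stats.json.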

Usage

from huggingface_hub import hf_hub_download

ckpt = hf_hub_download(
    repo_id="matthewshu/scGPT-norman-ft",
    filename="best_model.pt",
)

# Or reproduce from source (runs in the scgpt venv):
#   python -m scripts.run scgpt --dataset norman --hf-repo matthewshu/scGPT-norman-ft

Citation

Dataset: Norman et al. 2019 (Science). Base foundation model: Cui et al. 2024 (Nat Methods). See the scGPT and GEARS repos for BibTeX.
