Abstract
Large language models can be trained to produce calibrated probabilistic forecasts for supply chain disruptions, outperforming existing baselines and enabling decision-ready predictions through domain-specific adaptation.
Anticipating supply chain disruptions before they materialize is a core challenge for firms and policymakers alike. A key difficulty is learning to reason reliably about infrequent, high-impact events from noisy and unstructured inputs, a setting where general-purpose models struggle without task-specific adaptation. We introduce an end-to-end framework that trains LLMs to produce calibrated probabilistic forecasts using realized disruption outcomes as supervision. The resulting model substantially outperforms strong baselines, including GPT-5, on accuracy, calibration, and precision. We also show that training induces more structured and reliable probabilistic reasoning without explicit prompting. These results suggest a general pathway for training domain-specific forecasting models that produce decision-ready signals. To support transparency, we open-source the evaluation dataset used in this study. Dataset: https://huggingface.co/datasets/LightningRodLabs/supply-chain-predictions
Community
We train an LLM to forecast supply chain disruptions from news using Foresight Learning, an RL-based framework that supervises probabilistic predictions with realized outcomes. Our fine-tuned model outperforms GPT-5 across all metrics on a held-out test set covering 25 countries and 88 product categories, with Precision@10% improving from 8.7% to 34.8%. We further show that training under a forecasting objective induces structured probabilistic reasoning without explicit prompting: base-rate anchoring, statistical modeling, and iterative uncertainty refinement emerge spontaneously, with the model learning to reason about prediction rather than simply pattern-matching to an answer. These results suggest a general pathway for training domain-specific forecasting models wherever predictive signal exists in unstructured text. We open-source the evaluation dataset.
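The post does not include the evaluation code, but the two headline metrics are standard and easy to reproduce on released predictions. As a hedged sketch (function names and the toy data below are illustrative, not from the paper): Precision@10% ranks forecasts by predicted probability and measures the hit rate among the top 10%, while calibration can be summarized by the Brier score, the mean squared error between forecast probabilities and realized binary outcomes.

```python
def precision_at_k_percent(probs, outcomes, k_percent=10):
    """Hit rate among the top-k% highest-probability forecasts.

    probs    : forecast probabilities in [0, 1]
    outcomes : realized binary outcomes (1 = disruption occurred)
    """
    n_top = max(1, len(probs) * k_percent // 100)
    # Rank forecasts from most to least confident and keep the top slice
    ranked = sorted(zip(probs, outcomes), key=lambda po: po[0], reverse=True)
    hits = sum(outcome for _, outcome in ranked[:n_top])
    return hits / n_top

def brier_score(probs, outcomes):
    """Mean squared error of probabilities vs. outcomes (lower = better calibrated)."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Toy example: 10 forecasts, so the top-10% slice is a single prediction
probs = [0.9, 0.1, 0.2, 0.05, 0.3, 0.15, 0.4, 0.25, 0.1, 0.2]
outcomes = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
print(precision_at_k_percent(probs, outcomes))  # 1.0: the single top forecast was a hit
print(brier_score(probs, outcomes))
```

With a 10% base rate of disruptions, a Precision@10% of 34.8% means the model's most confident decile is roughly 3.5x more likely to precede a realized disruption than a random selection, which is what makes the signal actionable.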
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- EventCast: Hybrid Demand Forecasting in E-Commerce with LLM-Based Event Knowledge (2026)
- Beyond Accuracy: Evaluating Forecasting Models by Multi-Echelon Inventory Cost (2026)
- LLM-Grounded Explainable AI for Supply Chain Risk Early Warning via Temporal Graph Attention Networks (2026)
- Zero-Shot Time Series Foundation Models for Annual Institutional Forecasting Under Data Sparsity (2026)
- Cast-R1: Learning Tool-Augmented Sequential Decision Policies for Time Series Forecasting (2026)
- Time Series Foundation Models as Strong Baselines in Transportation Forecasting: A Large-Scale Benchmark Analysis (2026)
- Uncertainty-Gated Generative Modeling (2026)