PaTaRM
PaTaRM is a Generative Reward Model (GRM) for RLHF alignment.
This is the PaTaRM-8B model, part of the PaTaRM series. For full details including overview, usage examples, training data, and citation, please refer to the main collection README:
👉 AIJian/PaTaRM — Main README
| Model | Base | Link |
|---|---|---|
| PaTaRM-8B | Qwen3-8B | AIJian/PaTaRM-8B |
| PaTaRM-14B | Qwen3-14B | AIJian/PaTaRM-14B |
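The repo IDs in the table can be passed to the Hugging Face `transformers` loaders. The following is a minimal sketch, not the authors' documented usage: it assumes the checkpoints load as standard causal language models (plausible given the Qwen3 bases, but not confirmed here), and the helper names `PATARM_MODELS`, `repo_id_for`, and `load_patarm` are hypothetical. The prompt format expected by the reward model is described in the main README and is not reproduced here.

```python
# Repo IDs and base models copied from the table above.
PATARM_MODELS = {
    "PaTaRM-8B": {"base": "Qwen3-8B", "repo_id": "AIJian/PaTaRM-8B"},
    "PaTaRM-14B": {"base": "Qwen3-14B", "repo_id": "AIJian/PaTaRM-14B"},
}


def repo_id_for(name: str) -> str:
    """Return the Hub repo ID for a model name listed in the table."""
    if name not in PATARM_MODELS:
        raise ValueError(
            f"unknown model {name!r}; expected one of {sorted(PATARM_MODELS)}"
        )
    return PATARM_MODELS[name]["repo_id"]


def load_patarm(name: str = "PaTaRM-8B"):
    """Load the tokenizer and model for one of the listed checkpoints.

    Assumption: the checkpoint is a standard causal-LM checkpoint.
    device_map="auto" additionally requires the `accelerate` package.
    """
    # Imported lazily so the lookup helper works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    repo_id = repo_id_for(name)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(
        repo_id,
        torch_dtype="auto",  # use the dtype stored in the checkpoint
        device_map="auto",   # place weights on available devices
    )
    return tokenizer, model
```

Calling `load_patarm("PaTaRM-8B")` downloads the weights on first use, so it needs network access and enough memory for an 8B-parameter model.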
@misc{jian2026patarmbridgingpairwisepointwise,
title={PaTaRM: Bridging Pairwise and Pointwise Signals via Preference-Aware Task-Adaptive Reward Modeling},
author={Ai Jian and Jingqing Ruan and Xing Ma and Dailin Li and Weipeng Zhang and Ke Zeng and Xunliang Cai},
year={2026},
eprint={2510.24235},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2510.24235},
}