Which one is better for alignment?
ORPO or GSPO?
I think ORPO is pretty good and fast, but GSPO makes the model attack its own opinions: it reflects on itself and corrects itself. GSPO is much slower, but it may still be pretty effective. Also, with GSPO you don't have to provide a whole reasoning corpus, you only need a way to score the end result (maybe a single word answering a binary question).
GSPO may also be better than GRPO because it works at the level of the whole 'train of thought': its importance ratio and clipping are computed per sequence, whereas GRPO computes them per token (both still score the full response with one reward). Alignment is mostly about the train of thought, not a single token like a math answer.
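To make the "only the end result" point concrete, here is a rough sketch of what an outcome-only reward could look like for a yes/no question. The names `binary_reward` and `group_advantages` are made up for illustration and are not from any library. The group-relative normalization is my understanding of what GRPO-style methods (GSPO included) do with a batch of sampled answers, so treat it as an assumption rather than a reference implementation.

```python
import re
from statistics import mean, pstdev


def binary_reward(completion: str, expected: str) -> float:
    """Hypothetical outcome reward: score only the last yes/no word.

    Whatever reasoning the model wrote before its final answer is ignored,
    so no reference reasoning corpus is needed.
    """
    words = re.findall(r"[a-z]+", completion.lower())
    answers = [w for w in words if w in ("yes", "no")]
    if not answers:
        return 0.0  # no recognizable answer at all
    return 1.0 if answers[-1] == expected.lower() else 0.0


def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt whose correct answer is "yes".
samples = [
    "At first I thought no, but on reflection: yes",
    "Definitely no",
    "Yes",
    "I am not sure",
]
rewards = [binary_reward(s, "yes") for s in samples]
advantages = group_advantages(rewards)
```

Completions that end on the right answer get a positive advantage and the rest get a negative one, which is the only training signal here. That's also part of why it is slower than ORPO: you have to sample several full completions per prompt instead of reading a fixed preference pair.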
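Here is a small sketch of the per-token vs per-sequence difference, in plain Python with made-up log-probabilities. The function names are hypothetical, and the length-normalized sequence ratio is how I understand the GSPO formulation, so take it as an assumption rather than a reference implementation.

```python
import math


def token_ratios(new_logps, old_logps):
    """GRPO-style: one importance ratio per generated token."""
    return [math.exp(n - o) for n, o in zip(new_logps, old_logps)]


def sequence_ratio(new_logps, old_logps):
    """GSPO-style (as I understand it): one length-normalized ratio per response.

    exp(mean(new - old)) equals (p_new(y) / p_old(y)) ** (1 / len(y)),
    i.e. the geometric mean of the per-token ratios.
    """
    diffs = [n - o for n, o in zip(new_logps, old_logps)]
    return math.exp(sum(diffs) / len(diffs))


def clipped_term(ratio, advantage, eps=0.2):
    """PPO-style clipped surrogate for a single ratio."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


# Made-up per-token log-probs of one sampled response under new and old policy.
new_logps = [-1.0, -2.0, -0.5]
old_logps = [-1.2, -1.5, -0.5]
advantage = 1.0  # same advantage for the whole response in both methods

# Per token: each token is clipped on its own ratio.
per_token_terms = [clipped_term(r, advantage) for r in token_ratios(new_logps, old_logps)]

# Per sequence: the whole response is kept or clipped together.
per_sequence_term = clipped_term(sequence_ratio(new_logps, old_logps), advantage)
```

In the per-token version, individual tokens in the same response can be clipped differently. In the per-sequence version, the response is treated as one unit, which is the sense in which it acts on the whole train of thought.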