Sangsang/feedback_disallowed_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated about 1 month ago • 2
Sangsang/feedback_allowed_ema_Llama-3.1-8B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • Updated about 1 month ago • 5
Sangsang/feedback_disallowed_ema_DeepSeek-R1-Distill-Llama-8B_reverse_kl_ema0p999_ep30 Text Generation • Updated Mar 26 • 1
Sangsang/grpo_DeepSeek-R1-Distill-Llama-8B_bs8_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Mar 26
Sangsang/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p6_fw0p4_ema0p999_ep30 Text Generation • 8B • Updated Mar 19 • 5
Sangsang/ci-feedback_weighted_asym_bi_kl_fixed_ema_Qwen2.5-7B-Instruct_bw1p0_fw1p0_ema0p999_ep30 Text Generation • 8B • Updated Mar 19 • 5
Sangsang/ci-feedback_both_ema_plus_interp_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_stw0p3_ep30 Text Generation • 8B • Updated Mar 19 • 4
Sangsang/ci-feedback_both_interp_Qwen2.5-7B-Instruct_from_Qwen2.5-7B-Instruct_jsd_b0p8_stw0p3_ep30 Text Generation • 8B • Updated Mar 19 • 5
Sangsang/ci-feedback_both_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 Text Generation • 8B • Updated Mar 18 • 6
Sangsang/ci-feedback_both_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • 8B • Updated Mar 18 • 7
Sangsang/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_jsd_b0p8_ema0p999_ep30 Text Generation • 8B • Updated Mar 18 • 5
Sangsang/ci-feedback_disallowed_ema_Qwen2.5-7B-Instruct_reverse_kl_ema0p999_ep30 Text Generation • 8B • Updated Mar 18 • 6