agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo-nosft • Text Generation • 8B • Updated Oct 30, 2025 • 3
agurung/Qwen2.5-7B-Instruct-flawedfiction-grpo-impdata • Text Generation • 8B • Updated Oct 29, 2025 • 3
agurung/Qwen2.5-7B-Instruct-flawedfiction-latent-grpo • Text Generation • 8B • Updated Oct 27, 2025 • 4
agurung/v3ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset_newprompt • Text Generation • 8B • Updated Oct 25, 2025 • 8
agurung/v2ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated Oct 25, 2025 • 3
agurung/v1ff_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated Oct 25, 2025 • 3
agurung/Qwen2.5-7B-Instruct-CONTRASTIVE-NRL-NCP-GRPO-NLL-UNBOUNDED-IMPLICITPROMPT-RPWITHOUTUPDATE • Text Generation • 8B • Updated Sep 28, 2025 • 4
agurung/Qwen2.5-7B-Instruct-CONTRASTIVE-NRL-NCP-GRPO-NLL-UNBOUNDED-IMPLICITPROMPT-RPWITHSFT • Text Generation • 8B • Updated Sep 25, 2025 • 7
agurung/Qwen2.5-7B-Instruct-CONTRASTIVE-NRL-NCP-GRPO-NLL-UNBOUNDED-IMPLICITPROMPT • Text Generation • 8B • Updated Aug 26, 2025 • 5
agurung/tmp_renamed_v4_savebestearly_sft_qwen7b_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated Aug 25, 2025 • 4
agurung/Qwen2.5-7B-Instruct-CONTRASTIVE-NRL-NCP-GRPO-NLL-UNBOUNDED • Text Generation • 8B • Updated Aug 22, 2025 • 7
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e4_bptt_offset • Text Generation • 8B • Updated Aug 19, 2025 • 4
agurung/v4_savebestearly_sft_qwen7B_25percent_lr_1e3_bptt_offset • Text Generation • 8B • Updated Aug 17, 2025 • 4
agurung/Qwen2.5-7B-Instruct-CONTRASTIVE-NRL-NCP-GRPO-PPL-UNBOUNDED • Text Generation • 8B • Updated Aug 13, 2025 • 10
agurung/v2sft_all_qwen7B_25percent_lr_1e6_allgrad_no_reasoning_projector • Text Generation • 8B • Updated Aug 13, 2025 • 6