v0.2 Training: SFT only or SFT+DPO?
#15
by weizechen - opened
Hi. I've read that the v0.1 documentation mentions SFT+DPO training, while the v0.2 documentation refers only to SFT. The alignment handbook also lacks a DPO recipe. Was DPO used for v0.2? Thanks!
Hi, we used only SFT for v0.2; DPO was not applied.
Thanks for the reply!
weizechen changed discussion status to closed