Discussion about this post

Paul

Stellar work as always, thank you.

I didn't know about forward and reverse KL divergence, and https://huggingface.co/blog/NormalUhr/kl-divergence-estimator-rl-llm was also very interesting. Thanks for the link!

MatthewK

RL also involves fewer bits of information than SFT. So how do you fairly compare them?
