Discussion about this post

Sebastian Raschka, PhD

Awesome article as always, Cameron!

I hope you don't mind my minor nit & correction: in your article, you mentioned in some places that InstructGPT originally proposed the three-step process Pretraining -> SFT -> RLHF. As far as I know, that's not correct; the procedure was proposed two years earlier in the "Learning to summarize from human feedback" paper (https://arxiv.org/abs/2009.01325).

(PS: I have a list of a few additional PPO resources here: https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives)

can kara

I need an answer but can't really find it here. Can an LLM learn new things via fine-tuning? For example, I need to add some historical personas, along with information about them, that I know the model hasn't seen during pre-training.

