4 Comments
Sep 12, 2023

Awesome article as always, Cameron!

I hope you don't mind my minor nit & correction: in your article, you mentioned in some places that InstructGPT originally proposed the three-step process Pretraining -> SFT -> RLHF. As far as I know, that's not correct; the procedure was proposed two years earlier in the "Learning to summarize from human feedback" paper (https://arxiv.org/abs/2009.01325).

(PS: I have a list of a few additional PPO resources here: https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives)

author

Fixed this in this post too to avoid confusion!

author

Good correction! Thanks. I'll be sure to include references to this paper in the future as well.

And, thanks for the extra links as always!


I need an answer but can't really find it in here. Can an LLM learn new things via fine-tuning? For example, I need to add some historical personas, along with information about them that I know the model hasn't seen during pre-training.
