Discussion about this post

Paul

As always, quality content. Thank you very much!

Anshu
Oct 5 (edited)

Thanks, Cameron, for sharing your deep insights on RL. I'm really enjoying learning from them. A question regarding the contextual bandit section, where you mention "Our complete trajectory is a single action and reward!": does this mean we are in a single-turn setting (that is, a single query-response generation)? If so, how could we extend this to a multi-turn conversational setting?
