5 Comments
Paul

again, super dense and great article, thx!

Cameron R. Wolfe, Ph.D.

Thanks for reading!

The 2020 Report

I wonder if these algorithms use network analysis measures like eigenvector centrality and degree centrality.

Cameron R. Wolfe, Ph.D.

Haven't seen anything like this yet, but totally possible that it's out there!

Neural Foundry

This is an exceptionally thorough analysis of the online-offline performance gap in LLM alignment. The finding that on-policy sampling provides consistent benefits across different scales and domains is particularly compelling. The semi-online approach seems like a pragmatic middle ground that could democratize access to higher-quality alignment techniques for smaller teams.