7 Comments
Paul:

That's what I call a deep dive; it took me 4 hours to read! Thank you!

I'll watch the talk from Nathan.

Cameron R. Wolfe, Ph.D.:

I guess the reading time estimate from Substack is way off! Lol

The AI Architect:

The OlmoRL infrastructure improvements are clutch for scaling RLVR. Cutting training time from 15 to 6 days via asynchronous updates and in-flight weight refreshes is the kind of eng work that actually matters when compute budgets dominate. Been wrestling with similar inference bottlenecks on our side, and the 5x inference-to-training ratio really captures why naive RL setups struggle at scale. The truncated importance sampling fix for engine mismatches is kinda slept on too, subtle but essential for numerical stability in heterogeneous environments.
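For readers unfamiliar with the trick: the truncated importance sampling fix mentioned above amounts to clipping a per-token likelihood ratio between the engine that sampled the tokens and the engine that computes gradients. A minimal sketch below, where the function name and cap value are illustrative assumptions, not taken from the Olmo 3 codebase:

```python
import math

def truncated_is_weights(logp_trainer, logp_sampler, c_max=2.0):
    """Per-token truncated importance sampling weights (illustrative).

    Corrects for the numerical mismatch between the inference engine
    that sampled the tokens (logp_sampler) and the training engine
    that rescores them (logp_trainer). Capping the ratio at c_max
    bounds the variance of the correction, which is what keeps the
    update numerically stable when the two engines disagree.
    """
    return [
        min(math.exp(lt - ls), c_max)
        for lt, ls in zip(logp_trainer, logp_sampler)
    ]

# Matching log-probs give a weight of exactly 1.0; tokens where the
# trainer assigns much higher probability than the sampler are capped.
w = truncated_is_weights([-1.0, -0.5, -2.0], [-1.0, -1.5, -2.1], c_max=2.0)
```

These weights would multiply the per-token policy-gradient loss; the one-sided truncation trades a little bias for a hard variance bound.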

Cameron R. Wolfe, Ph.D.:

Totally agree. Based on Olmo 3, it seems like importance sampling is essential for stable training. And yes, async is very important; this idea is explained really well in PipelineRL: https://arxiv.org/abs/2509.19128

Cameron R. Wolfe, Ph.D.:

Also, I would highly recommend checking out Nathan's full talk on Olmo 3 Think!

https://www.interconnects.ai/p/building-olmo-3-think

Rainbow Roxy:

Couldn't agree more. What's the real value beyond performance?