The OlmoRL infrastructure improvements are clutch for scaling RLVR. Cutting training time from 15 to 6 days via asynchronous updates and inflight weight refreshes is the kind of eng work that actually matters when compute budgets dominate. Been wrestling with similar inference bottlenecks on our side, and the 5x inference-to-training ratio really captures why naive RL setups struggle at scale. The truncated importance sampling fix for engine mismatches is kinda slept on too, subtle but essential for numerical stability in heterogeneous environments.
Totally agree. Based on Olmo 3, it seems like importance sampling is essential for stable training. And yes, async is very important; this idea is explained really well in PipelineRL: https://arxiv.org/abs/2509.19128
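For anyone curious, here's a minimal sketch of what that truncated importance sampling correction can look like in PyTorch. The function name, the rho_max cap, and the shapes are my own illustrative assumptions, not the actual OlmoRL code: the idea is just to reweight each sampled token by a capped ratio between the trainer's log-probs and the log-probs recorded by the inference engine, so numerical mismatches between the two engines can't blow up the gradient.

```python
import torch

def truncated_is_policy_loss(trainer_logprobs, sampler_logprobs, advantages, rho_max=2.0):
    """Hypothetical sketch of truncated importance sampling for
    trainer-vs-inference-engine mismatch (not the OlmoRL implementation).

    trainer_logprobs: (batch, seq) log-probs of sampled tokens under the trainer policy
    sampler_logprobs: (batch, seq) log-probs recorded by the inference engine at sampling time
    advantages:       (batch, seq) per-token advantage estimates
    rho_max:          truncation cap on the importance ratio
    """
    # Per-token importance ratio between the trainer policy and the sampling policy.
    ratio = torch.exp(trainer_logprobs - sampler_logprobs)

    # Truncate (cap) the ratio so rare large mismatches between engines
    # cannot dominate the update; this adds a small bias but stabilizes training.
    rho = torch.clamp(ratio, max=rho_max).detach()

    # REINFORCE-style surrogate, weighted by the truncated ratio.
    return -(rho * advantages * trainer_logprobs).mean()
```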
That's what I call a deep dive; it took me 4 hours to read! Thank you!
I'll watch the talk from Nathan
I guess the reading time estimate from Substack is way off! Lol
Also, I would highly recommend checking out Nathan's full talk on Olmo 3 Think!
https://www.interconnects.ai/p/building-olmo-3-think
This is amazing! Btw, looking at Table 33 in the Olmo paper, they list the activation function as SwiGLU, not SiLU.
https://cameronrwolfe.substack.com/p/olmo-3#:~:text=Finally%2C%20Olmo%203%20uses%20Sigmoid%20Linear%20Unit%20(SiLU)%20activations%20and%20is%20pretrained%20with
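For what it's worth, those two labels usually describe the same block: SwiGLU is the gated feed-forward variant whose gate activation is SiLU, so "SiLU activations" and "SwiGLU" aren't really in conflict. A rough illustrative sketch (module and parameter names are my own, not Olmo's actual implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Illustrative SwiGLU feed-forward block: SiLU acts as the gate activation."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU(x) = SiLU(x W_gate) * (x W_up), then project back to d_model.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```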
Couldn't agree more. What's the real value beyond raw performance?