13 Comments
Dr. Ashish Bamania

Great deep dive and very helpful. Thanks for writing this!

Cameron R. Wolfe, Ph.D.

Thank you so much, and of course!

DesignREM support

Amazing article, a true gem. I would be interested to learn more about how DeepSeek used reinforcement learning to such great effect.

Cameron R. Wolfe, Ph.D.

Working on that article now!

Tejas Parnerkar

Great in-depth article!

Cameron R. Wolfe, Ph.D.

Thanks for the kind words!

Rahul Saini

Great in-depth article on MoE LLMs.

Michael

I agree with Dr. Bamania: this was a very rewarding read.

Cameron R. Wolfe, Ph.D.

Thank you! I'm glad you found it helpful!

Just a Placebo

Do you expect this leapfrogging between labs to persist much longer? The collective response to this tug of war seems overwhelmingly simplified and "gotcha!" meme-based. Public discourse tends to treat every new release as an exponential leap, with little reservation about the unknowns or about how the work actually compares across companies and established methods.

JP

Circling back to this because it connects directly to something playing out right now. Qwen 3.5's small models shipped, and the headline is "4B beats 80B." That sounds wild until you realize it's a 4B dense model versus ~3B active parameters in the MoE. The architecture lesson in this post is exactly what people need before taking that claim at face value. I broke down the whole thing here: https://reading.sh/your-laptop-is-an-ai-server-now-370bad238461?sk=1cf7a4391e614720ecbd6e9bc3f076a2
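
For anyone who wants to sanity-check the arithmetic, here's a minimal back-of-the-envelope sketch. The hyperparameters below are illustrative placeholders, not any model's published config; they're just chosen to show how an MoE can total roughly 80B parameters while only a few billion are active per token:

```python
# Back-of-the-envelope comparison of total vs. active parameters in an
# MoE transformer. All hyperparameters here are illustrative placeholders,
# not any real model's config.

def moe_ffn_params(n_layers, d_model, d_ff_expert, n_experts, top_k):
    """Approximate total and per-token-active parameters in the MoE FFN blocks.

    Assumes a SwiGLU-style expert with three weight matrices (gate, up,
    down), so each expert holds ~3 * d_model * d_ff_expert weights. Only
    the top_k routed experts run per token, so active parameters scale
    with top_k rather than with n_experts.
    """
    per_expert = 3 * d_model * d_ff_expert
    total = n_layers * n_experts * per_expert
    active = n_layers * top_k * per_expert
    return total, active

# Placeholder config chosen to land near an "80B total, ~3B active" shape.
total, active = moe_ffn_params(
    n_layers=48, d_model=2048, d_ff_expert=512, n_experts=512, top_k=10
)
print(f"total MoE FFN params:  {total / 1e9:.1f}B")   # ~77.3B
print(f"active MoE FFN params: {active / 1e9:.1f}B")  # ~1.5B
```

Attention, embeddings, and any shared experts add a few billion more to both counts, which is how the total can land near 80B while the active count stays around 3B. So a 4B dense model is being compared against roughly similar per-token compute, not 20x more.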

Srini Vijay, PhD

Great article. I couldn't finish it in one go, though; I've bookmarked it and will come back to it later.