9 Comments
Yousef Kotp:

Perfect article, thank you so much for your writing and amazing insights!

Cameron R. Wolfe, Ph.D.:

Thank you very much for the kind words!

Srivatsa Srinivas:

Hi Cameron,

Thanks for the awesome article!

You might be interested in this article for the o1/o3 section: https://arxiv.org/abs/2501.04682

Cameron R. Wolfe, Ph.D.:

Thank you for sharing!

madi:

Excellent, thank you!!

Cameron R. Wolfe, Ph.D.:

Thanks for reading!

Michael:

Brilliant exposition, Cameron! My untutored thought is that we have pushed scaling as far as we need to; future improvements will hardly reward the effort. However, I don't feel we have hit a "wall." Physics has come to these grinding halts before, and a new paradigm always emerges so we can trudge forward again. So I'm hopeful. I do dislike that research is becoming secret. Driven by commercial and nationalistic rivalries, this does no one any good. We'll come to regret it.

Cameron R. Wolfe, Ph.D.:

Thank you! And I totally agree!

Jay Titus:

Great article. Thank you for writing this.

I'm curious to get your thoughts on something if you have time. It relates to scaling laws in the sense that creating a foundation/frontier model is only for the big players, and that the natural progression of frontier/foundation models is to absorb more and more functionality. We may be hitting the upper bound of scraping, but agents, RLHF, and tool use are continually feeding and improving context.

While there will always be highly specialized models like AlphaFold or AlphaGenome, it seems to me that most software and functionality can and will be absorbed into foundation/frontier models over time. A form of this scaling law has always existed: a big company can always enter the fray when a little guy invents something new, but that is a deliberate effort involving new product development, additional investment, etc. I think foundation models do this naturally with RL, RLHF, pretraining, and inference-time scaling.

Some examples are accounting, financial planning, health services, etc. You could end up with an Apple-like shakedown of 30% equity on every agent someone writes, or maybe they intentionally or unintentionally cut everyone out and kill competition because there isn't anything better. Tool use is just a major game changer here.

The jury is still out on what will happen, and this is just a thought experiment, but I think it might also be a scaling law that isn't discussed much. GenAI is also a new UI, and a lot will change over the next few years. Anyway, just throwing it out there to see if anyone has thoughts on other scaling laws, or other laws in general, that may be impacted by foundation/frontier model scaling.
