GPT-oss from the Ground Up

Cameron R. Wolfe, Ph.D.

Aug 18

Everything you should know about OpenAI's new open-weight language models...

13 Comments

Benjamin Riley

Aug 18

First of all, this is incredible, thanks for sharing this with us all.

Second, at risk of exposing my lack of technical knowledge -- in the Harmony prompt example you have "always respond in riddles" nested in the developer level of the hierarchy. Should the final output have therefore been a riddle or...? Would love to have this riddle about a riddle explained!

Cameron R. Wolfe, Ph.D.

Aug 18

Yes, it should be a riddle! Great point! I'll fix this LOL

Cameron R. Wolfe, Ph.D.

Aug 18

Riddle is now included :)

Paul

20h

I finally found the time to read the blog, quality content as always, thank you very much!

One little typo caught my eye, "For example, Gemma-3 adopts a 5:1 ratio, meaning that there is one dense attention layer for ever*missing_y* five sliding window attention layers."

And one question regarding "Specifically, GPT-oss uses group sizes of eight—meaning that key and query values are shared among groups of eight attention heads—for grouped-query attention in both model sizes."

Is this correct, or should it be "meaning that keys and values are shared among groups of eight attention heads" instead of "meaning that key and query values are shared among groups of eight attention heads"? I thought queries are not shared, or am I misunderstanding something?

Thank you and best regards

Cameron R. Wolfe, Ph.D.

20h

Thank you for catching the typo - just fixed it.

You're correct. It should be keys / values (not keys / queries). I just fixed this as well - thank you so much!
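
For anyone following along, here is a minimal sketch of the sharing pattern, assuming a group size of eight as quoted above. The count of 64 query heads and the helper name `kv_head_for_query_head` are illustrative assumptions, not something taken from the post. Each query head keeps its own query projection; only the key/value head is shared within a group.

```python
# Sketch of grouped-query attention head sharing (illustrative only).
# Assumption: 64 query heads; the group size of 8 is from the quoted text.
NUM_QUERY_HEADS = 64
GROUP_SIZE = 8


def kv_head_for_query_head(query_head: int, group_size: int = GROUP_SIZE) -> int:
    """Return the index of the shared key/value head used by a query head."""
    return query_head // group_size


# Every query head is distinct; consecutive runs of eight map to one K/V head.
mapping = [kv_head_for_query_head(h) for h in range(NUM_QUERY_HEADS)]
num_kv_heads = len(set(mapping))

print(mapping[:10])
print(num_kv_heads)
```

Under these assumptions, query heads 0 through 7 all read from K/V head 0, heads 8 through 15 from K/V head 1, and so on, so the K/V cache stores one eighth as many heads as there are query heads.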
