13 Comments
Benjamin Riley

First of all, this is incredible, thanks for sharing this with us all.

Second, at risk of exposing my lack of technical knowledge -- in the Harmony prompt example you have "always respond in riddles" nested in the developer level of the hierarchy. Should the final output have therefore been a riddle or...? Would love to have this riddle about a riddle explained!

Cameron R. Wolfe, Ph.D.

Yes it should be a riddle! Great point! I'll fix this LOL

Cameron R. Wolfe, Ph.D.

Riddle is now included :)
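
For anyone else confused by the same point: developer-level instructions sit above user messages in the Harmony role hierarchy, so a developer message like "always respond in riddles" should shape the final answer. A rough sketch of the idea (the role ordering follows the post; the message contents and dict layout below are illustrative, not the exact Harmony token format):

```python
# Illustrative only: the developer role outranks the user role in the hierarchy,
# so its instruction constrains every assistant reply that follows.
conversation = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "developer", "content": "Always respond in riddles."},  # higher priority
    {"role": "user", "content": "What is 2 + 2?"},                   # lower priority
]
# Expected final output: a riddle whose answer is four, not a plain "4".
```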

Paul

I finally found the time to read the blog, quality content as always, thank you very much!

One little typo caught my eye: "For example, Gemma-3 adopts a 5:1 ratio, meaning that there is one dense attention layer for ever[y] five sliding window attention layers." The "y" in "every" is missing.

And one question regarding "Specifically, GPT-oss uses group sizes of eight—meaning that key and query values are shared among groups of eight attention heads—for grouped-query attention in both model sizes."

Is this correct, or should it be "meaning that keys and values are shared among groups of eight attention heads" instead of "meaning that key and query values are shared among groups of eight attention heads"? I thought query values are not shared, or do I misunderstand something?

Thank you and best regards

Cameron R. Wolfe, Ph.D.

Thanks for catching the typo - just fixed it.

You're correct. It should be keys / values (not keys / queries). I just fixed this as well - thank you so much!
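
For readers who want to see what a group size of eight means concretely, here is a minimal sketch of grouped-query attention (a toy illustration with made-up dimensions, not the GPT-oss implementation): every group of eight query heads shares a single key/value head.

```python
# Minimal grouped-query attention sketch (illustrative dimensions, not GPT-oss code).
# With a group size of 8, every 8 query heads share one key/value head.
import torch

def grouped_query_attention(x, n_q_heads=64, group_size=8, d_head=64):
    b, t, _ = x.shape
    n_kv_heads = n_q_heads // group_size  # 64 query heads -> 8 key/value heads

    # In a real model these projections are learned module parameters;
    # they are created inline here just to keep the sketch self-contained.
    w_q = torch.nn.Linear(x.shape[-1], n_q_heads * d_head, bias=False)
    w_k = torch.nn.Linear(x.shape[-1], n_kv_heads * d_head, bias=False)
    w_v = torch.nn.Linear(x.shape[-1], n_kv_heads * d_head, bias=False)

    q = w_q(x).view(b, t, n_q_heads, d_head).transpose(1, 2)   # [b, 64, t, d]
    k = w_k(x).view(b, t, n_kv_heads, d_head).transpose(1, 2)  # [b, 8, t, d]
    v = w_v(x).view(b, t, n_kv_heads, d_head).transpose(1, 2)  # [b, 8, t, d]

    # Each key/value head is reused by the 8 query heads in its group.
    k = k.repeat_interleave(group_size, dim=1)                  # [b, 64, t, d]
    v = v.repeat_interleave(group_size, dim=1)                  # [b, 64, t, d]

    attn = torch.softmax(q @ k.transpose(-2, -1) / d_head**0.5, dim=-1)
    return (attn @ v).transpose(1, 2).reshape(b, t, n_q_heads * d_head)

out = grouped_query_attention(torch.randn(2, 16, 4096))  # -> [2, 16, 4096]
```

Note that the query projection keeps the full set of heads; only the key/value projections shrink, which is where the KV cache savings come from.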

sian cao

I think the image in the Routing section has a minor bug: in the softmax formula, shouldn't the denominator be the sum over all N experts, not K?

Cameron R. Wolfe, Ph.D.

You’re right! Just fixed it. Thank you for calling this out.
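
For anyone following along, the fix is that the softmax normalizes over all N experts, and the top-K selection is applied to the resulting probabilities afterward. In illustrative notation (z_i is the router logit for expert i, N the total number of experts, K the number of active experts):

$$
p_i = \frac{\exp(z_i)}{\sum_{j=1}^{N} \exp(z_j)}, \qquad
\text{selected experts} = \operatorname{top-}K\,(p_1, \ldots, p_N)
$$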

Anay

Superb breakdown @Cameron.

The RoPE section is really intuitive.

Cameron R. Wolfe, Ph.D.

Thanks so much!

Pankaj

Thanks a lot @Cameron! This is exactly what I wanted to read. This article is so rich in knowledge.

May I ask you one question? May I use your articles as a reference for my educational writing and tutorials? Thank you.

Cameron R. Wolfe, Ph.D.

Go for it as long as you provide a citation / reference!

JM Guitera

Thanks Cameron, very impressive recap!

Cameron R. Wolfe, Ph.D.

Of course! Thanks for reading
