16 Comments

Thanks for this fantastically detailed write-up!

Since I come from a computer vision background, I have seen MoEs applied to modalities other than text. I have seen them used to "conditionally fuse" information based on the quality and content of various inputs.

Imagine a CNN that does semantic segmentation of a scene from multi-modal inputs such as RGB images, infrared images, etc. The model learns to "weigh" the output of each modality branch, with the weighting conditioned on the inputs. So if the RGB image is washed out due to high exposure because the camera is facing the sun, the model can give the RGB branch a lower weight and prefer information from the other branches when producing the segmentation mask.
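
As a rough illustration of that idea, here is a minimal gating sketch (assuming PyTorch; the module, the two-modality setup, and all names are illustrative, not from the comment above):

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse per-modality feature maps with input-conditioned weights."""

    def __init__(self, channels: int, num_modalities: int = 2):
        super().__init__()
        # A tiny gating network: it looks at all branch features and
        # predicts one scalar weight per modality.
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels * num_modalities, num_modalities),
            nn.Softmax(dim=-1),
        )

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (B, C, H, W) feature map per modality branch.
        stacked = torch.cat(feats, dim=1)   # (B, C * M, H, W)
        weights = self.gate(stacked)        # (B, M), sums to 1 per sample
        # Weighted sum of branches: a washed-out RGB input can learn to
        # receive a low weight, shifting reliance onto the IR branch.
        return sum(
            w.view(-1, 1, 1, 1) * f
            for w, f in zip(weights.unbind(dim=1), feats)
        )

# Illustrative usage: RGB and infrared branches each yield 64-channel maps.
rgb_feat = torch.randn(4, 64, 32, 32)
ir_feat = torch.randn(4, 64, 32, 32)
fused = GatedFusion(channels=64, num_modalities=2)([rgb_feat, ir_feat])
```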

Yep, totally! For language models in general, there isn't a clear analysis showing that certain experts specialize in certain skills, but my gut feeling is that you could use analysis similar to the paper below to find some type of specialization.

https://transformer-circuits.pub/2023/monosemantic-features
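
For what it's worth, a toy version of that kind of analysis might look like the following (entirely hypothetical, not from the paper; it assumes you can extract per-token expert assignments from the router):

```python
from collections import Counter, defaultdict

def expert_specialization(tokens, expert_ids, token_category):
    """Tally which experts the router picks for each token category.

    tokens:         list of token strings
    expert_ids:     list of expert indices chosen by the router (same length)
    token_category: function mapping a token to a coarse label,
                    e.g. "digit", "punctuation", "code", "prose"
    """
    counts = defaultdict(Counter)
    for tok, expert in zip(tokens, expert_ids):
        counts[expert][token_category(tok)] += 1
    # An expert whose counts concentrate on one category is a candidate
    # "specialist"; near-uniform counts suggest no specialization.
    return {expert: c.most_common(3) for expert, c in counts.items()}
```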

That is interesting! Thanks for sharing the relevant paper. :)

Cameron, this was a really excellent overview; it shows your impressive command of the material. Would love to see a book by you on the topic.

Thanks for the kind words. I might try to write a book in the future once I build up enough content on the newsletter to serve as a starting point :)

Absolutely fantastic article, thank you!

Glad you liked it! Thanks for reading

Thank you for sharing this

Of course! Thank you for reading 🙂

Great work

Thanks for reading!

Great!

Thanks!

This was great for me, thanks Cameron, you went ALL out! I invested (a year ago) in an MoE network called BitTensor and thought I understood this material. I did not, but I do now. I'm not sure whether they're still an MoE. Are you familiar with this network, and if so, any thoughts on the mechanism underlying it? There are a number of highly qualified AI groups building on it. I would like to build on it as well but haven't learnt enough yet.

I'm not familiar with BitTensor, but it looks interesting!

Yes, very much so. If you find it compelling enough and are available, I'm looking to hire an AI consultant to help build my position on the network (fine-tuning, hosting models).
