Dec 24, 2023 · Liked by Cameron R. Wolfe, Ph.D.

It does seem that the CMU results for Mixtral are off.

LMSYS's leaderboard has both Mixtral and Gemini Pro comparable to GPT-3.5 Turbo: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard (last edited December 20th)

For Mixtral, this is consistent with OpenCompass's recent results (December 24th): https://github.com/open-compass/MixtralKit

Also according to OpenCompass, the vision-language performance of Gemini Pro and GPT-4V is comparable:

https://opencompass.org.cn/leaderboard-multimodal

(though it's unclear what "detail: low" means for GPT-4)

Author reply:

Yep! They also published an updated version of the manuscript to address the issues that were brought up.
