It does seem that the CMU results for Mixtral are off.

LMSYS's leaderboard has both Mixtral and Gemini Pro comparable to GPT-3.5 Turbo: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard (last edited December 20th)

For Mixtral, this agrees with OpenCompass's recent results (December 24th): https://github.com/open-compass/MixtralKit

Also according to OpenCompass, the vision-language performance of Gemini Pro and GPT-4V is comparable:

https://opencompass.org.cn/leaderboard-multimodal

(though it's unclear what "detail: low" means for GPT-4)

Yep! They also published an updated version of the manuscript addressing the issues that were raised.
