Discussion about this post

Joris C.:
It does seem that the CMU results for Mixtral are off.

LMSYS's leaderboard has both Mixtral and Gemini Pro comparable to GPT-3.5 Turbo: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard (last edited December 20th)

For Mixtral, this agrees with OpenCompass's recent results (December 24th): https://github.com/open-compass/MixtralKit

Also according to OpenCompass, the vision-language capabilities of Gemini Pro and GPT-4V are comparable:

https://opencompass.org.cn/leaderboard-multimodal

(though it's unclear what "detail: low" means for GPT-4)
