Breaking down the capabilities of Google's highly anticipated OpenAI competitor...
It does seem that the CMU results for Mixtral are off.
LMsys' leaderboard has both Mixtral and Gemini Pro comparable to GPT 3.5 Turbo: https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard (last edit 20th of December)
For Mixtral, this is consistent with OpenCompass' recent results (24th): https://github.com/open-compass/MixtralKit
Also according to OpenCompass, the vision-language performance of Gemini Pro and GPT-4V is comparable:
https://opencompass.org.cn/leaderboard-multimodal
(though it's unclear what "detail: low" means for GPT 4)
Yep! They also published an updated version of the manuscript to address the issues that were brought up.