Really enjoyed this take on fine-tuned judges and how they can improve AI evaluation!
It’s so cool to see new ideas for going beyond standard metrics to really understand model performance.
At DATUMO, we’ve been exploring ways to make LLM evaluations more reliable too—it’s such an exciting space to be in. Looking forward to more posts like this!
DATUMO: https://datumo.com
This is probably the best overview of LLMs as Judges I have seen.
It made me realise that I haven’t seen a lot of new evaluation metrics / models / strategies since Prometheus.
Agree! This topic is still underexplored!