Really enjoyed this take on fine-tuned judges and how they can improve AI evaluation!
It’s so cool to see new ideas for going beyond standard metrics to really understand model performance.
At DATUMO, we’ve been exploring ways to make LLM evaluations more reliable too—it’s such an exciting space to be in. Looking forward to more posts like this!
DATUMO: https://datumo.com
This is probably the best overview of LLMs as Judges I have seen.
It made me realise that I haven’t seen a lot of new evaluation metrics / models / strategies since Prometheus.
Agree! This topic is still underexplored!