Discussion about this post

ToxSec

Incredibly in-depth article on this subject. I feel like I can re-read this a few times to fully get all the useful information here.

Hodman Murad

This is very cool and very needed. It's important that we design agent evaluations that don't accidentally reward cheating.
