Exploiting the most prominent AI agent benchmarks

Article URL: https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/

Comments URL: https://news.ycombinator.com/item?id=47733217

Points: 462

Number of comments: 114