flatreader

A benchmark you can run locally to test how well LLMs and AI agents perform real-world tasks

Points: 3

# Comments: 0