Researchers from Tsinghua University, Ohio State University, and UC Berkeley collaborated on AgentBench, a pioneering benchmark for evaluating large language models (LLMs) such as ChatGPT and Claude as agents in real-world tasks. Results show that top commercial LLMs perform well on these practical tasks, suggesting their potential as potent, continuously learning agents, while a notable gap remains for open-source models.