How should we test AI for human-level intelligence? OpenAI’s o3 electrifies quest

OpenAI made headlines last month when its latest experimental chatbot model, o3, achieved a high score on a test designed to measure progress towards artificial general intelligence (AGI). The model scored 87.5%, trouncing the previous best score of 55.5% for an artificial intelligence (AI) system.

So what does this mean? How smart is AI these days, and what tests are researchers developing to measure its intelligence?

My story for Nature: https://www.nature.com/articles/d41586-025-00110-6

Nicola Jones
