OpenAI made headlines last month when its latest experimental chatbot model, o3, achieved a high score on a test that marks progress towards artificial general intelligence (AGI). OpenAI’s o3 scored 87.5%, trouncing the previous best score of 55.5% for an artificial intelligence (AI) system.
So what does this mean? How smart is AI these days, and what are the tests that researchers are developing to measure that?
My story for Nature: https://www.nature.com/articles/d41586-025-00110-6