O3 is a superhuman coder

eAwaz | URDU NEWS, Variety Vibes

OpenAI just announced its o3 model as the grand finale to close out 2024.

There are lots of raw numbers being thrown around without any context, so I wanted to put some of them into perspective for you:

(1) o3 is a superhuman coder. Codeforces is an online competitive programming platform that gives every contestant a rating. For perspective:

The average Google engineer scores around 940. o3 scored 2727, which would rank it among the top 200 programmers of all time. The previous model, o1-preview, released in Sep 2024 (only 3 months ago!), was at 1258, which barely breaks the top 40,000.
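To get a feel for what a rating gap like that means, here is a rough sketch using the textbook Elo expected-score formula. This is an assumption on my part: Codeforces ratings are Elo-like, but the site's actual rating system differs in its details, and the function name `expected_score` is just something I made up for illustration.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Textbook Elo expected score of player A against player B.

    Assumption: the classic Elo formula with a 400-point scale. Codeforces
    uses its own Elo-like variant, so treat the result as a rough guide.
    """
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


if __name__ == "__main__":
    o3 = 2727          # rating quoted above for o3
    o1_preview = 1258  # rating quoted above for o1-preview
    google_avg = 940   # figure quoted above for the average Google engineer

    for label, rating in [("o1-preview", o1_preview), ("avg Google engineer", google_avg)]:
        print(f"o3 vs {label}: expected score {expected_score(o3, rating):.5f}")
```

Under this simplified model, a gap of well over 1,000 rating points means the higher-rated player would be expected to come out ahead almost every single time.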

(2) o3 is a supergenius at math. The AIME is a really hard math competition. For perspective: I studied math at Cornell and Oxford, and was objectively one of the best math students in my class (based on class ranking / GPA / vibes). My AIME score was 30%. o1-preview’s score was 40%. o3’s score? It blows me out of the water at 96.7%.

(3) But maybe o3 is just a nerd? There’s an organization called the ARC Prize Foundation whose sole mission is to put together problems that are EXPLICITLY “easy” for normal, average humans but “hard” for AI and computers. The benchmark they put together is called ARC-AGI, and it is “semi-private”, meaning it is hidden from OpenAI and other model companies during training. Claude 3.5 Sonnet, the best model from Anthropic, scored a measly 14%. o1-preview scored 18%, only a tiny bit better. Partially due to these results, many pessimists loved to point to this benchmark as an example of “AI hitting a wall” and “AI is a bubble”. But o3’s score? A staggering 75.7%.

The average human scores 85%, so AI still has a “little room” to go. But keep in mind that this test was literally designed to have the BIGGEST GAP possible between humans and computers, by people trying their VERY BEST to maximize that gap. Yet the gap is rapidly closing and nearing zero. OpenAI’s new o3 model isn’t just “better”. It’s rewriting the rules of what we thought AI could do.
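If you want to see the closing gap as plain arithmetic, here is a tiny sketch that subtracts each model's ARC-AGI score from the 85% human average. It only uses the numbers quoted in this post; the variable names are mine and purely illustrative.

```python
# ARC-AGI scores (percent) exactly as quoted in this post.
HUMAN_AVERAGE = 85.0
model_scores = {
    "Claude 3.5 Sonnet": 14.0,
    "o1-preview": 18.0,
    "o3": 75.7,
}

# Gap to the human average, in percentage points, rounded to one decimal.
gaps = {name: round(HUMAN_AVERAGE - score, 1) for name, score in model_scores.items()}

if __name__ == "__main__":
    for name, gap in gaps.items():
        print(f"{name}: {gap} points behind the average human")
```

In other words, the gap went from 67 points with o1-preview to under 10 points with o3.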

o3 is due to be released in late Jan / early Feb 2025. What are you going to do with the supergenius that you’ll have in your pocket?