ChatGPT brings AI to the mainstream
2022-11ChatGPT reached 100M users in two months — the fastest-adopted app to date and AI's consumer inflection point.
How fast is frontier AI scaling — and how close to general capability?
The single most intuitive view — current position against the end goal, on a log scale.
Standings by actor, within this field only.
Clear-cut events: crossed or not crossed.
ChatGPT reached 100M users in two months — the fastest-adopted app to date and AI's consumer inflection point.
GPT-4 was the first model at the 1e25 FLOP scale; over 30 models from 12 developers have since crossed it.
o3 scored 76–88% on ARC-AGI-1 (human ~85%) — the first AI to move beyond memorization on it.
A harder successor — still easy for humans, hard for AI — resetting the abstraction frontier as v1 saturated.
The first fully interactive ARC benchmark: hand-built game environments with no instructions — agents must discover the rules. At launch every frontier model scored <1% (best 0.37%) while humans solve them all; $2M+ prize pool, results Dec 2026.
Largest models crossed 1e26 FLOP — a 10× jump over GPT-4, with compute still growing ~4–5× per year.
DeepSeek-R1, an openly released RL-trained reasoning model, matched leading closed models on math and coding — triggering a market reckoning over AI capex.
DeepSeek's 1.6T-parameter V4 runs on Huawei Ascend (950PR), and a Huawei-led team completed full-parameter post-training on ~1,000 Ascend 910Cs — a compute-sovereignty landmark. Pre-training hardware remains undisclosed, so "trained without Nvidia" is NOT established.
OpenAI confirmed a confidential S-1 draft with the SEC (8 Jun 2026) — last valued at $852B after a $122B round, with $25B+ annualized revenue. No timing set; reports point to a possible Sep–Nov window. Would be the defining AI listing.
As models saturated existing tests, a 2,500-question expert exam launched on which frontier models initially scored in the single digits — a fresh yardstick for the distance to general capability.
Anthropic's Claude Opus 4 launched with extended thinking and sustained autonomous coding over long tasks — part of a 2025 shift where reasoning/agentic models, not raw scale alone, drove the frontier.
Anthropic released Claude Fable 5 — a Mythos-class model exceeding any it had made generally available — gated so ~5% of sensitive (e.g. cyber) sessions get a conservatively-tuned model, while the unrestricted Mythos 5 went only to vetted cyberdefenders via Project Glasswing with the US government. Days later the US Commerce Department export-controlled both models, barring all foreign-national access; unable to enforce that selectively in real time, Anthropic shut Fable 5 and Mythos 5 off worldwide (its other models unaffected) — the first time a deployed frontier AI model was export-controlled like a strategic technology.
A system matching humans across most economically valuable tasks — definition contested, and not here yet.
Every figure links to a primary source. We publish no invented scores. Tracker numbers are neutral; analysis is labelled separately.