Frontier AI

How fast is frontier AI scaling — and how close to general capability?

Live · Updated 2026-06

Distance to goal

The single most intuitive view — current position against the end goal, on a log scale.

Largest training compute

Now: 1×10²⁶ FLOP (frontier labs)
Goal: ~0.0 orders of magnitude to go
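The page does not state how the "orders of magnitude to go" readout is computed. A minimal sketch, assuming it is the base-10 logarithm of goal over current value, floored at zero once the goal is reached. The function name and the goal figure used below are hypothetical, chosen only for illustration.

```python
import math


def orders_of_magnitude_to_go(current: float, goal: float) -> float:
    """Log-scale gap between a current value and a goal.

    Assumption: gap = log10(goal / current), clamped to 0 when the
    current value already meets or exceeds the goal.
    """
    if current <= 0 or goal <= 0:
        raise ValueError("both values must be positive for a log-scale gap")
    return max(0.0, math.log10(goal / current))


# Illustrative inputs only: the goal here is a placeholder, not the tracker's.
gap = orders_of_magnitude_to_go(current=1e26, goal=1e26)
print(f"~{gap:.1f} orders of magnitude to go")
```

On a log scale each full unit is a 10× increase, so under this assumption a gap of 1.0 would mean ten times more compute is still needed.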

Headline indicators

Who leads

Standings by actor, within this field only.

Training compute

Google DeepMind · Gemini Ultra (Epoch est.) · 5×10²⁵ FLOP
Meta AI · Llama 3.1 405B (2024) · 3.8×10²⁵ FLOP
OpenAI · GPT-4 (2023) · 2.1×10²⁵ FLOP
DeepSeek · DeepSeek-V3, efficient (2024) · 3.4×10²⁴ FLOP

ARC-AGI score

OpenAI · o3, high-compute (2024) · 88%

Private investment

United States · $109.1B (2024)
China · $9.3B (2024)

Milestone timeline

Clear-cut events: crossed or not crossed.

11 achieved · 1 in progress · 1 locked

ChatGPT brings AI to the mainstream

2022-11
OpenAI

ChatGPT reached 100M users in two months, at the time the fastest adoption of any consumer app, and marked AI's consumer inflection point.

First model trained at 1e25 FLOP

2023-03
OpenAI (GPT-4)

GPT-4 was the first model at the 1e25 FLOP scale; over 30 models from 12 developers have since crossed it.

AI beats the ARC-AGI abstraction test

2024-12
OpenAI (o3)

o3 scored 76–88% on ARC-AGI-1 (human ~85%) — the first AI to move beyond memorization on it.

Goalposts move: ARC-AGI-2 launches

2025-03
ARC Prize

A harder successor — still easy for humans, hard for AI — resetting the abstraction frontier as v1 saturated.

ARC-AGI-3 — first interactive benchmark; AI under 1%

2026-03
ARC Prize

The first fully interactive ARC benchmark: hand-built game environments with no instructions — agents must discover the rules. At launch every frontier model scored <1% (best 0.37%) while humans solve them all; $2M+ prize pool, results Dec 2026.

Frontier training compute passes 1e26 FLOP

2025
frontier labs

The largest models crossed 1e26 FLOP, a 10× jump over the 1e25 scale GPT-4 first reached, with compute still growing ~4–5× per year.
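The quoted growth rate implies how long a tenfold increase in training compute takes. A rough sketch, assuming smooth exponential growth at a constant annual multiplier (the tracker does not claim growth is that regular). The helper name is hypothetical.

```python
import math


def years_to_multiply(factor: float, growth_per_year: float) -> float:
    """Years needed to grow by `factor` at a constant annual multiplier.

    Solves growth_per_year ** t == factor for t.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if growth_per_year <= 1:
        raise ValueError("growth_per_year must exceed 1 for growth to occur")
    return math.log(factor) / math.log(growth_per_year)


# Time for a 10x increase at each end of the ~4-5x per year range.
for rate in (4.0, 5.0):
    years = years_to_multiply(10.0, rate)
    print(f"{rate:.0f}x per year -> 10x in about {years:.2f} years")
```

Under this assumption, each order of magnitude takes roughly a year and a half at either end of the range.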

Open reasoning model rivals the frontier — at a fraction of the cost

2025-01
DeepSeek (R1)

DeepSeek-R1, an openly released RL-trained reasoning model, matched leading closed models on math and coding — triggering a market reckoning over AI capex.

First frontier model shipped on domestic Chinese silicon

2026-04
DeepSeek / Huawei

DeepSeek's 1.6T-parameter V4 runs on Huawei Ascend (950PR), and a Huawei-led team completed full-parameter post-training on ~1,000 Ascend 910Cs — a compute-sovereignty landmark. Pre-training hardware remains undisclosed, so "trained without Nvidia" is NOT established.

OpenAI files confidentially for IPO

~ 2026-06
OpenAI

OpenAI confirmed a confidential S-1 draft with the SEC (8 Jun 2026) — last valued at $852B after a $122B round, with $25B+ annualized revenue. No timing set; reports point to a possible Sep–Nov window. Would be the defining AI listing.

A benchmark built to resist saturation: Humanity's Last Exam

2025-01
CAIS · Scale AI

As models saturated existing tests, a 2,500-question expert exam launched on which frontier models initially scored in the single digits — a fresh yardstick for the distance to general capability.

Reasoning & agentic coding become the frontier (Claude Opus 4)

2025-05
Anthropic (Claude Opus 4)

Anthropic's Claude Opus 4 launched with extended thinking and sustained autonomous coding over long tasks — part of a 2025 shift where reasoning/agentic models, not raw scale alone, drove the frontier.

Tiered safety deployment of a frontier model (Fable 5 / Mythos 5)

2026-06
Anthropic

Anthropic released Claude Fable 5, a Mythos-class model exceeding any it had made generally available, gated so ~5% of sensitive (e.g. cyber) sessions get a conservatively tuned model. The unrestricted Mythos 5 went only to vetted cyberdefenders via Project Glasswing with the US government. Days later the US Commerce Department export-controlled both models, barring all foreign-national access. Unable to enforce that selectively in real time, Anthropic shut Fable 5 and Mythos 5 off worldwide (its other models were unaffected). It was the first time a deployed frontier AI model was export-controlled like a strategic technology.

AGI — broadly human-level capability

~ 20??

A system matching humans across most economically valuable tasks — definition contested, and not here yet.

Every figure links to a primary source. We publish no invented scores. Tracker numbers are neutral; analysis is labelled separately.