WarpSpeed and the Need for Artificial Expert Intelligence
WarpSpeed showcases Artificial Expert Intelligence as a solution to the expertise bottleneck: machines can surpass human experts even in data-scarce, hard-to-validate, deeply technical domains.

Read the WarpSpeed Technical Post
Humanity's progress is gated by experts. Not by compute, not by ambition - experts. Every major field - chip design, GPU performance engineering, molecular modeling, security, robotics, materials - advances only as fast as the very small number of people who truly understand the details. If we want to accelerate science and technology by orders of magnitude, we need systems that can operate at that expert level or beyond.
That is the mission of Artificial Expert Intelligence (AEI).
Wasn't this achieved by AI already?
It's tempting to think so. Today's frontier models win gold medals at the IMO, outperform top programmers on Codeforces, and write correct code across the internet's long tail. Surely that is expert level? Not quite. These successes share three hidden prerequisites:
- Massive training data.
- Easy automatic validation.
- Shallow reasoning horizons.
Where all three conditions hold, today's AI shines. When any one of them breaks, AI collapses.
GPU performance engineering breaks all three.
Consider optimizing GPU kernels for a system like cuGraph - a library written and refined by top NVIDIA engineers for a decade. This domain is the opposite of today's comfortable regime:
- Data scarcity. The internet contains only hundreds of optimized CUDA kernels.
- Hard-to-validate outputs. Many graph algorithms admit multiple correct answers; correctness can't be determined by a simple comparison. Generally, establishing correctness and measuring performance accurately on GPUs is harder than it appears.
- Deep reasoning with many uncertain steps. Performance comes from long chains of interacting choices: memory layout, warp behavior, frontier structure, caching, scheduling, graph morphology, and more.
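The "multiple correct answers" problem is easy to see concretely. The sketch below (an illustration, not WarpSpeed's actual validator) runs BFS twice on a tiny graph with tied shortest paths; the two runs return different predecessor trees, so a byte-for-byte diff flags a correct implementation as wrong, while comparing an invariant such as the distance array validates both.

```python
from collections import deque

# Toy graph with two equally short paths from 0 to 3.
graph = {0: [1, 2], 1: [3], 2: [3], 3: []}

def bfs(graph, source, neighbor_order=lambda ns: ns):
    """BFS returning (distances, predecessors). The neighbor visit
    order is a free choice - every order yields a correct answer."""
    dist = {source: 0}
    pred = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in neighbor_order(graph[u]):
            if v not in dist:
                dist[v] = dist[u] + 1
                pred[v] = u
                queue.append(v)
    return dist, pred

dist_a, pred_a = bfs(graph, 0)                           # visits 1 before 2
dist_b, pred_b = bfs(graph, 0, neighbor_order=reversed)  # visits 2 before 1

assert pred_a != pred_b  # raw outputs differ, so a naive diff "fails"...
assert dist_a == dist_b  # ...yet both runs are correct: distances agree
```

Real validators for parallel graph kernels face the same issue at scale, compounded by nondeterministic scheduling on the GPU, which is why correctness must be checked against algorithm-specific invariants rather than reference outputs.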
As shown in our full technical post, even state-of-the-art agents like Claude Code, Codex, and Gemini CLI fail dramatically on this problem - often producing incorrect implementations even when handed cuGraph's own test suite. This is the expert bottleneck in the wild.
New ideas are required
To break this barrier, scaling alone is insufficient. We needed new algorithmic ideas, new verification methods, new agentic search structures, and new training signals. We built on:
- Our diligent learning framework
- Our PAC reasoning methodology
- Novel techniques for learning from extremely small datasets
The result is WarpSpeed - our first Artificial Expert System.
WarpSpeed: Superhuman GPU Performance Engineering
We deployed WarpSpeed, operating autonomously, to rewrite and re-optimize cuGraph's kernels across three GPU architectures, producing:
- 3.6× average (geometric mean) speedup over human experts
- Faster implementations for 100% of algorithms
- 2× speedups for 55% of them
WarpSpeed beats a decade of expert-engineered GPU kernels - on every algorithm, on every GPU. This is not an AI doing well on toy benchmarks; this is AI surpassing world-class engineers in one of the hardest, least data-rich, least automatable software domains we know of.
Why this matters
WarpSpeed is not about cuGraph. cuGraph is simply the proving ground where:
- The data is scarce
- The stakes are high
- The validation is hard
- The reasoning depth is extreme
- The human baselines are very strong
If an artificial expert system works here, it can work anywhere expertise is the bottleneck. This is the beginning of AEI - not artificial general intelligence, but something humanity arguably needs more urgently:
Systems that can reliably surpass human experts in the domains where expertise is rarest, slowest, and most valuable. WarpSpeed is our first step. More domains will follow.