WarpSpeed and the Need for Artificial Expert Intelligence

WarpSpeed showcases Artificial Expert Intelligence as a solution to the expertise bottleneck. Machines can surpass human experts even in data-scarce, hard-to-validate, deeply technical domains.

Published on

March 12, 2026

Humanity's progress is gated by experts. Not by compute, not by ambition - experts. Every major field - chip design, GPU performance engineering, molecular modeling, security, robotics, materials - advances only as fast as the very small number of people who truly understand the details. If we want to accelerate science and technology by orders of magnitude, we need systems that can operate at that expert level or beyond.

That is the mission of Artificial Expert Intelligence (AEI).

Wasn't this achieved by AI already?

It's tempting to think so. Today's frontier models win gold medals at the IMO, outperform top programmers on Codeforces, and write correct code across the internet's long tail. Surely this is expert level? Not quite. These successes share three hidden prerequisites:

Massive training data.
Easy automatic validation.
Shallow reasoning horizons.

Where all three conditions hold, today's AI shines. When any one of them breaks, it collapses.

GPU performance engineering breaks all three.

Consider optimizing GPU kernels for a system like cuGraph - a library written and refined by top NVIDIA engineers for a decade. This domain is the opposite of today's comfortable regime:

Data scarcity. The internet contains only hundreds of optimized CUDA kernels.
Hard-to-validate outputs. Many graph algorithms admit multiple correct answers, so correctness can't be determined by a simple comparison against a reference output. More generally, establishing correctness and measuring performance accurately on GPUs are harder than they appear.
Deep reasoning over many uncertain steps. Performance comes from long chains of interacting choices: memory layout, warp behavior, frontier structure, caching, scheduling, graph morphology, and more.
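
To make the validation point concrete, here is a minimal Python sketch of property-based checking for one graph algorithm, breadth-first search. It is an illustration only, not WarpSpeed's or cuGraph's verification method: the function names, the adjacency-list format, and the tiny diamond graph are all assumptions chosen for the example. The idea is that a BFS predecessor map is accepted if it satisfies the properties every valid BFS tree must have, rather than if it equals one particular reference output.

```python
from collections import deque


def bfs_distances(adj, source):
    """Reference BFS distances; adj maps each node to a list of its neighbors."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def is_valid_bfs_tree(adj, source, parent):
    """Check a predecessor map by its properties, not against one fixed answer."""
    dist = bfs_distances(adj, source)
    # Exactly the vertices reachable from the source must have an entry.
    if set(parent) != set(dist):
        return False
    # The source is the root and has no parent.
    if parent[source] is not None:
        return False
    for v, p in parent.items():
        if v == source:
            continue
        # The parent must be a reached vertex, adjacent to v,
        # and exactly one level closer to the source.
        if p not in dist or v not in adj[p] or dist[p] + 1 != dist[v]:
            return False
    return True


# Hypothetical diamond graph: vertex 3 can be reached through 1 or through 2.
adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
tree_a = {0: None, 1: 0, 2: 0, 3: 1}
tree_b = {0: None, 1: 0, 2: 0, 3: 2}
bad_tree = {0: None, 1: 0, 2: 0, 3: 0}  # 3 is not adjacent to 0

print(is_valid_bfs_tree(adj, 0, tree_a))    # True
print(is_valid_bfs_tree(adj, 0, tree_b))    # True
print(is_valid_bfs_tree(adj, 0, bad_tree))  # False
```

Both `tree_a` and `tree_b` are correct even though they differ, so an equality test against either one would wrongly reject the other. A parallel GPU implementation can legitimately return either, depending on which thread reaches vertex 3 first.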

As shown in our full technical post, even state-of-the-art agents like Claude Code, Codex, and Gemini CLI fail dramatically on this problem - often producing incorrect implementations even when handed cuGraph's own test suite. This is the expert bottleneck in the wild.

New ideas are required

To break this barrier, scaling alone is insufficient. We needed new algorithmic ideas, new verification methods, new agentic search structures, and new training signals. We built on:

Our diligent learning framework
Our PAC reasoning methodology
Novel techniques for learning from extremely small datasets

The result is WarpSpeed - our first Artificial Expert System.

WarpSpeed: Superhuman GPU Performance Engineering

We independently deployed WarpSpeed to autonomously rewrite and re-optimize cuGraph’s kernels across three GPU architectures, producing:

3.6× average (geometric mean) speedup over human experts
Faster implementations for 100% of algorithms
Speedups of at least 2× for 55% of them
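
For readers unfamiliar with how such summary figures are usually computed, the following Python sketch shows one common way to aggregate per-algorithm speedups. It is a sketch under stated assumptions, not the evaluation code behind the numbers above: the timings are made-up illustrative values, the function names are hypothetical, and reading the 2× figure as "at least 2×" is an assumption.

```python
import math


def geometric_mean(values):
    """Geometric mean computed through logs, for a list of positive ratios."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("expected a non-empty list of positive ratios")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(baseline_ms, optimized_ms, threshold=2.0):
    """Summarize per-algorithm speedups (baseline time / optimized time)."""
    speedups = [b / o for b, o in zip(baseline_ms, optimized_ms)]
    n = len(speedups)
    return {
        "geomean": geometric_mean(speedups),
        "fraction_faster": sum(s > 1.0 for s in speedups) / n,
        "fraction_at_least_threshold": sum(s >= threshold for s in speedups) / n,
    }


# Illustrative timings in milliseconds (NOT measured WarpSpeed or cuGraph data).
baseline_ms = [8.0, 4.0, 9.0, 6.0]
optimized_ms = [1.0, 2.0, 6.0, 4.0]

summary = summarize(baseline_ms, optimized_ms)
print(summary)
```

With these illustrative timings the speedups are 8×, 2×, 1.5×, and 1.5×. Their arithmetic mean is 3.25×, but their geometric mean is about 2.45×. The geometric mean is the conventional choice for averaging ratios because a single large outlier cannot dominate it.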

WarpSpeed beats a decade of expert-engineered GPU kernels - on every algorithm, on every GPU. This is not the ordinary case of an AI that does well on toy benchmarks. This is AI surpassing world-class engineers on one of the hardest, least data-rich, least automatable software domains we know of.

Why this matters

WarpSpeed is not about cuGraph. cuGraph is simply the proving ground where:

The data is scarce
The stakes are high
The validation is hard
The reasoning depth is extreme
The human baselines are very strong

If an artificial expert system works here, it can work anywhere expertise is the bottleneck. This is the beginning of AEI - not artificial general intelligence, but something humanity arguably needs more urgently:
Systems that can reliably surpass human experts in the domains where expertise is rarest, slowest, and most valuable. WarpSpeed is our first step. More domains will follow.