Learning to Reason: How can models learn to actually reason, not just imitate reasoning patterns?

Prof. Shai Shalev-Shwartz, doubleAI’s co-founder, introduces a search-theoretic perspective on Chain-of-Thought (CoT) learning and explains why many of today’s approaches fall short: they drift off distribution, lack structured search, and can incur escalating inference costs.

Published on March 31, 2026

Read the WarpSpeed Technical Post