

AI Can Code, But Can It Engineer?
SWE-Bench Pro marks a turning point in evaluating AI coding agents. Built from complex, real-world software repositories, it reveals that even frontier models like GPT-5 and Claude Opus solve less than 25% of tasks. The benchmark exposes the gap between coding fluency and true engineering ability, redefining how progress toward autonomous software development should be measured.

Juan Manuel Ortiz de Zarate
17 hours ago · 10 min read


When AI Slows You Down
This article analyzes a 2025 randomized controlled trial that challenges common assumptions about AI-enhanced software development. Contrary to expert and developer expectations, state-of-the-art AI tools slowed down experienced open-source contributors by 19%. Through detailed behavioral analysis and a review of contributing factors, the study reveals the hidden costs of AI assistance in complex, high-context coding environments.

Juan Manuel Ortiz de Zarate
Aug 211 min read