Adventuring with AI: What Classic Games Teach Us About Modern Models
TextQuests introduces a benchmark built on 25 Infocom text-based adventure games to evaluate LLMs in dynamic, exploratory environments. Unlike static benchmarks, it tests long-context reasoning, trial-and-error learning, and ethical decision-making without external tools. Results show that even advanced models like GPT-5 struggle with sustained strategy, highlighting current limits in autonomy, memory, and adaptive reasoning.

Juan Manuel Ortiz de Zarate
Aug 23 · 10 min read


A Foundation for Agent Collaboration
This article explores the Model Context Protocol (MCP), a standardized interface that enables AI agents to dynamically discover and invoke external tools. It covers MCP’s architecture, real-world applications, and security risks across its lifecycle. By decoupling tool logic from AI behavior, MCP empowers agents to perform complex workflows with greater flexibility, setting a foundation for the next generation of tool-integrated AI systems.

Juan Manuel Ortiz de Zarate
Jul 25 · 9 min read


Misaligned Intelligence
This article explores the concept of agentic misalignment in large language models, based on Anthropic's 2025 study. Through the “Summit Bridge” simulation, it reveals how advanced AIs can adopt deceptive, coercive strategies when facing threats to their objectives. The piece analyzes experimental results, ethical implications, mitigation strategies, and the broader risks of deploying increasingly autonomous AI systems without robust safeguards.

Juan Manuel Ortiz de Zarate
Jul 17 · 10 min read


AI Against Racism
This article explores how an open-source AI system helped Santa Clara County identify and redact thousands of racially restrictive covenants buried in millions of historical property deeds. By fine-tuning a legal-specific language model, the project achieved near-perfect accuracy while cutting costs dramatically. The work demonstrates how AI can support legal reform, scale archival justice, and preserve public accountability.

Juan Manuel Ortiz de Zarate
Jul 4 · 10 min read


The Illusion of Thinking: Understanding Reasoning Models in AI
This article explores the limits of reasoning in large language models, revealing how their apparent intelligence breaks down under increasing complexity. Using controlled puzzle environments, it analyzes their “thinking traces” and uncovers patterns of overthinking, execution failures, and lack of adaptability. The findings raise critical questions for building AI systems capable of genuine reasoning.

Juan Manuel Ortiz de Zarate
Jun 26 · 10 min read


Foundation Models
Foundation models like GPT-3 and CLIP are reshaping AI by enabling general-purpose systems trained on massive, unlabelled data. This article explores their key concepts—emergence and homogenization—their capabilities across language, vision, and more, and the risks they pose, from bias to environmental impact. Based on the Stanford report, it highlights why foundation models are powerful, unpredictable, and demand responsible development.

Juan Manuel Ortiz de Zarate
May 7 · 9 min read


How Bigger Models Get Better
This article explores the groundbreaking findings of Kaplan et al. on scaling laws for neural language models. It explains how model performance improves predictably with increased model size, dataset size, and compute budget, highlighting power-law relationships. The piece discusses implications for efficient AI training, optimal resource allocation, overfitting avoidance, and future research directions.

Juan Manuel Ortiz de Zarate
Apr 30 · 10 min read


How AI is Transforming Science and Medicine
This article explores how AI is transforming science and medicine in 2025. From breakthroughs in protein engineering and brain mapping to outperforming doctors in clinical diagnosis, AI is becoming an active research partner and clinical assistant. It highlights key findings from Stanford’s AI Index Report, including the rise of virtual labs, predictive healthcare models, AI scribes, and the importance of ethical, inclusive, and regulated deployment.

Juan Manuel Ortiz de Zarate
Apr 15 · 11 min read


Tech Titans Turn to Atomic Power to Fuel the Future
Tech giants turn to nuclear energy to power AI, tackling rising energy demands and environmental impact with bold new strategies.

Juan Manuel Ortiz de Zarate
Mar 30 · 10 min read


Diffusion Models: From Noise to Masterpiece
Explore how Diffusion Models are revolutionizing generative AI, from their mathematical foundations to applications in image and audio.

Juan Manuel Ortiz de Zarate
Mar 20 · 8 min read


The Brains Behind AI’s Evolution
Discover how neural networks power modern AI, from deep learning to generative models, shaping the future of technology and innovation.

Juan Manuel Ortiz de Zarate
Mar 14 · 9 min read


Diffusion LLM: Closer to Human Thought
SEDD redefines generative AI with human-like reasoning, enabling faster, high-quality text and code through discrete diffusion models.

Juan Manuel Ortiz de Zarate
Mar 7 · 9 min read


Benchmarking AI Across Disciplines
SuperGPQA evaluates LLMs across 285 disciplines with 26,529 questions, testing their reasoning and knowledge beyond traditional fields.

Juan Manuel Ortiz de Zarate
Feb 26 · 9 min read


AI That Thinks Before It Speaks
Optimizing AI reasoning with adaptive test-time computation, using recurrent depth transformers for smarter, more efficient problem-solving.

Juan Manuel Ortiz de Zarate
Feb 19 · 9 min read


Saving AI from Itself: How to Prevent Model Collapse
Active Inheritance curates synthetic data to steer LLM behavior, preventing model collapse while improving diversity and safety and reducing bias.

Juan Manuel Ortiz de Zarate
Feb 6 · 8 min read


DeepSeek, the game-changing model
DeepSeek R1 enhances AI reasoning with reinforcement learning and distillation, achieving top-tier performance while maintaining efficiency.

Juan Manuel Ortiz de Zarate
Jan 31 · 9 min read


Orca: The New LLM Teacher
Orca 2: A smaller AI model that rivals larger ones by mastering task-specific reasoning, achieving high performance with less computation.

Juan Manuel Ortiz de Zarate
Oct 9, 2024 · 9 min read


AI Researchers
AI Scientist automates research, generating ideas, running experiments, and writing papers, raising new questions about AI's role in novel scientific discovery.

Juan Manuel Ortiz de Zarate
Aug 27, 2024 · 9 min read


Understanding What ML Models Have Learned
Models can spread bias and discrimination if you don't know what they have learned. Here we show a technique to prevent that.

Juan Manuel Ortiz de Zarate
Aug 2, 2024 · 10 min read


Biases in LLMs
Explore the hidden biases in LLMs and their impact. Which sectors of society have their opinions reflected in these models?

Juan Manuel Ortiz de Zarate
Jul 17, 2024 · 10 min read