How Language Models Learned to Reason
The article explores the paper Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, showing that large language models can perform complex reasoning when prompted to generate intermediate reasoning steps in natural language. By providing examples with explicit “chains of thought,” models learn to decompose problems and significantly improve performance on arithmetic, commonsense, and symbolic reasoning tasks—without fine-tuning or architectural changes.

Juan Manuel Ortiz de Zarate
Feb 15 · 10 min read


What If Reasoning Doesn’t Need Billion-Parameter Models?
Large language models excel at language but often struggle with structured reasoning tasks. This article explores Tiny Recursive Models (TRMs), a radically simpler approach that uses small neural networks with recursive refinement to outperform massive LLMs on puzzles like Sudoku, mazes, and ARC-AGI. By prioritizing iterative reasoning over scale, TRMs show that deep thinking can emerge from minimal architectures, challenging prevailing assumptions about model size and intelligence.

Juan Manuel Ortiz de Zarate
Dec 18, 2025 · 10 min read


Teaching Robots to Dance
RoboBallet explores a new approach to multi-robot task and motion planning by combining graph neural networks with reinforcement learning. Instead of decomposing planning into brittle subproblems, the system learns to coordinate multiple robotic arms directly through structured relational reasoning. Trained in simulation and generalizing zero-shot to real workcells, RoboBallet demonstrates how learning-based coordination can scale to industrial environments where classical planning struggles.

Juan Manuel Ortiz de Zarate
Dec 13, 2025 · 11 min read


When Models Learn to Think Before Painting
This article explores HunyuanImage 3.0, Tencent’s groundbreaking open-source multimodal model that unifies language understanding, visual reasoning, and image generation. It examines the model’s data pipeline, architecture, Chain-of-Thought workflow, and progressive training strategy, showing how HunyuanImage 3.0 achieves state-of-the-art text-to-image performance while enabling richer control, coherence, and creativity.

Juan Manuel Ortiz de Zarate
Dec 6, 2025 · 9 min read


Breaking the Amnesia Cycle in Large Sequence Models
Nested Learning reframes neural models as multi-loop systems updating at different frequencies, revealing that depth stacking hides gradient mechanics and limits continual learning. It interprets optimizers like Momentum and Adam as associative gradient memories and introduces CMS for incremental abstraction. The HOPE module combines self-modification, multi-clock updates, and deep contextual compression, offering a white-box path beyond static backbones for long-context and continual learning.

Juan Manuel Ortiz de Zarate
Nov 27, 2025 · 9 min read


Making Neural Circuits Understandable
The article introduces weight-sparse transformers (models in which most weights are zero) as a route to interpretable neural circuits. These models reveal clear, human-understandable algorithms for language tasks. Sparsity trades raw capability for clarity, letting researchers fully trace the mechanisms inside a network and bridge them to dense models, bringing transparency to AI reasoning.

Juan Manuel Ortiz de Zarate
Nov 20, 2025 · 9 min read


Compute Among the Stars
Google’s Project Suncatcher envisions moving AI computation into orbit, building constellations of solar-powered satellites equipped with TPUs and laser interlinks. By harnessing the Sun’s constant energy and future low-cost launches, the project proposes a scalable, space-based infrastructure for machine learning. It’s a blueprint for computing beyond Earth—where data centers orbit, powered by sunlight instead of fossil grids.

Juan Manuel Ortiz de Zarate
Nov 11, 2025 · 9 min read


AI Can Code, But Can It Engineer?
SWE-Bench Pro marks a turning point in evaluating AI coding agents. Built from complex, real-world software repositories, it reveals that even frontier models like GPT-5 and Claude Opus solve less than 25% of tasks. The benchmark exposes the gap between coding fluency and true engineering ability, redefining how progress toward autonomous software development should be measured.

Juan Manuel Ortiz de Zarate
Nov 5, 2025 · 10 min read


The AlphaGo Moment of Neural Architecture Design
ASI-ARCH marks a breakthrough in AI self-innovation: an autonomous system that designs, codes, and validates new neural network architectures without human input. Conducting 1,773 experiments, it discovered 106 state-of-the-art models, revealing a scaling law for scientific discovery. Like AlphaGo’s Move 37, ASI-ARCH exposes principles beyond human intuition, signaling a new era where AI invents AI.

Juan Manuel Ortiz de Zarate
Oct 29, 2025 · 10 min read


The Lightning Mind
DeepSeek-V3.2-Exp introduces a new sparse-attention system that lets large language models handle ultra-long contexts efficiently. Using a “lightning indexer” to select only the most relevant tokens, it cuts computation costs while preserving reasoning power. The result is a faster, cheaper, and more cognitively elegant AI that learns what to ignore, bringing machine focus closer to human intelligence.

Juan Manuel Ortiz de Zarate
Oct 22, 2025 · 9 min read


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8, 2025 · 9 min read


Training-Efficient RL
Standard Reinforcement Fine-Tuning (RFT) selects training data with heuristic metrics, making it inefficient. The GAIN-RL framework instead uses angle concentration, an intrinsic signal from token hidden states that correlates directly with gradient strength and learning capacity, to dynamically select data for consistently impactful updates. It achieves over 2.5x training acceleration and superior performance using only half the original data.

Juan Manuel Ortiz de Zarate
Oct 3, 2025 · 10 min read


Building Secure AI Agents
LlamaFirewall is an open-source, system-level guardrail framework designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: PromptGuard 2 detects direct jailbreaks, AlignmentCheck audits agent chain-of-thought for subtle misalignment and indirect injections, and CodeShield performs fast, real-time static analysis to catch insecure generated code.

Juan Manuel Ortiz de Zarate
Sep 26, 2025 · 10 min read


Understanding the ChatGPT Revolution
ChatGPT, now used by roughly 10% of adults worldwide, devotes over 70% of its usage to non-work purposes. Dominant topics include practical guidance, information seeking, and writing, with writing especially prominent at work, and the system delivers significant value in decision support. The gender gap in usage has narrowed, and growth is strongest in lower-income countries. These findings were obtained by applying privacy-preserving methods to billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18, 2025 · 11 min read


Unveiling the Enigma of AI Hallucinations
Large Language Models hallucinate because training and evaluation reward guessing over admitting uncertainty. Errors arise statistically from pretraining, analogous to errors in binary classification, and they persist because most post-training evaluations use binary scoring that penalizes "I don't know" responses and incentivizes confident falsehoods. The proposed solution is socio-technical: adjust existing benchmarks with explicit confidence targets, fostering more trustworthy AI by rewarding calibrated uncertainty.

Juan Manuel Ortiz de Zarate
Sep 11, 2025 · 12 min read


The Checklist Shortcut to Smarter, Safer AI
This article explores Reinforcement Learning from Checklist Feedback (RLCF), a new approach that replaces reward models with checklists to align large language models. By breaking instructions into clear, verifiable steps, checklists provide richer, more interpretable feedback and consistently improve performance across benchmarks. The piece examines how this shift could make AI more reliable, transparent, and user-aligned.

Juan Manuel Ortiz de Zarate
Sep 4, 2025 · 12 min read


The Flattering Machine
This article explores Social Sycophancy, a broader form of flattery in large language models where systems preserve users’ self-image rather than offer balanced guidance. Building on Goffman’s face theory, it introduces the ELEPHANT framework to measure emotional validation, moral endorsement, indirectness, and framing acceptance. Findings show LLMs are far more sycophantic than humans, raising risks for users, society, and developers, and calling for new safeguards.

Juan Manuel Ortiz de Zarate
Aug 29, 2025 · 9 min read


Adventuring with AI: What Classic Games Teach Us About Modern Models
TextQuests introduces a benchmark built on 25 Infocom text-based adventure games to evaluate LLMs in dynamic, exploratory environments. Unlike static benchmarks, it tests long-context reasoning, trial-and-error learning, and ethical decision-making without external tools. Results show that even advanced models like GPT-5 struggle with sustained strategy, highlighting current limits in autonomy, memory, and adaptive reasoning.

Juan Manuel Ortiz de Zarate
Aug 22, 2025 · 10 min read


Language-Driven Precision in the Operating Room
The Hierarchical Surgical Robot Transformer (SRT-H) brings step-level autonomy to surgery by combining a language-driven high-level planner with a vision-guided low-level executor. Trained on over 16,000 demonstrations, it completed the clipping-and-cutting phase of gallbladder removal with 100% success in ex vivo trials, adapting to anatomical variation and self-correcting without human intervention, a milestone toward clinically viable autonomous surgery.

Juan Manuel Ortiz de Zarate
Aug 13, 2025 · 10 min read


The Carbon Cost of Conversation
This article explores the environmental impact of large language models (LLMs), based on Dauner and Socher’s 2025 study. By analyzing 14 models across reasoning tasks, it reveals a trade-off between accuracy and CO₂ emissions. Larger models and reasoning modes achieve higher performance but drastically increase energy use due to verbose outputs. The findings highlight the urgent need for optimizing reasoning efficiency and integrating sustainability into AI development.

Juan Manuel Ortiz de Zarate
Aug 7, 2025 · 10 min read