Reinforcement Learning | Transcendent AI

Teaching Robots to Dance

RoboBallet explores a new approach to multi-robot task and motion planning by combining graph neural networks with reinforcement learning. Instead of decomposing planning into brittle subproblems, the system learns to coordinate multiple robotic arms directly through structured relational reasoning. Trained in simulation and generalizing zero-shot to real workcells, RoboBallet demonstrates how learning-based coordination can scale to industrial environments where classical pl

Juan Manuel Ortiz de Zarate

Dec 13, 2025 | 11 min read

The Lightning Mind

DeepSeek-V3.2-Exp introduces a new sparse-attention system that lets large language models handle ultra-long contexts efficiently. Using a “lightning indexer” to select only the most relevant tokens, it cuts computation costs while preserving reasoning power. The result is a faster, cheaper, and more cognitively elegant AI that learns what to ignore, bringing machine focus closer to human intelligence.

Juan Manuel Ortiz de Zarate

Oct 22, 2025 | 9 min read

Training-Efficient RL

Reinforcement Fine-tuning (RFT) is often inefficient because it relies on heuristic metrics. The GAIN-RL framework instead uses angle concentration, an intrinsic model signal computed from token hidden states that correlates directly with gradient strength and learning capacity. With this signal, GAIN-RL dynamically selects data so that updates stay consistently impactful. It achieves over 2.5x training acceleration and superior performance using only half the original data.

Juan Manuel Ortiz de Zarate

Oct 3, 2025 | 10 min read

The Checklist Shortcut to Smarter, Safer AI

This article explores Reinforcement Learning from Checklist Feedback (RLCF), a new approach that replaces reward models with checklists to align large language models. By breaking instructions into clear, verifiable steps, checklists provide richer, more interpretable feedback and consistently improve performance across benchmarks. The piece examines how this shift could make AI more reliable, transparent, and user-aligned.

Juan Manuel Ortiz de Zarate

Sep 4, 2025 | 12 min read

Training Harmless AI at Scale

This article explores Constitutional AI, a framework developed by Anthropic to train AI systems that are helpful, harmless, and non-evasive, without relying on human labels for harmfulness. By guiding models through critique–revision loops and reinforcement learning from AI-generated feedback, this method offers a scalable, transparent alternative to RLHF and advances the field of AI alignment and self-supervised safety.

Juan Manuel Ortiz de Zarate

May 8, 2025 | 11 min read

DeepSeek, the game-changing model

DeepSeek R1 enhances AI reasoning with reinforcement learning and distillation, achieving top-tier performance while maintaining efficiency.

Juan Manuel Ortiz de Zarate

Jan 31, 2025 | 9 min read

Introduction to Reinforcement Learning

Master the magic of reinforcement learning! See how AI learns to make decisions from scratch and optimizes actions.

Juan Manuel Ortiz de Zarate

Jun 7, 2024 | 10 min read