Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8 · 9 min read


Training-Efficient RL
Reinforcement Fine-Tuning (RFT) is inefficient when it relies on heuristic data-selection metrics. The GAIN-RL framework instead uses angle concentration, an intrinsic signal derived from token hidden states that correlates directly with gradient strength and learning capacity, to dynamically select the data that yields consistently impactful updates. It achieves over 2.5x training acceleration and superior performance using only half the original data.

Juan Manuel Ortiz de Zarate
Oct 3 · 10 min read


Building Secure AI Agents
LlamaFirewall is an open-source, system-level guardrail framework designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: PromptGuard 2 detects direct jailbreaks, AlignmentCheck audits agent chain-of-thought for subtle misalignment and indirect injections, and CodeShield performs fast, real-time static analysis to block insecure generated code.

Juan Manuel Ortiz de Zarate
Sep 26 · 10 min read


Understanding the ChatGPT Revolution
ChatGPT, now adopted by 10% of adults globally, sees over 70% of its usage outside of work. Dominant topics include practical guidance, information seeking, and writing, with writing especially prominent in work-related use. It offers significant value in decision support. The gender gap in usage has narrowed, and adoption is growing rapidly in lower-income countries. The analysis relied on privacy-preserving methods applied to billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18 · 11 min read


Unveiling the Enigma of AI Hallucinations
Large Language Models hallucinate because training and evaluation reward guessing over admitting uncertainty. Errors originate statistically in pretraining, where generating valid text reduces to a binary classification problem, and they persist because most post-training evaluations use binary scoring, penalizing "I don't know" responses and incentivizing confident falsehoods. The proposed solution is a socio-technical modification: adjust existing benchmarks with explicit confidence targets to foster more trustworthy AI by rewarding appropriately expressed uncertainty.

Juan Manuel Ortiz de Zarate
Sep 11 · 12 min read


The Checklist Shortcut to Smarter, Safer AI
This article explores Reinforcement Learning from Checklist Feedback (RLCF), a new approach that replaces reward models with checklists to align large language models. By breaking instructions into clear, verifiable steps, checklists provide richer, more interpretable feedback and consistently improve performance across benchmarks. The piece examines how this shift could make AI more reliable, transparent, and user-aligned.

Juan Manuel Ortiz de Zarate
Sep 4 · 12 min read


The Flattering Machine
This article explores Social Sycophancy, a broader form of flattery in large language models where systems preserve users’ self-image rather than offer balanced guidance. Building on Goffman’s face theory, it introduces the ELEPHANT framework to measure emotional validation, moral endorsement, indirectness, and framing acceptance. Findings show LLMs are far more sycophantic than humans, raising risks for users, society, and developers, and calling for new safeguards.

Juan Manuel Ortiz de Zarate
Aug 29 · 9 min read


Adventuring with AI: What Classic Games Teach Us About Modern Models
TextQuests introduces a benchmark built on 25 Infocom text-based adventure games to evaluate LLMs in dynamic, exploratory environments. Unlike static benchmarks, it tests long-context reasoning, trial-and-error learning, and ethical decision-making without external tools. Results show that even advanced models like GPT-5 struggle with sustained strategy, highlighting current limits in autonomy, memory, and adaptive reasoning.

Juan Manuel Ortiz de Zarate
Aug 23 · 10 min read


Language-Driven Precision in the Operating Room
The Hierarchical Surgical Robot Transformer (SRT-H) brings step-level autonomy to surgery by combining a language-driven high-level planner with a vision-guided low-level executor. Trained on over 16,000 demonstrations, it completed the clipping-and-cutting phase of gallbladder removal with 100% success in ex-vivo trials, adapting to variations and self-correcting without human intervention—marking a milestone toward clinically viable autonomous surgery.

Juan Manuel Ortiz de Zarate
Aug 13 · 10 min read


The Carbon Cost of Conversation
This article explores the environmental impact of large language models (LLMs), based on Dauner and Socher’s 2025 study. By analyzing 14 models across reasoning tasks, it reveals a trade-off between accuracy and CO₂ emissions. Larger models and reasoning modes achieve higher performance but drastically increase energy use due to verbose outputs. The findings highlight the urgent need for optimizing reasoning efficiency and integrating sustainability into AI development.

Juan Manuel Ortiz de Zarate
Aug 7 · 10 min read


When AI Slows You Down
This article analyzes a 2025 randomized controlled trial that challenges common assumptions about AI-enhanced software development. Contrary to expert and developer expectations, state-of-the-art AI tools slowed down experienced open-source contributors by 19%. Through detailed behavioral analysis and a review of contributing factors, the study reveals the hidden costs of AI assistance in complex, high-context coding environments.

Juan Manuel Ortiz de Zarate
Aug 2 · 11 min read


Misaligned Intelligence
This article explores the concept of agentic misalignment in large language models, based on Anthropic's 2025 study. Through the “Summit Bridge” simulation, it reveals how advanced AIs can adopt deceptive, coercive strategies when facing threats to their objectives. The piece analyzes experimental results, ethical implications, mitigation strategies, and the broader risks of deploying increasingly autonomous AI systems without robust safeguards.

Juan Manuel Ortiz de Zarate
Jul 17 · 10 min read


AI Against Racism
This article explores how an open-source AI system helped Santa Clara County identify and redact thousands of racially restrictive covenants buried in millions of historical property deeds. By fine-tuning a legal-specific language model, the project achieved near-perfect accuracy while cutting costs dramatically. The work demonstrates how AI can support legal reform, scale archival justice, and preserve public accountability.

Juan Manuel Ortiz de Zarate
Jul 4 · 10 min read


The Illusion of Thinking: Understanding Reasoning Models in AI
This article explores the limits of reasoning in large language models, revealing how their apparent intelligence breaks down under increasing complexity. Using controlled puzzle environments, it analyzes their “thinking traces” and uncovers patterns of overthinking, execution failures, and lack of adaptability. The findings raise critical questions for building AI systems capable of genuine reasoning.

Juan Manuel Ortiz de Zarate
Jun 26 · 10 min read


The Architecture That Redefined AI
This article offers a deep dive into the seminal paper Attention Is All You Need, which introduced the Transformer architecture. It explores the limitations of recurrent models, the mechanics of self-attention, training strategies, and the Transformer’s groundbreaking performance on machine translation tasks. The article also highlights the architecture’s enduring legacy as the foundation for modern NLP systems like BERT and GPT.

Juan Manuel Ortiz de Zarate
May 27 · 9 min read


Training Harmless AI at Scale
This article explores Constitutional AI, a framework developed by Anthropic to train AI systems that are helpful, harmless, and non-evasive—without relying on human labels for harmfulness. By guiding models through critique–revision loops and reinforcement learning from AI-generated feedback, this method offers a scalable, transparent alternative to RLHF and advances the field of AI alignment and self-supervised safety.

Juan Manuel Ortiz de Zarate
May 9 · 11 min read


Foundation Models
Foundation models like GPT-3 and CLIP are reshaping AI by enabling general-purpose systems trained on massive, unlabelled data. This article explores their key concepts—emergence and homogenization—their capabilities across language, vision, and more, and the risks they pose, from bias to environmental impact. Based on the Stanford report, it highlights why foundation models are powerful, unpredictable, and demand responsible development.

Juan Manuel Ortiz de Zarate
May 7 · 9 min read


How Bigger Models Get Better
This article explores the groundbreaking findings of Kaplan et al. on scaling laws for neural language models. It explains how model performance improves predictably with increased model size, dataset size, and compute budget, highlighting power-law relationships. The piece discusses implications for efficient AI training, optimal resource allocation, overfitting avoidance, and future research directions.

Juan Manuel Ortiz de Zarate
Apr 30 · 10 min read


How AI is Transforming Science and Medicine
This article explores how AI is transforming science and medicine in 2025. From breakthroughs in protein engineering and brain mapping to outperforming doctors in clinical diagnosis, AI is becoming an active research partner and clinical assistant. It highlights key findings from Stanford’s AI Index Report, including the rise of virtual labs, predictive healthcare models, AI scribes, and the importance of ethical, inclusive, and regulated deployment.

Juan Manuel Ortiz de Zarate
Apr 15 · 11 min read


Bringing Foundation Models to Small Data
This article explores TabPFN, a transformer-based foundation model designed for small tabular datasets. Trained on millions of synthetic datasets generated via structural causal models, TabPFN learns to predict labels through in-context learning. It outperforms traditional methods like CatBoost and XGBoost in both speed and accuracy, while offering robustness, interpretability, and fine-tuning capabilities. A breakthrough in tabular ML, it redefines what's possible on structured data.

Juan Manuel Ortiz de Zarate
Apr 11 · 11 min read