The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to everyone. This article explores the origins, mechanics, and lasting impact of the “small trick” that tamed the giants of artificial intelligence.

Juan Manuel Ortiz de Zarate
Oct 15 · 11 min read


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8 · 9 min read


A Foundation for Agent Collaboration
This article explores the Model Context Protocol (MCP), a standardized interface that enables AI agents to dynamically discover and invoke external tools. It covers MCP’s architecture, real-world applications, and security risks across its lifecycle. By decoupling tool logic from AI behavior, MCP empowers agents to perform complex workflows with greater flexibility, setting a foundation for the next generation of tool-integrated AI systems.

Juan Manuel Ortiz de Zarate
Jul 25 · 9 min read


Saving AI from Itself: How to Prevent Model Collapse
Active Inheritance curates synthetic data to control LLM behavior, preventing AI model collapse and improving diversity, safety, and bias.

Juan Manuel Ortiz de Zarate
Feb 6 · 8 min read


Handling missing data
Effectively handling missing data with univariate and multivariate imputation ensures reliable analysis and accurate machine learning models.

Juan Manuel Ortiz de Zarate
Sep 10, 2024 · 11 min read


Retrieval Augmented Generation: Increasing knowledge of your LLM
Dive into the world of Retrieval-Augmented Generation! See how RAG transforms AI responses by blending retrieval with generation.

Juan Manuel Ortiz de Zarate
May 24, 2024 · 9 min read


MLFlow + Hydra: A Framework for Experimentation with Python
In this article I share an experimentation framework I use in my daily work. It combines MLFlow and Hydra to facilitate hypothesis testing.
Cristian Cardellino
May 23, 2024 · 10 min read


Tracking Multiple Experiments with Hydra
In this article we'll explore Hydra, a tool for managing multiple configuration parameters and files when doing machine learning research.
Cristian Cardellino
May 5, 2024 · 10 min read


Deploying Machine Learning Models with FastAPI and Docker
In this article, we will see how we can leverage FastAPI and Docker to build a wrapper REST API around ML models and deploy it.
Cristian Cardellino
Mar 15, 2024 · 12 min read


Practical Machine Learning for Industry
In this article I'll share some advice, based on personal experience, on doing practical machine learning in industry.
Cristian Cardellino
Feb 22, 2024 · 11 min read