The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to everyone. This article explores the origins, mechanics, and lasting impact of the “small trick” that tamed the giants of artificial intelligence.

Juan Manuel Ortiz de Zarate
Oct 15 · 11 min read
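To make the "tiny trick" concrete, here is a minimal sketch of the low-rank idea (not code from the article): a pretrained linear layer is frozen and only a rank-r update B·A is trained. The class name `LoRALinear` and the hyperparameters `r` and `alpha` are illustrative choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze the pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: only r * (d_in + d_out) new parameters are trained.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

# Example: adapt a 4096 -> 4096 projection with rank 8 instead of retraining ~16.8M weights.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # ~65k vs ~16.8M frozen
```

Because the update B·A has the same shape as the original weight matrix, it can be merged back into the frozen weights after training, so the adapted model adds no inference latency.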


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8 · 9 min read
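For a flavour of the simplest of these techniques, here is a toy sketch of symmetric per-tensor int8 weight quantization (an illustrative example, not taken from the article); the function names `quantize_int8` and `dequantize` are assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: store int8 weights plus one fp32 scale."""
    scale = np.abs(w).max() / 127.0               # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("memory: fp32", w.nbytes // 2**20, "MiB -> int8", q.nbytes // 2**20, "MiB")
print("max abs error:", np.abs(w - w_hat).max())
```

The 4x memory reduction is what makes weight quantization attractive for serving; the trade-off is the small reconstruction error printed at the end.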


Understanding the ChatGPT Revolution
ChatGPT, now used by roughly 10% of adults worldwide, sees over 70% of its usage outside of work. Dominant topics include practical guidance, information seeking, and writing, with writing especially prominent in work-related use. The model delivers significant value as decision support. The gender gap in usage has narrowed, and growth is fastest in lower-income countries. These findings come from a privacy-preserving analysis of billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18 · 11 min read


Unveiling the Enigma of AI Hallucinations
Large Language Models hallucinate because training and evaluation reward guessing over admitting uncertainty. Errors originate statistically during pretraining, much as they do in binary classification, and they persist because most post-training evaluations use binary scoring, penalizing "I don't know" responses and incentivizing confident falsehoods. The proposed solution is a socio-technical one: adjust existing benchmarks with explicit confidence targets, rewarding calibrated uncertainty and fostering more trustworthy AI.

Juan Manuel Ortiz de Zarate
Sep 11 · 12 min read
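As a toy illustration of the incentive problem the article describes (the numbers are illustrative, not the article's): under binary scoring, a model that is only 30% confident should still guess, while a scoring rule that penalizes wrong answers, roughly what an explicit confidence target does, makes abstaining the rational choice.

```python
def expected_score(p_correct: float, wrong_penalty: float, abstain_score: float = 0.0):
    """Expected score of guessing vs. abstaining under a given scoring rule."""
    guess = p_correct * 1.0 - (1 - p_correct) * wrong_penalty
    return guess, abstain_score

p = 0.30  # the model's (hypothetical) confidence in its best guess

# Binary scoring: 1 if right, 0 if wrong, 0 for "I don't know" -> guessing always wins.
print(expected_score(p, wrong_penalty=0.0))   # (0.30, 0.0): guess

# Confidence-target scoring (illustrative): wrong answers cost 3 points.
# Guessing only pays off when p > 3 / (1 + 3) = 0.75, so here the model should abstain.
print(expected_score(p, wrong_penalty=3.0))   # (-1.8, 0.0): abstain
```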


How Bigger Models Get Better
This article explores the groundbreaking findings of Kaplan et al. on scaling laws for neural language models. It explains how model performance improves predictably with increased model size, dataset size, and compute budget, highlighting power-law relationships. The piece discusses implications for efficient AI training, optimal resource allocation, overfitting avoidance, and future research directions.

Juan Manuel Ortiz de Zarate
Apr 30 · 10 min read
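As a rough sketch of such a power law, the snippet below evaluates L(N) = (N_c / N)^alpha_N, the loss-versus-parameter-count relation; the constants are the approximate values reported by Kaplan et al. and should be treated as illustrative.

```python
def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy test loss as a function of non-embedding parameter count N.

    L(N) = (N_c / N) ** alpha_N: every 10x increase in parameters multiplies the
    loss by the same constant factor (10 ** -alpha_N, about 0.84).
    """
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  ->  predicted loss ~= {loss_from_params(n):.2f}")
```

The regularity of the curve is the point: because the improvement per order of magnitude is predictable, compute budgets can be allocated before training a larger model.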