The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to everyone. This article explores the origins, mechanics, and lasting impact of the “small trick” that tamed the giants of artificial intelligence.

Juan Manuel Ortiz de Zarate
Oct 15 · 11 min read
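To make the "tiny trick" concrete, here is a minimal sketch of the low-rank idea (not code from the article): a pretrained linear layer is frozen and only a rank-r update B·A is trained. The class name `LoRALinear` and the hyperparameters `r` and `alpha` are illustrative choices.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)      # freeze the pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad_(False)
        # Low-rank factors: only r * (d_in + d_out) new parameters are trained.
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = W x + (alpha / r) * B A x
        return self.base(x) + (x @ self.A.t() @ self.B.t()) * self.scale

# Example: adapt a 4096 -> 4096 projection with rank 8 instead of retraining ~16.8M weights.
layer = LoRALinear(nn.Linear(4096, 4096), r=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")  # ~65k vs ~16.8M frozen
```

Because the update B·A has the same shape as the original weight matrix, it can be merged back into the frozen weights after training, so the adapted model adds no inference latency.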


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8 · 9 min read
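For a flavour of the simplest of these techniques, here is a toy sketch of symmetric per-tensor int8 weight quantization (an illustrative example, not taken from the article); the function names `quantize_int8` and `dequantize` are assumptions.

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor quantization: store int8 weights plus one fp32 scale."""
    scale = np.abs(w).max() / 127.0               # map the largest magnitude to 127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4096, 4096).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print("memory: fp32", w.nbytes // 2**20, "MiB -> int8", q.nbytes // 2**20, "MiB")
print("max abs error:", np.abs(w - w_hat).max())
```

The 4x memory reduction is what makes weight quantization attractive for serving; the trade-off is the small reconstruction error printed at the end.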


Understanding the ChatGPT Revolution
ChatGPT, now used by roughly 10% of adults worldwide, sees over 70% of its usage outside of work. Dominant topics include practical guidance, information seeking, and writing, with writing especially prominent in work-related use. The model delivers significant value as decision support. The gender gap in usage has narrowed, and growth is fastest in lower-income countries. These findings come from a privacy-preserving analysis of billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18 · 11 min read


Unveiling the Enigma of AI Hallucinations
Large Language Models hallucinate because training and evaluation reward guessing over admitting uncertainty. Errors originate statistically during pretraining, much as they do in binary classification, and they persist because most post-training evaluations use binary scoring, penalizing "I don't know" responses and incentivizing confident falsehoods. The proposed solution is a socio-technical one: adjust existing benchmarks with explicit confidence targets, rewarding calibrated uncertainty and fostering more trustworthy AI.

Juan Manuel Ortiz de Zarate
Sep 11 · 12 min read
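As a toy illustration of the incentive problem the article describes (the numbers are illustrative, not the article's): under binary scoring, a model that is only 30% confident should still guess, while a scoring rule that penalizes wrong answers, roughly what an explicit confidence target does, makes abstaining the rational choice.

```python
def expected_score(p_correct: float, wrong_penalty: float, abstain_score: float = 0.0):
    """Expected score of guessing vs. abstaining under a given scoring rule."""
    guess = p_correct * 1.0 - (1 - p_correct) * wrong_penalty
    return guess, abstain_score

p = 0.30  # the model's (hypothetical) confidence in its best guess

# Binary scoring: 1 if right, 0 if wrong, 0 for "I don't know" -> guessing always wins.
print(expected_score(p, wrong_penalty=0.0))   # (0.30, 0.0): guess

# Confidence-target scoring (illustrative): wrong answers cost 3 points.
# Guessing only pays off when p > 3 / (1 + 3) = 0.75, so here the model should abstain.
print(expected_score(p, wrong_penalty=3.0))   # (-1.8, 0.0): abstain
```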


How Bigger Models Get Better
This article explores the groundbreaking findings of Kaplan et al. on scaling laws for neural language models. It explains how model performance improves predictably with increased model size, dataset size, and compute budget, highlighting power-law relationships. The piece discusses implications for efficient AI training, optimal resource allocation, overfitting avoidance, and future research directions.

Juan Manuel Ortiz de Zarate
Apr 30 · 10 min read
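As a rough sketch of such a power law, the snippet below evaluates L(N) = (N_c / N)^alpha_N, the loss-versus-parameter-count relation; the constants are the approximate values reported by Kaplan et al. and should be treated as illustrative.

```python
def loss_from_params(n_params: float, n_c: float = 8.8e13, alpha_n: float = 0.076) -> float:
    """Predicted cross-entropy test loss as a function of non-embedding parameter count N.

    L(N) = (N_c / N) ** alpha_N: every 10x increase in parameters multiplies the
    loss by the same constant factor (10 ** -alpha_N, about 0.84).
    """
    return (n_c / n_params) ** alpha_n

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  ->  predicted loss ~= {loss_from_params(n):.2f}")
```

The regularity of the curve is the point: because the improvement per order of magnitude is predictable, compute budgets can be allocated before training a larger model.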