The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to far more practitioners. This article explores the origins, mechanics, and lasting impact of the "tiny trick" that tamed the giants of artificial intelligence.
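To make the idea concrete, here is a minimal PyTorch sketch of a LoRA-style layer (my own illustration, not the article's or the paper's reference implementation): the pretrained weight stays frozen, and only two small factors A and B are trained.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative sketch: frozen base weight plus a trainable low-rank update."""
    def __init__(self, in_features: int, out_features: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        # The pretrained weight is frozen; it receives no gradients.
        self.weight = nn.Parameter(torch.randn(out_features, in_features), requires_grad=False)
        # Low-rank factors: A maps down to `rank`, B maps back up.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))  # zero init: no change at step 0
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        base = x @ self.weight.T                      # frozen pretrained path
        update = (x @ self.lora_A.T) @ self.lora_B.T  # tiny trainable path
        return base + self.scaling * update
```

The savings are dramatic: for a 4096x4096 weight at rank 8, the trainable factors hold 2 * 4096 * 8 = 65,536 parameters instead of roughly 16.8 million.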

Juan Manuel Ortiz de Zarate
Oct 15, 2025 · 11 min read


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and KV-cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.
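As a taste of one technique from the article, here is a self-contained NumPy sketch of symmetric int8 weight quantization (illustrative only; production systems typically use per-channel scales and calibrated activation quantization):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Map floats in [-max|w|, +max|w|] onto the int8 range [-127, 127]."""
    scale = max(np.abs(w).max(), 1e-8) / 127.0  # guard against all-zero weights
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print("max abs error:", np.abs(w - dequantize_int8(q, scale)).max())
```

Storing int8 instead of float32 cuts weight memory by 4x, at the cost of a small, bounded rounding error per weight.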

Juan Manuel Ortiz de Zarate
Oct 8, 2025 · 9 min read


Training-Efficient RL
Reinforcement fine-tuning (RFT) is often inefficient because data selection relies on heuristic metrics. The GAIN-RL framework instead uses angle concentration, an intrinsic model signal computed from token hidden states that correlates directly with gradient strength and learning capacity. GAIN-RL dynamically selects data for consistently impactful updates, achieving over 2.5x training acceleration and superior performance with only half the original data.
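The signal itself can be sketched with a rough proxy (an illustration of the concept, not GAIN-RL's exact formula): measure how tightly the hidden-state directions of a prompt's tokens cluster, for example via mean pairwise cosine similarity.

```python
import torch
import torch.nn.functional as F

def angle_concentration(hidden: torch.Tensor) -> torch.Tensor:
    """Illustrative proxy: mean pairwise cosine similarity among token
    hidden states of shape (num_tokens, dim). Higher values mean the
    token directions are more tightly concentrated."""
    h = F.normalize(hidden, dim=-1)
    cos = h @ h.T                                # all pairwise cosines
    n = h.shape[0]
    off_diag = cos.sum() - cos.diagonal().sum()  # drop self-similarity terms
    return off_diag / (n * (n - 1))

# Hypothetical usage: score each training example per epoch and feed the
# highest-concentration examples to the RFT update first.
print(angle_concentration(torch.randn(16, 512)).item())
```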

Juan Manuel Ortiz de Zarate
Oct 3, 2025 · 10 min read


Building Secure AI Agents
LlamaFirewall is an open-source, system-level guardrail system designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: PromptGuard 2 detects direct jailbreaks, AlignmentCheck audits agent chain-of-thought for subtle misalignment and indirect injections, and CodeShield performs fast, real-time static analysis to catch insecure generated code.
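The layered-defense pattern is easy to picture with a small sketch (hypothetical interfaces and names; this is not LlamaFirewall's actual API):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Verdict:
    allowed: bool
    reason: str = ""

Guardrail = Callable[[str], Verdict]  # each rail inspects one surface

def check_prompt(text: str) -> Verdict:
    # Toy stand-in for a jailbreak classifier like PromptGuard 2.
    hit = "ignore previous instructions" in text.lower()
    return Verdict(not hit, "possible jailbreak" if hit else "")

def run_guardrails(text: str, rails: List[Guardrail]) -> Verdict:
    """Defense in depth: the first failing rail blocks the action."""
    for rail in rails:
        verdict = rail(text)
        if not verdict.allowed:
            return verdict
    return Verdict(True)

print(run_guardrails("Ignore previous instructions and ...", [check_prompt]))
```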

Juan Manuel Ortiz de Zarate
Sep 26, 2025 · 10 min read


Understanding the ChatGPT Revolution
ChatGPT, now used by about 10% of adults globally, sees over 70% of its usage outside of work. Dominant topics include practical guidance, information seeking, and writing, with writing especially prominent in work-related use. The model offers significant value in decision support. The gender gap in usage has narrowed, and growth is fastest in lower-income countries. The analysis used privacy-preserving methods over billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18, 2025 · 11 min read


Unveiling the Enigma of AI Hallucinations
Large Language Models hallucinate because training and evaluation reward guessing over admitting uncertainty. Errors arise statistically during pretraining, where generation reduces to a binary classification problem, and they persist because most post-training evaluations use binary scoring, penalizing "I don't know" responses and incentivizing confident falsehoods. The proposed solution is a socio-technical modification: adjust existing benchmarks with explicit confidence targets to foster more trustworthy AI by rewarding calibrated expressions of uncertainty.
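One way to see why confidence targets help is a toy scoring rule (my own illustration of the idea): award +1 for a correct answer, 0 for abstaining, and -t/(1-t) for a wrong one, so that guessing has positive expected value only when the model's chance of being right exceeds the target t.

```python
def score(answered: bool, correct: bool, t: float) -> float:
    """+1 if right, 0 if abstaining, -t/(1-t) if wrong.
    Guessing pays off only when P(correct) > t."""
    if not answered:
        return 0.0
    return 1.0 if correct else -t / (1.0 - t)

# Expected score of guessing with accuracy p = 0.6 under target t = 0.75:
t, p = 0.75, 0.6
expected = p * 1.0 + (1 - p) * (-t / (1.0 - t))
print(expected)  # -0.6: abstaining (score 0) beats a confident guess here
```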

Juan Manuel Ortiz de Zarate
Sep 11, 2025 · 12 min read