top of page

Search


The Lightning Mind
DeepSeek-V3.2-Exp introduces a new sparse-attention system that lets large language models handle ultra-long contexts efficiently. Using a “lightning indexer” to select only the most relevant tokens, it cuts computation costs while preserving reasoning power. The result is a faster, cheaper, and more cognitively elegant AI that learns what to ignore, bringing machine focus closer to human intelligence.

Juan Manuel Ortiz de Zarate
Oct 22, 20259 min read


The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to everyone. This article explores the origins, mechanics, and lasting impact of the “small trick” that tamed the giants of artificial intelligence.

Juan Manuel Ortiz de Zarate
Oct 16, 202511 min read


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8, 20259 min read


Training-Efficient RL
Inefficient Reinforcement Fine-tuning (RFT) relies on heuristic metrics. The GAIN-RL framework utilizes angle concentration, an intrinsic model signal from token hidden states, which correlates directly with gradient strength and learning capacity. GAIN-RL dynamically selects data for consistently impactful updates. It achieves over 2.5x training acceleration and superior performance using only half the original data.

Juan Manuel Ortiz de Zarate
Oct 3, 202510 min read


Building Secure AI Agents
LlamaFirewall is an open-source, system-level guardrail system designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: **PromptGuard 2** detects direct jailbreaks, **AlignmentCheck** audits agent chain-of-thought for subtle misalignment and indirect injections, and CodeShield performs fast, real-time static analysis to

Juan Manuel Ortiz de Zarate
Sep 26, 202510 min read


Understanding the ChatGPT Revolution
ChatGPT, adopted by 10% of adults globally, now sees over 70% non-work usage. Dominant topics include practical guidance, info seeking, and writing, with writing prominent in work. It offers significant value in decision support. The gender gap in usage has narrowed, and growth is high in lower-income countries. This was analyzed using privacy-preserving methods on billions of messages.

Juan Manuel Ortiz de Zarate
Sep 18, 202511 min read
bottom of page