The Tiny Trick That Tamed Giant Language Models
LoRA transformed how we fine-tune massive AI models like GPT-3. By adding tiny low-rank matrices instead of retraining billions of parameters, it made adaptation faster, cheaper, and accessible to everyone. This article explores the origins, mechanics, and lasting impact of the “small trick” that tamed the giants of artificial intelligence.

Juan Manuel Ortiz de Zarate
Oct 15 · 11 min read


Inference at Scale
This article explores how to optimize large language model inference at scale, detailing techniques such as quantization, pruning, distillation, attention and cache optimization, speculative decoding, and dynamic batching. It explains the architectural bottlenecks, trade-offs, and engineering practices that enable faster, cheaper, and more efficient deployment of LLMs in real-world systems.

Juan Manuel Ortiz de Zarate
Oct 8 · 9 min read


A Foundation for Agent Collaboration
This article explores the Model Context Protocol (MCP), a standardized interface that enables AI agents to dynamically discover and invoke external tools. It covers MCP’s architecture, real-world applications, and security risks across its lifecycle. By decoupling tool logic from AI behavior, MCP empowers agents to perform complex workflows with greater flexibility, setting a foundation for the next generation of tool-integrated AI systems.

Juan Manuel Ortiz de Zarate
Jul 25 · 9 min read


Saving AI from Itself: How to Prevent Model Collapse
Active Inheritance curates synthetic data to control LLM behavior, preventing AI model collapse and improving diversity, safety, and bias.

Juan Manuel Ortiz de Zarate
Feb 6 · 8 min read


Handling missing data
Effectively handling missing data with univariate and multivariate imputation ensures reliable analysis and accurate machine learning models.

Juan Manuel Ortiz de Zarate
Sep 10, 2024 · 11 min read


Retrieval Augmented Generation: Increasing knowledge of your LLM
Dive into the world of Retrieval-Augmented Generation! See how RAG transforms AI responses by blending retrieval with generation.

Juan Manuel Ortiz de Zarate
May 24, 2024 · 9 min read


MLFlow + Hydra: A Framework for Experimentation with Python
In this article I share an experimentation framework I use in my daily work. It combines MLFlow and Hydra to facilitate hypothesis testing.
Cristian Cardellino
May 23, 2024 · 10 min read


Tracking Multiple Experiments with Hydra
In this article we'll explore Hydra, a tool for managing multiple configuration parameters and files when doing machine learning research.
Cristian Cardellino
May 5, 2024 · 10 min read


Deploying Machine Learning Models with FastAPI and Docker
In this article, we will see how we can leverage FastAPI and Docker to build a wrapper REST API around ML models and deploy it.
Cristian Cardellino
Mar 15, 2024 · 12 min read


Practical Machine Learning for Industry
In this article I'll share some advice, based on personal experience, on doing practical machine learning in industry.
Cristian Cardellino
Feb 22, 2024 · 11 min read