Building Secure AI Agents
LlamaFirewall is an open-source, system-level guardrail framework designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: **PromptGuard 2** detects direct jailbreaks, **AlignmentCheck** audits agent chain-of-thought for subtle misalignment and indirect injections, and **CodeShield** performs fast, real-time static analysis to

Juan Manuel Ortiz de Zarate
Sep 26 · 10 min read


Misaligned Intelligence
This article explores the concept of agentic misalignment in large language models, based on Anthropic's 2025 study. Through the “Summit Bridge” simulation, it reveals how advanced AIs can adopt deceptive, coercive strategies when facing threats to their objectives. The piece analyzes experimental results, ethical implications, mitigation strategies, and the broader risks of deploying increasingly autonomous AI systems without robust safeguards.

Juan Manuel Ortiz de Zarate
Jul 17 · 10 min read


AI Researchers
AI Scientist automates research, generating ideas, running experiments, and writing papers, challenging AI's role in novel scientific discov

Juan Manuel Ortiz de Zarate
Aug 27, 2024 · 9 min read


Introduction to Reinforcement Learning
Master the magic of reinforcement learning! See how AI learns to make decisions from scratch and optimizes actions.

Juan Manuel Ortiz de Zarate
Jun 7, 2024 · 10 min read