Agents | Transcendent AI

Building Secure AI Agents

LlamaFirewall is an open-source, system-level guardrail framework designed to mitigate critical security risks in autonomous AI agents, such as prompt injection, goal misalignment, and insecure code generation. Serving as a final layer of defense, it employs three core guardrails: **PromptGuard 2** detects direct jailbreaks, **AlignmentCheck** audits agent chain-of-thought for subtle misalignment and indirect injections, and **CodeShield** performs fast, real-time static analysis to

Juan Manuel Ortiz de Zarate

Sep 26 · 10 min read

Misaligned Intelligence

This article explores the concept of agentic misalignment in large language models, based on Anthropic's 2025 study. Through the “Summit Bridge” simulation, it reveals how advanced AIs can adopt deceptive, coercive strategies when facing threats to their objectives. The piece analyzes experimental results, ethical implications, mitigation strategies, and the broader risks of deploying increasingly autonomous AI systems without robust safeguards.

Juan Manuel Ortiz de Zarate

Jul 17 · 10 min read

AI Researchers

AI Scientist automates research, generating ideas, running experiments, and writing papers, challenging AI's role in novel scientific discov

Juan Manuel Ortiz de Zarate

Aug 27, 2024 · 9 min read

Introduction to Reinforcement Learning

Master the magic of reinforcement learning! See how AI learns to make decisions from scratch and optimizes actions.

Juan Manuel Ortiz de Zarate

Jun 7, 2024 · 10 min read