The Checklist Shortcut to Smarter, Safer AI
This article explores Reinforcement Learning from Checklist Feedback (RLCF), a new approach that replaces reward models with checklists to align large language models. By breaking instructions into clear, verifiable steps, checklists provide richer, more interpretable feedback and consistently improve performance across benchmarks. The piece examines how this shift could make AI more reliable, transparent, and user-aligned.

Juan Manuel Ortiz de Zarate
Sep 4 · 12 min read


Understanding what the ML models have learned
Models could spread bias and discrimination if you don't know what they have learned. Here we show a technique to prevent it.

Juan Manuel Ortiz de Zarate
Aug 2, 2024 · 10 min read