A Brief Introduction to Mixtures-of-Experts

Cristian Cardellino
Mar 26, 2024

Updated: Apr 4, 2024

Near the end of 2023, the French startup Mistral [1] drew attention after releasing an open-source model that rivaled ChatGPT in performance. One of their more powerful models is named "Mixtral of Experts" [2], and it is described as a "Sparse Mixture-of-Experts" (SMoE) model. But what is that? In this article, we will explore Mixture-of-Experts models and discuss the idea behind the gating mechanism used by Sparse Mixture-of-Experts. We will also discuss how Mixture-of-Experts models are used in the Transformer architecture.
