A Brief Introduction to Mixtures-of-Experts
- Cristian Cardellino
- Mar 26, 2024
- 8 min read
- Updated: Apr 4, 2024
Near the end of 2023, there was a buzz around the French startup Mistral [1] after they released an open-source model rivaling ChatGPT in performance. In particular, one of their most powerful models is named "Mixtral of Experts" [2]. It is described as a "Sparse Mixture-of-Experts" model (or SMoE), but what is that? In this article, we will explore Mixture-of-Experts models and discuss the idea behind the gating mechanism used by Sparse Mixtures-of-Experts. We will also discuss the use of Mixture-of-Experts models in the Transformer architecture.
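To make the idea concrete before the full discussion, here is a minimal sketch of sparse top-k gating: a small gate scores each token against every expert, only the k best experts are run, and their outputs are combined with renormalized gate weights. The module names, dimensions, and the choice of k = 2 are illustrative assumptions for this sketch, not Mixtral's actual implementation.

```python
# A minimal sketch of sparse top-k gating (illustrative; not Mixtral's actual code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, dim=128, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        # Each expert is a small feed-forward network (assumed architecture).
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
             for _ in range(num_experts)]
        )
        # The gate scores each token against every expert.
        self.gate = nn.Linear(dim, num_experts)

    def forward(self, x):  # x: (tokens, dim)
        logits = self.gate(x)                               # (tokens, num_experts)
        weights, indices = torch.topk(logits, self.top_k)   # keep only the k best experts per token
        weights = F.softmax(weights, dim=-1)                 # renormalize over the chosen experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e                 # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

# Example: route 16 token embeddings through the sparse MoE layer.
x = torch.randn(16, 128)
y = SparseMoE()(x)
print(y.shape)  # torch.Size([16, 128])
```

Because each token only touches k of the experts, the layer holds many more parameters than it uses per forward pass, which is the key trade-off the article explores.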