Mixture of Experts — Routing & Load Balancing

8
2
2.0
A gating network selects the top-k experts for each token; a capacity factor caps how many tokens each expert can accept, which limits expert overflow.
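The routing rule above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation. It assumes the bare values 8, 2, and 2.0 listed above are the number of experts, top-k, and the capacity factor, which the source does not state. The capacity formula (capacity factor × tokens × k / experts, rounded up) is one common convention and is also an assumption, as is dropping overflow assignments rather than rerouting them. The function names `expert_capacity` and `route` are hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_capacity(num_tokens, num_experts, top_k, capacity_factor):
    """Slots per expert. Assumed convention: scale the perfectly balanced
    load (num_tokens * top_k / num_experts) by the capacity factor."""
    return math.ceil(capacity_factor * num_tokens * top_k / num_experts)


def route(gate_logits, top_k, capacity_factor):
    """Top-k routing with a per-expert capacity limit.

    gate_logits: one list of per-expert logits for each token.
    Returns (assignments, load, dropped), where assignments[t] is a list of
    (expert_index, weight) pairs that token t was actually routed to.
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    capacity = expert_capacity(num_tokens, num_experts, top_k, capacity_factor)

    load = [0] * num_experts  # tokens accepted by each expert so far
    assignments = []
    dropped = 0               # token-to-expert assignments lost to overflow

    for logits in gate_logits:
        probs = softmax(logits)
        # Indices of the k experts with the highest gate probability.
        top = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        norm = sum(probs[e] for e in top)  # renormalize over the chosen k
        kept = []
        for e in top:
            if load[e] < capacity:
                load[e] += 1
                kept.append((e, probs[e] / norm))
            else:
                dropped += 1  # expert is full: this assignment overflows
        assignments.append(kept)
    return assignments, load, dropped


# Demo with the assumed settings: 8 experts, top-2 routing, capacity factor 2.0.
random.seed(0)
demo_logits = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
demo_assignments, demo_load, demo_dropped = route(demo_logits, top_k=2, capacity_factor=2.0)
```

With 16 tokens, 8 experts, k = 2, and a capacity factor of 2.0, each expert has 8 slots under this convention, twice the perfectly balanced load of 4. A lower capacity factor saves memory and compute but drops more assignments when the gate is unbalanced, which is why routing of this kind is usually paired with an auxiliary load-balancing loss.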