Mixture of Experts — Routing & Load Balancing
Experts: 8
Top-k: 2
Capacity: 2.0
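The gating network scores every expert for each token and routes the token to its top-k experts. The capacity factor caps how many tokens each expert may accept, so an over-subscribed expert overflows and turns away the excess instead of taking unbounded load.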
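The routing step can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the demo's actual implementation: the function name `route_tokens` is hypothetical, random logits stand in for a trained gating network, the per-expert capacity is assumed to be `ceil(capacity_factor * num_tokens * top_k / num_experts)`, and overflowing assignments are assumed to be dropped in first-come, first-served order.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one token's gate logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route_tokens(gate_logits, top_k=2, capacity_factor=2.0):
    """Route each token to its top-k experts, subject to a per-expert cap.

    gate_logits: one list of per-expert logits for each token.
    Returns (assignments, load, dropped, capacity).
    """
    num_tokens = len(gate_logits)
    num_experts = len(gate_logits[0])
    # ASSUMPTION: capacity rule; other formulations exist.
    capacity = math.ceil(capacity_factor * num_tokens * top_k / num_experts)

    load = [0] * num_experts   # tokens accepted by each expert so far
    assignments = []           # per token: list of (expert, gate weight)
    dropped = 0                # token-to-expert assignments lost to overflow

    for logits in gate_logits:
        probs = softmax(logits)
        # Indices of the k experts with the highest gate probability.
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        # Renormalize the gate weights over the chosen experts only.
        norm = sum(probs[e] for e in chosen)
        kept = []
        for e in chosen:
            if load[e] < capacity:
                load[e] += 1
                kept.append((e, probs[e] / norm))
            else:
                # ASSUMPTION: overflow is dropped, first come, first served.
                dropped += 1
        assignments.append(kept)
    return assignments, load, dropped, capacity


# Settings mirroring the controls above: 8 experts, top-k 2, capacity factor 2.0.
random.seed(0)
logits = [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
assignments, load, dropped, capacity = route_tokens(logits, top_k=2, capacity_factor=2.0)
print("capacity per expert:", capacity)
print("load per expert:", load)
print("dropped assignments:", dropped)
```

With 16 tokens, the assumed rule gives each expert room for 8 assignments. Lowering the capacity factor shrinks that cap, so more assignments are dropped whenever the gate favors a few experts.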