MDP Value Iteration — Grid World
[Interactive controls: sliders for discount γ and transition noise; Run Iteration / Reset / Auto buttons; iteration counter; clicking a cell cycles it through reward / wall / clear.]
Value iteration solves the Bellman optimality equation V*(s) = max_a Σ_{s'} P(s'|s,a)[R(s,a,s') + γV*(s')] by repeated synchronous sweeps over all states. As values propagate outward from high-reward states, the greedy policy arrows — each pointing along the action that maximizes the backed-up value — converge to the optimal policy.
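The sweep described above can be sketched in a few lines. This is a minimal, hypothetical example on a 1D chain MDP rather than the demo's 2D grid: states 0..4, actions left/right, a `noise` chance of staying put (a simplified stand-in for the demo's noisy transitions), and a terminal reward of +1 at the last state. Function and parameter names are illustrative, not from the demo's source.

```python
import numpy as np

def value_iteration(n_states=5, gamma=0.9, noise=0.2, tol=1e-8):
    """Repeated Bellman backups until the value function stops changing."""
    V = np.zeros(n_states)
    while True:
        V_new = np.zeros(n_states)
        for s in range(n_states):
            if s == n_states - 1:
                V_new[s] = 1.0          # terminal reward state
                continue
            q_values = []
            for move in (-1, 1):        # actions: left, right
                s_next = min(max(s + move, 0), n_states - 1)
                # Expected backup: intended move with prob 1-noise,
                # slip (stay in place) with prob noise; reward is 0
                # everywhere except the terminal state.
                q = (1 - noise) * gamma * V[s_next] + noise * gamma * V[s]
                q_values.append(q)
            V_new[s] = max(q_values)    # max over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
```

Because the only reward sits at the far end of the chain, the converged values increase monotonically toward it — the same gradient the demo visualizes as arrows pointing toward high-reward cells.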