Markov Decision Processes
Value Iteration & Policy Iteration on a Grid World
Algorithm
Value Iteration
Policy Iteration
Discount γ: 0.90
Grid Size: 6
Edit Mode
Place Reward (+1)
Place Penalty (−1)
Toggle Wall
Set Start
Set Goal
Run Algorithm
Step
Reset Grid
Click Run to start
Iterations: 0
Max ΔV: —
Click a cell to edit the grid. Arrows show the optimal policy; cell color shows the state value.
Goal
Penalty
Wall
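
For readers who want to see what the Run button might compute, here is a minimal sketch of value iteration on a grid like the one above. It is not the demo's own code. The names `step` and `value_iteration`, the cell layout, and the tolerance are all hypothetical, and the dynamics are assumptions: moves are deterministic, bumping into a wall or the edge leaves the agent in place, a cell's reward is collected on entering it, entering the goal pays +1, and the goal is terminal. The defaults γ = 0.90 and grid size 6 are taken from the controls.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]  # (row, col)
ACTIONS: Dict[str, Cell] = {"↑": (-1, 0), "↓": (1, 0), "←": (0, -1), "→": (0, 1)}


def step(cell: Cell, move: Cell, size: int, walls: Set[Cell]) -> Cell:
    """Deterministic move (assumption): blocked moves leave the agent in place."""
    r, c = cell[0] + move[0], cell[1] + move[1]
    if 0 <= r < size and 0 <= c < size and (r, c) not in walls:
        return (r, c)
    return cell


def value_iteration(size: int, goal: Cell, rewards: Dict[Cell, float],
                    walls: Set[Cell], gamma: float = 0.90,
                    tol: float = 1e-6, max_iters: int = 1000):
    """Return (V, policy, iterations, max_delta) for the grid."""
    states = [(r, c) for r in range(size) for c in range(size)
              if (r, c) not in walls]
    V = {s: 0.0 for s in states}

    def q(s: Cell, a: str) -> float:
        # One-step lookahead: reward for entering the next cell plus
        # the discounted value of that cell.
        nxt = step(s, ACTIONS[a], size, walls)
        return rewards.get(nxt, 0.0) + gamma * V[nxt]

    iterations, max_delta = 0, float("inf")
    while iterations < max_iters and max_delta > tol:
        # Bellman optimality backup over all non-terminal states.
        new_V = {s: 0.0 if s == goal else max(q(s, a) for a in ACTIONS)
                 for s in states}
        max_delta = max(abs(new_V[s] - V[s]) for s in states)  # "Max ΔV"
        V = new_V
        iterations += 1

    # Greedy policy with respect to the final values (the arrows).
    policy = {s: max(ACTIONS, key=lambda a: q(s, a))
              for s in states if s != goal}
    return V, policy, iterations, max_delta


# Hypothetical 6x6 layout: goal top-right, one penalty cell, one wall.
goal = (0, 5)
rewards = {goal: 1.0, (2, 2): -1.0}
walls = {(1, 1)}
V, policy, iterations, max_delta = value_iteration(6, goal, rewards, walls)
print(f"Iterations: {iterations}  Max ΔV: {max_delta:.2e}")
print("Policy next to goal:", policy[(0, 4)])
```

The loop stops once the largest change in any state value drops below `tol`, which corresponds to the Max ΔV readout; policy iteration would instead alternate full policy evaluation with greedy improvement until the arrows stop changing.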