Q-Learning Gridworld

Grid: 6×6
α (learning rate): 0.10
γ (discount factor): 0.95
ε (exploration rate): 0.20
Speed (episodes/frame): 5
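The three hyperparameters above enter the standard tabular Q-learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], together with ε-greedy action selection. Below is a minimal Python sketch of both. The demo's own source is not shown, so the function names (`choose_action`, `q_update`), the four-action set, and the list-per-state Q-table layout are assumptions for illustration.

```python
import random

# Defaults taken from the control panel above.
ALPHA = 0.10    # learning rate
GAMMA = 0.95    # discount factor
EPSILON = 0.20  # exploration probability

ACTIONS = ["up", "down", "left", "right"]  # assumed action set


def choose_action(q_row, epsilon=EPSILON, rng=random):
    """Epsilon-greedy: a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    # Break ties randomly among equally valued actions.
    return rng.choice([a for a, v in enumerate(q_row) if v == best])


def q_update(q, s, a, r, s_next, done, alpha=ALPHA, gamma=GAMMA):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    # Terminal transitions have no future value to bootstrap from.
    target = r if done else r + gamma * max(q[s_next])
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

For example, with `q = {0: [0.0] * 4, 1: [1.0, 0.0, 0.0, 0.0]}`, calling `q_update(q, 0, 3, -1.0, 1, False)` moves Q(0, right) from 0 to about -0.005: the target is -1 + 0.95 × 1.0 = -0.05, and α = 0.10 closes a tenth of the gap.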

Episodes: 0
Steps/ep: —
Avg Reward: —
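The readouts count episodes and report steps per episode and average reward. The loop below sketches one way to gather those statistics on a 6×6 grid. Apart from the panel's α, γ, ε and grid size, every detail here is a hypothetical choice: the agent starts in the top-left cell, the goal is the bottom-right cell, each step gives a reward of -1 and reaching the goal gives +10, the only walls are the grid edges, and episodes are capped at 200 steps.

```python
import random

SIZE = 6                                     # 6x6 grid, as in the panel
ALPHA, GAMMA, EPSILON = 0.10, 0.95, 0.20     # panel defaults
START, GOAL = (0, 0), (SIZE - 1, SIZE - 1)   # assumed start/goal cells
STEP_REWARD, GOAL_REWARD = -1.0, 10.0        # assumed reward scheme
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
MAX_STEPS = 200                              # assumed per-episode cap


def step(state, action):
    """Move within the grid; a move off the edge leaves the agent in place."""
    r, c = state
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        nr, nc = r, c
    nxt = (nr, nc)
    if nxt == GOAL:
        return nxt, GOAL_REWARD, True
    return nxt, STEP_REWARD, False


def run_episode(q, rng):
    """Run one epsilon-greedy episode, updating q in place; return (steps, total reward)."""
    state, total, steps, done = START, 0.0, 0, False
    while not done and steps < MAX_STEPS:
        if rng.random() < EPSILON:
            a = rng.randrange(len(MOVES))
        else:
            best = max(q[state])
            a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
        nxt, reward, done = step(state, a)
        target = reward if done else reward + GAMMA * max(q[nxt])
        q[state][a] += ALPHA * (target - q[state][a])
        state, total, steps = nxt, total + reward, steps + 1
    return steps, total


def train(episodes, seed=0):
    """Train from a zero-initialised Q-table; return (q, steps per episode, reward per episode)."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(MOVES) for r in range(SIZE) for c in range(SIZE)}
    steps_log, reward_log = [], []
    for _ in range(episodes):
        steps, total = run_episode(q, rng)
        steps_log.append(steps)
        reward_log.append(total)
    return q, steps_log, reward_log
```

After `q, steps, rewards = train(500)`, the expression `sum(steps[-50:]) / 50` gives a trailing average of steps per episode. The demo does not say whether it averages over a window or over all episodes, so the 50-episode window is also an assumption. A renderer running at 5 episodes per frame would call `run_episode` five times between redraws.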

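Cell color shows the maximum Q-value in that cell, max_a Q(s, a). Arrows show the greedy policy, i.e. the action with the highest Q-value in each cell. Click the grid to edit it. The agent's path is shown in teal.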
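The legend describes two views of the same Q-table. The color of a cell is max_a Q(s, a), and the arrow is the greedy action argmax_a Q(s, a). Below is a small sketch of both. It assumes the Q-table is a dict from (row, col) to a list of four action values in the order up, down, left, right. It also assumes values are min-max normalised to [0, 1] before being mapped to a color; the demo's actual color scale is not stated.

```python
ARROWS = ["↑", "↓", "←", "→"]  # assumed action order: up, down, left, right


def value_grid(q, size):
    """max_a Q(s, a) for every cell: the quantity the cell color encodes."""
    return [[max(q[(r, c)]) for c in range(size)] for r in range(size)]


def normalise(values):
    """Rescale a 2-D grid of values to [0, 1], e.g. before applying a color map."""
    flat = [v for row in values for v in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1.0  # avoid division by zero when all values are equal
    return [[(v - lo) / span for v in row] for row in values]


def policy_arrows(q, size):
    """Greedy action argmax_a Q(s, a) per cell, rendered as an arrow character."""
    grid = []
    for r in range(size):
        row = []
        for c in range(size):
            qs = q[(r, c)]
            row.append(ARROWS[qs.index(max(qs))])  # first maximum wins ties
        grid.append(row)
    return grid
```

On a 2×2 table where (0, 0) prefers right, (0, 1) prefers down, (1, 0) prefers right and (1, 1) is all zeros, `policy_arrows` yields `[["→", "↓"], ["→", "↑"]]`. The all-zero cell falls back to the first action, which is why untrained or terminal cells show an arbitrary arrow.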