Q-Learning Gridworld
Model-Free Off-Policy Control with ε-Greedy Exploration
Grid: 6 × 6
α (learning rate): 0.10
γ (discount): 0.95
ε (explore): 0.20
Speed (ep/frame): 5
Edit Mode
Toggle Wall
Set Goal (+5)
Set Trap (−2)
Set Start
Run Q-Learning
Step Episode
Reset
Episodes: 0
Steps/ep: —
Avg Reward: —
Cell color shows the maximum Q-value in that cell, and arrows show the greedy policy. Click the grid to edit it. The agent's path is shown in teal.
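The settings and legend above could be reproduced offline with a tabular Q-learning loop along the following lines. This is only a sketch under stated assumptions, not the demo's actual code: the start, goal, trap, and wall positions are hypothetical, moves are assumed deterministic, non-terminal steps are assumed to give zero reward, and both the goal (+5) and the trap (−2) are assumed to end the episode. The names `step`, `greedy`, `train`, `greedy_path`, and `policy_grid` are made up for illustration.

```python
import random

SIZE = 6                                       # 6 x 6 grid
ALPHA, GAMMA, EPSILON = 0.10, 0.95, 0.20       # values from the panel above
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
ARROWS = "↑↓←→"

# Hypothetical layout (assumption): the page does not state default positions.
START, GOAL, TRAP = (0, 0), (5, 5), (2, 3)
WALLS = {(1, 1), (3, 4)}


def step(state, action):
    """Deterministic move; hitting a wall or the border leaves the agent in place."""
    dr, dc = ACTIONS[action]
    nxt = (state[0] + dr, state[1] + dc)
    if not (0 <= nxt[0] < SIZE and 0 <= nxt[1] < SIZE) or nxt in WALLS:
        nxt = state
    if nxt == GOAL:
        return nxt, 5.0, True
    if nxt == TRAP:
        return nxt, -2.0, True    # assumption: the trap ends the episode
    return nxt, 0.0, False        # assumption: no per-step reward


def greedy(q, state, rng=None):
    """Best action in `state`; ties go to a random pick if rng is given, else lowest index."""
    best = max(q[state])
    ties = [a for a in range(4) if q[state][a] == best]
    return rng.choice(ties) if rng else ties[0]


def train(episodes=2000, max_steps=200, seed=0):
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(SIZE) for c in range(SIZE)}
    for _ in range(episodes):
        state = START
        for _ in range(max_steps):
            # Behaviour policy: explore with probability epsilon, else act greedily.
            if rng.random() < EPSILON:
                action = rng.randrange(4)
            else:
                action = greedy(q, state, rng)
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action,
            # regardless of what the behaviour policy does next.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=SIZE * SIZE):
    """Follow the greedy policy from START (the path drawn by the demo)."""
    state, path = START, [START]
    for _ in range(max_steps):
        state, _, done = step(state, greedy(q, state))
        path.append(state)
        if done:
            break
    return path


def policy_grid(q):
    """One string per row: arrows for the greedy action, G/T/# for special cells."""
    marks = {GOAL: "G", TRAP: "T", **{w: "#" for w in WALLS}}
    return ["".join(marks.get((r, c), ARROWS[greedy(q, (r, c))])
                    for c in range(SIZE)) for r in range(SIZE)]


if __name__ == "__main__":
    q = train()
    print("\n".join(policy_grid(q)))
    print("greedy path:", greedy_path(q))
```

Here `max(q[state])` corresponds to the per-cell color and `greedy` to the arrows. Ties are broken at random during training so that an all-zero table does not bias the agent toward one direction before it has seen any reward.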