TD(0) Learning
Bootstrapped Value Estimation — Random Walk
States N: 7
α (TD step): 0.10
α_MC (MC step): 0.05
γ: 1.00
Speed: 5 ep/frame
Episodes: 0
TD RMSE: —
MC RMSE: —
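Random walk over N non-terminal states with absorbing boundaries: exiting on the left gives reward 0, exiting on the right gives reward 1. With γ = 1, the true values are linear in the state index: V(i) = i/(N+1) for i = 1, …, N, running from 1/(N+1) to N/(N+1).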
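The comparison described above can be sketched in Python as follows. This is a minimal illustration, not the demo's actual implementation. It assumes a symmetric walk (left or right with equal probability), episodes starting at the middle state, value estimates initialised to 0.5, and every-visit constant-α Monte Carlo. All function names (`true_values`, `run_episode`, `td0_update`, `mc_update`, `rmse`, `simulate`) are hypothetical.

```python
import random


def true_values(n):
    """Exact state values for gamma = 1: V(i) = i / (n + 1), i = 1..n."""
    return [i / (n + 1) for i in range(1, n + 1)]


def run_episode(n, rng):
    """Simulate one walk; return its (state, reward, next_state) transitions."""
    s = (n + 1) // 2  # assumption: every episode starts at the middle state
    transitions = []
    while 0 < s < n + 1:  # indices 0 and n + 1 are the absorbing boundaries
        s_next = s + rng.choice((-1, 1))  # assumption: symmetric walk
        r = 1.0 if s_next == n + 1 else 0.0  # reward only on the right exit
        transitions.append((s, r, s_next))
        s = s_next
    return transitions


def td0_update(v, transitions, alpha, gamma):
    """TD(0): move V(s) toward the bootstrapped target r + gamma * V(s')."""
    for s, r, s_next in transitions:
        v[s] += alpha * (r + gamma * v[s_next] - v[s])


def mc_update(v, transitions, alpha, gamma):
    """Every-visit constant-alpha MC: move V(s) toward the observed return."""
    g = 0.0
    for s, r, _ in reversed(transitions):
        g = r + gamma * g  # return accumulated backwards from the episode end
        v[s] += alpha * (g - v[s])


def rmse(v, target):
    """Root-mean-square error over the non-terminal states 1..n."""
    n = len(target)
    return (sum((v[i + 1] - target[i]) ** 2 for i in range(n)) / n) ** 0.5


def simulate(n=7, alpha_td=0.10, alpha_mc=0.05, gamma=1.0, episodes=2000, seed=0):
    """Train both estimators on the same episodes; return their RMSEs and values."""
    rng = random.Random(seed)
    target = true_values(n)  # exact only when gamma = 1
    # Terminal entries (index 0 and n + 1) stay at 0 and are never updated.
    v_td = [0.0] + [0.5] * n + [0.0]  # assumption: initial estimates of 0.5
    v_mc = [0.0] + [0.5] * n + [0.0]
    for _ in range(episodes):
        transitions = run_episode(n, rng)
        td0_update(v_td, transitions, alpha_td, gamma)
        mc_update(v_mc, transitions, alpha_mc, gamma)
    return rmse(v_td, target), rmse(v_mc, target), v_td, v_mc


if __name__ == "__main__":
    td_err, mc_err, _, _ = simulate()
    print(f"TD RMSE: {td_err:.3f}  MC RMSE: {mc_err:.3f}")
```

Both estimators see the same episodes, so any difference in RMSE comes from the update rule and step size, not from the sampled trajectories. Because the walk does not depend on the value estimates, applying the TD(0) updates after each episode, in order, gives the same result as updating online at every step.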
Plot legend: TD(0), Monte Carlo, True V