REINFORCE Policy Gradient

Monte Carlo Policy Gradient — Pendulum Balancing

Episodes: 0
Avg Return: —
θ[0]: —
θ[1]: —
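Pendulum: the state is (angle θ, angular velocity ω). Action: push left or push right. Policy: a softmax over linear features of the state. REINFORCE updates the policy parameters θ (a separate quantity from the pendulum angle, despite the shared symbol) by gradient ascent along ∇log π(a|s) · G, where G is the Monte Carlo return.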
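The update described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code behind this page: the pendulum dynamics, the reward of 1 per step while the pole stays within a limit, and the constants GAMMA, ALPHA, DT, PUSH and LIMIT are all made up for the example. The policy is assumed to use two parameters, with preference +θ·s for "push right" and −θ·s for "push left", which is one way to get a two-action softmax over linear features of (angle, ω).

```python
import math
import random

# All constants below are illustrative assumptions, not values from the page.
GAMMA = 0.99    # discount factor
ALPHA = 0.001   # learning rate
DT = 0.05       # integration step (seconds)
PUSH = 2.0      # magnitude of the left/right push
LIMIT = 0.8     # episode ends once |angle| exceeds this (radians)


def action_probs(theta, state):
    """Return [P(left), P(right)] for a softmax over preferences (-z, +z)."""
    z = theta[0] * state[0] + theta[1] * state[1]
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    p_right = 1.0 / (1.0 + math.exp(-2.0 * z))  # softmax of (-z, +z)
    return [1.0 - p_right, p_right]


def grad_log_pi(theta, state, action):
    """Gradient of log pi(action | state) with respect to theta."""
    p_right = action_probs(theta, state)[1]
    sign = 1.0 if action == 1 else -1.0
    # d/dz log pi(a) = sign(a) - E[sign] = sign(a) - (2 * p_right - 1)
    coeff = sign - (2.0 * p_right - 1.0)
    return [coeff * state[0], coeff * state[1]]


def step(state, action):
    """Toy inverted-pendulum dynamics (assumed); angle is measured from upright."""
    angle, omega = state
    force = PUSH if action == 1 else -PUSH
    omega += (9.8 * math.sin(angle) + force) * DT
    angle += omega * DT
    done = abs(angle) > LIMIT
    reward = 0.0 if done else 1.0
    return (angle, omega), reward, done


def discounted_returns(rewards, gamma=GAMMA):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def run_episode(theta, rng, max_steps=200):
    """Sample one episode, then apply the REINFORCE update to theta in place."""
    state = (rng.uniform(-0.05, 0.05), 0.0)
    trajectory, rewards = [], []
    for _ in range(max_steps):
        action = 1 if rng.random() < action_probs(theta, state)[1] else 0
        next_state, reward, done = step(state, action)
        trajectory.append((state, action))
        rewards.append(reward)
        state = next_state
        if done:
            break
    # Monte Carlo update: theta += alpha * G_t * grad log pi(a_t | s_t)
    for (s, a), g in zip(trajectory, discounted_returns(rewards)):
        grad = grad_log_pi(theta, s, a)
        theta[0] += ALPHA * g * grad[0]
        theta[1] += ALPHA * g * grad[1]
    return sum(rewards)


if __name__ == "__main__":
    rng = random.Random(0)
    theta = [0.0, 0.0]
    for episode in range(500):
        total = run_episode(theta, rng)
    print("theta:", theta, "last return:", total)
```

The sketch uses no baseline, so every return G is non-negative and the gradient estimate is noisy; subtracting a running average of returns from G is a common variance-reduction step that could be added without changing the structure.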