Reward Shaping — Potential-Based F(s,s')

Φ strength: 8

γ: 0.95

F(s,s') = γΦ(s') − Φ(s): potential-based shaping that leaves the optimal policy unchanged and can accelerate learning
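
A minimal sketch of why this form of shaping is policy-invariant: along any trajectory, the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), which depends only on the endpoints and not on the actions taken in between. The grid states, the goal at (4, 4), the distance-based potential, and the reading of "Φ strength" as a multiplier on Φ are all assumptions made for illustration; only γ = 0.95, the strength value 8, and the formula for F come from the text above.

```python
GAMMA = 0.95        # discount factor, taken from the settings above
PHI_STRENGTH = 8.0  # assumed here to be a simple multiplier on the potential


def potential(state, goal=(4, 4), strength=PHI_STRENGTH):
    """Hypothetical potential: negative scaled Manhattan distance to the goal."""
    dist = abs(state[0] - goal[0]) + abs(state[1] - goal[1])
    return -strength * dist


def shaping_reward(s, s_next, gamma=GAMMA):
    """F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(s_next) - potential(s)


def discounted_shaping_return(trajectory, gamma=GAMMA):
    """Sum of gamma**t * F(s_t, s_{t+1}) over consecutive states in a trajectory."""
    return sum(
        (gamma ** t) * shaping_reward(s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
    )


# An arbitrary illustrative path on a grid (hypothetical states).
trajectory = [(0, 0), (1, 0), (1, 1), (2, 1)]
steps = len(trajectory) - 1

total = discounted_shaping_return(trajectory)
# Telescoped form: gamma**T * Phi(s_T) - Phi(s_0)
telescoped = GAMMA ** steps * potential(trajectory[-1]) - potential(trajectory[0])

print(total, telescoped)
```

In training, F(s, s') would be added to the environment reward at each step. Because the extra return depends only on the start and end states, it shifts the value of every policy from a given start state in the same way, so the ranking of policies is unchanged.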