Reward Shaping — Potential-Based F(s,s')

8
0.95
F(s,s') = γΦ(s') − Φ(s): potential-based shaping that leaves the optimal policy unchanged and can speed up learning
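
A minimal sketch of the shaping term, to make the formula concrete. The grid, the goal cell, the discount value, and the potential Φ (negative Manhattan distance to the goal) are all illustrative assumptions, not taken from the source. The function names are hypothetical.

```python
import math

GAMMA = 0.95   # assumed discount factor (illustrative)
GOAL = (7, 7)  # assumed goal cell on a hypothetical 8x8 grid


def potential(state):
    """Hypothetical potential Phi(s): negative Manhattan distance to GOAL."""
    return -(abs(GOAL[0] - state[0]) + abs(GOAL[1] - state[1]))


def shaping(s, s_next, gamma=GAMMA):
    """Shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * potential(s_next) - potential(s)


def discounted_shaping_return(trajectory, gamma=GAMMA):
    """Sum of gamma^t * F(s_t, s_{t+1}) over consecutive states in a trajectory."""
    return sum(
        (gamma ** t) * shaping(s, s_next, gamma)
        for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
    )


# A short path that wanders: it steps toward the goal, then away, then toward it.
trajectory = [(0, 0), (0, 1), (1, 1), (1, 0), (2, 0)]
steps = len(trajectory) - 1

total = discounted_shaping_return(trajectory)

# The sum telescopes: only the first and last potentials survive.
telescoped = (GAMMA ** steps) * potential(trajectory[-1]) - potential(trajectory[0])
assert math.isclose(total, telescoped)
```

Under these assumptions a step toward the goal earns a positive bonus and a step away earns a negative one. The discounted sum of bonuses depends only on the start and end states, not on the path between them, which is the intuition behind the policy-invariance claim.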