Reward Shaping — Potential-Based F(s,s')
Φ strength: 8
γ: 0.95
F(s,s') = γΦ(s') − Φ(s): potential-based shaping, added to the environment reward, that leaves the optimal policy unchanged and can accelerate learning
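
The shaping term can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 5x5 grid, the goal cell, and the distance-based potential `phi` are hypothetical and are not taken from the demo. Only γ = 0.95 and the Φ strength of 8 come from the settings above. The sketch shows why the shaping cannot change which policy is optimal: along any trajectory the discounted shaping terms telescope to γ^T Φ(s_T) − Φ(s_0), which depends only on the endpoints.

```python
GAMMA = 0.95        # discount factor, from the γ setting above
PHI_STRENGTH = 8.0  # scale of the potential, from the Φ strength setting above
GOAL = (4, 4)       # hypothetical goal cell on a 5x5 grid (assumption)
MAX_DIST = 8        # largest Manhattan distance to GOAL on that grid


def phi(state):
    """Hypothetical potential: 0 at the goal, more negative farther away."""
    dist = abs(state[0] - GOAL[0]) + abs(state[1] - GOAL[1])
    return -PHI_STRENGTH * dist / MAX_DIST


def shaping(state, next_state, gamma=GAMMA):
    """F(s, s') = gamma * phi(s') - phi(s)."""
    return gamma * phi(next_state) - phi(state)


def discounted_shaping_sum(trajectory, gamma=GAMMA):
    """Sum of gamma^t * F(s_t, s_{t+1}) over consecutive states."""
    steps = zip(trajectory, trajectory[1:])
    return sum(gamma ** t * shaping(s, s_next, gamma)
               for t, (s, s_next) in enumerate(steps))


# Two routes from (0, 0) to the goal: a direct one and one with a detour.
direct = [(0, 0), (0, 1), (0, 2), (0, 3), (0, 4),
          (1, 4), (2, 4), (3, 4), (4, 4)]
detour = [(0, 0), (1, 0), (0, 0)] + direct[1:]

if __name__ == "__main__":
    print(shaping((0, 0), (0, 1)))         # step toward the goal: positive bonus
    print(shaping((0, 1), (0, 0)))         # step away from the goal: negative
    print(discounted_shaping_sum(direct))  # both sums equal -phi((0, 0))
    print(discounted_shaping_sum(detour))
```

Because `phi` is zero at the goal in this sketch, both routes collect the same total shaping, so the shaped agent gets denser feedback per step without any route gaining an advantage it did not already have under the original reward.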