Inverse Reinforcement Learning

Recovering Reward Functions from Expert Behavior

IRL (MaxEnt, gradient-based) adjusts a reward estimate until the feature expectations of the learned policy match those of the expert. The true reward is shown on the left and the recovered reward on the right; expert paths are drawn in yellow.
[Figure panels: True Reward (hidden from IRL); Expert Trajectories; Recovered Reward (IRL output)]
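
The feature-matching idea can be sketched in a few lines of NumPy. For a reward that is linear in the features, the gradient of the MaxEnt log-likelihood is the expert's feature expectations minus the learner's, so each iteration recomputes the learner's expected state visits and nudges the reward toward the expert's. Everything specific here is an assumption made for illustration and is not taken from the demo: the 5x5 grid, the horizon of 15 steps, one-hot state features (one reward weight per cell), the goal and penalty cells, a soft-optimal expert, the learning rate, and all function names (`soft_policy`, `visitation`, `sample_demos`, `maxent_irl`).

```python
import numpy as np

N, A, T = 5, 4, 15                # grid side, actions, horizon (all assumed)
S = N * N
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

def step(s, a):
    """Deterministic move; walking into a wall leaves the agent in place."""
    r, c = divmod(s, N)
    dr, dc = MOVES[a]
    return min(max(r + dr, 0), N - 1) * N + min(max(c + dc, 0), N - 1)

NEXT = np.array([[step(s, a) for a in range(A)] for s in range(S)])  # shape (S, A)

def soft_policy(reward):
    """Finite-horizon soft value iteration; returns pi[t, s, a]."""
    V, pi = np.zeros(S), np.zeros((T, S, A))
    for t in reversed(range(T)):
        Q = reward[:, None] + V[NEXT]
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # stable log-sum-exp
        pi[t] = np.exp(Q - V[:, None])
    return pi

def visitation(pi, start):
    """Expected number of visits to each state over T steps."""
    d, total = np.zeros(S), np.zeros(S)
    d[start] = 1.0
    for t in range(T):
        total += d
        nxt = np.zeros(S)
        np.add.at(nxt, NEXT, d[:, None] * pi[t])  # push mass along transitions
        d = nxt
    return total

def sample_demos(pi, start, n, rng):
    """Roll out n state paths of length T from the policy."""
    demos = []
    for _ in range(n):
        s, path = start, []
        for t in range(T):
            path.append(s)
            s = NEXT[s, rng.choice(A, p=pi[t, s])]
        demos.append(path)
    return demos

def maxent_irl(demos, start, iters=300, lr=0.05):
    """Gradient ascent on the MaxEnt log-likelihood with one-hot state features."""
    mu_expert = np.zeros(S)
    for path in demos:
        np.add.at(mu_expert, path, 1.0)   # count visits, repeats included
    mu_expert /= len(demos)
    theta, gaps = np.zeros(S), []
    for _ in range(iters):
        grad = mu_expert - visitation(soft_policy(theta), start)
        gaps.append(float(np.linalg.norm(grad)))  # feature-expectation mismatch
        theta += lr * grad
    return theta, gaps

true_reward = np.zeros(S)
true_reward[S - 1] = 3.0          # goal in the bottom-right corner (assumed)
true_reward[S // 2] = -3.0        # penalty cell in the centre (assumed)

rng = np.random.default_rng(0)
demos = sample_demos(soft_policy(true_reward), start=0, n=30, rng=rng)
theta, gaps = maxent_irl(demos, start=0)
corr = float(np.corrcoef(true_reward, theta)[0, 1])
print(f"feature gap: {gaps[0]:.3f} -> {gaps[-1]:.3f}, reward correlation: {corr:.3f}")
```

Reshaping `theta` with `theta.reshape(N, N)` gives a grid that can be compared cell by cell with `true_reward.reshape(N, N)`. The recovered reward is only determined up to transformations that leave the path distribution unchanged (for example, adding a constant to every cell), which is one reason a correlation rather than an exact match is a sensible way to compare it with the true reward.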