Inverse Reinforcement Learning
Recovering Reward Functions from Expert Behavior
Grid size: 6
Expert demos: 10
IRL iterations: 50
Expert scenario: Simple Goal-Seeking, Goal + Obstacle Avoidance, or Terrain Preference
Controls: Generate Expert Demos, Run IRL, Reset
Status: Demos: 0 · IRL iterations done: 0 · Reward correlation: —
IRL (MaxEnt, gradient-based): the learner adjusts its reward estimate until the feature expectations of its policy match those of the expert demonstrations. The true reward (hidden from the algorithm) is shown on the left; the recovered reward is shown on the right. Expert paths are drawn in yellow.
True Reward (hidden from IRL)
Expert Trajectories
Recovered Reward (IRL output)
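The feature-matching loop behind the demo can be sketched in a few dozen lines. Below is a minimal, self-contained version under assumed details not specified above: one-hot state features, a deterministic 6x6 grid with a single goal corner, soft (MaxEnt) value iteration for the learner's policy, and a fixed learning rate. The expert's feature expectations stand in for the sampled demonstrations; parameter names and values are illustrative, not the demo's actual implementation.

```python
import numpy as np

N = 6                       # grid size (6x6, matching the demo control)
S, A = N * N, 4             # states and actions (up, down, left, right)
GAMMA, HORIZON = 0.95, 20   # assumed discount and demo horizon

def step(s, a):
    """Deterministic grid transition; moves off the edge stay in place."""
    r, c = divmod(s, N)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return nr * N + nc

T = np.array([[step(s, a) for a in range(A)] for s in range(S)])

# True reward (hidden from IRL): +1 at the goal corner, 0 elsewhere.
true_reward = np.zeros(S)
true_reward[S - 1] = 1.0

def soft_policy(reward):
    """Soft (MaxEnt) value iteration; returns a stochastic policy."""
    V = np.zeros(S)
    for _ in range(100):
        Q = reward[:, None] + GAMMA * V[T]                     # Q[s, a]
        m = Q.max(1)
        V = np.log(np.exp(Q - m[:, None]).sum(1)) + m          # log-sum-exp
    Q = reward[:, None] + GAMMA * V[T]
    p = np.exp(Q - Q.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)

def visitation(policy, start=0):
    """Expected state-visitation counts over the horizon (forward pass)."""
    d = np.zeros(S); d[start] = 1.0
    total = d.copy()
    for _ in range(HORIZON - 1):
        nxt = np.zeros(S)
        for a in range(A):
            np.add.at(nxt, T[:, a], d * policy[:, a])
        d = nxt
        total += d
    return total

# With one-hot state features, feature expectations are just visitations.
expert_fe = visitation(soft_policy(true_reward))

# Gradient-based MaxEnt IRL: close the gap between learner and expert.
theta = np.zeros(S)
for _ in range(50):                                  # "IRL iterations: 50"
    learner_fe = visitation(soft_policy(theta))
    theta += 0.1 * (expert_fe - learner_fe)          # feature-matching gradient

corr = np.corrcoef(theta, true_reward)[0, 1]
print(f"Reward correlation: {corr:.2f}")
```

The gradient step is the core idea: when the expert visits a state more often than the learner's current policy does, the estimated reward there is pushed up, and vice versa. The "Reward correlation" readout above corresponds to `corr` here, comparing the recovered reward map against the hidden true one.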