Inverse Reinforcement Learning
Recovering Reward Functions from Expert Behavior
Grid size: 6
Expert demos: 10
IRL iterations: 50
Expert scenario: Simple Goal-Seeking, Goal + Obstacle Avoidance, or Terrain Preference
Controls: Generate Expert Demos, Run IRL, Reset
Status: Demos: 0 · IRL iterations done: 0 · Reward correlation: —
IRL (MaxEnt, gradient-based): the learner adjusts its reward estimate until the feature expectations of its policy match those of the expert demonstrations. The true reward (hidden from the algorithm) is shown on the left; the recovered reward is shown on the right. Expert paths are drawn in yellow.
True Reward (hidden from IRL)
Expert Trajectories
Recovered Reward (IRL output)
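The feature-matching loop behind the demo can be sketched in a few dozen lines. Below is a minimal, self-contained version under assumed details not specified above: one-hot state features, a deterministic 6x6 grid with a single goal corner, soft (MaxEnt) value iteration for the learner's policy, and a fixed learning rate. The expert's feature expectations stand in for the sampled demonstrations; parameter names and values are illustrative, not the demo's actual implementation.

```python
import numpy as np

N = 6                       # grid size (6x6, matching the demo control)
S, A = N * N, 4             # states and actions (up, down, left, right)
GAMMA, HORIZON = 0.95, 20   # assumed discount and demo horizon

def step(s, a):
    """Deterministic grid transition; moves off the edge stay in place."""
    r, c = divmod(s, N)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
    return nr * N + nc

T = np.array([[step(s, a) for a in range(A)] for s in range(S)])

# True reward (hidden from IRL): +1 at the goal corner, 0 elsewhere.
true_reward = np.zeros(S)
true_reward[S - 1] = 1.0

def soft_policy(reward):
    """Soft (MaxEnt) value iteration; returns a stochastic policy."""
    V = np.zeros(S)
    for _ in range(100):
        Q = reward[:, None] + GAMMA * V[T]                     # Q[s, a]
        m = Q.max(1)
        V = np.log(np.exp(Q - m[:, None]).sum(1)) + m          # log-sum-exp
    Q = reward[:, None] + GAMMA * V[T]
    p = np.exp(Q - Q.max(1, keepdims=True))
    return p / p.sum(1, keepdims=True)

def visitation(policy, start=0):
    """Expected state-visitation counts over the horizon (forward pass)."""
    d = np.zeros(S); d[start] = 1.0
    total = d.copy()
    for _ in range(HORIZON - 1):
        nxt = np.zeros(S)
        for a in range(A):
            np.add.at(nxt, T[:, a], d * policy[:, a])
        d = nxt
        total += d
    return total

# With one-hot state features, feature expectations are just visitations.
expert_fe = visitation(soft_policy(true_reward))

# Gradient-based MaxEnt IRL: close the gap between learner and expert.
theta = np.zeros(S)
for _ in range(50):                                  # "IRL iterations: 50"
    learner_fe = visitation(soft_policy(theta))
    theta += 0.1 * (expert_fe - learner_fe)          # feature-matching gradient

corr = np.corrcoef(theta, true_reward)[0, 1]
print(f"Reward correlation: {corr:.2f}")
```

The gradient step is the core idea: when the expert visits a state more often than the learner's current policy does, the estimated reward there is pushed up, and vice versa. The "Reward correlation" readout above corresponds to `corr` here, comparing the recovered reward map against the hidden true one.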